Real-Time Analysis Using DistributedR & Vertica

What is the best way to combine Vertica and DistributedR to get real-time predictions from a clustering model built with DistributedR?

 

Use Case:

  • We have a model that clusters Twitter data by location and retweet count and stores the results in Vertica.
  • The model will be re-run weekly.
  • For tweets arriving in real time, we want to predict which cluster is the best fit for each tweet.
  • We could see up to 500 tweets per minute at peak times.
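
To make the prediction step concrete, here is a minimal sketch of what "best fit" could mean. It assumes the weekly model is centroid-based (k-means style), which is my assumption rather than a confirmed detail of our setup. The centroid values, the feature order (latitude, longitude, retweet count), and the function names are all hypothetical. In practice the centroids would be read from the results table in Vertica, and incoming tweets would need the same feature scaling as the training data.

```python
import math

# Hypothetical centroids standing in for rows read from the Vertica results table.
# Each centroid is (latitude, longitude, retweet_count); the values are made up.
CENTROIDS = [
    (40.7, -74.0, 5.0),
    (51.5, -0.1, 50.0),
    (35.7, 139.7, 500.0),
]


def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    if not centroids:
        raise ValueError("no centroids loaded")
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def assign_batch(points, centroids):
    """Assign each incoming tweet's feature vector to its nearest centroid."""
    return [nearest_cluster(p, centroids) for p in points]


if __name__ == "__main__":
    # Two made-up incoming tweets, using the same feature order as the centroids.
    incoming = [(40.0, -73.0, 4.0), (35.0, 139.0, 480.0)]
    print(assign_batch(incoming, CENTROIDS))
```

At 500 tweets per minute (roughly 8 per second), a linear scan over a small set of centroids like this is cheap, so the open question is mostly where the lookup should run (inside Vertica, in DistributedR, or in the ingest path) rather than raw throughput.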

What is the recommended way to combine Vertica and DistributedR in production for this use case?
