Real-Time Analysis using DistributedR & Vertica
What is the best way to combine Vertica and DistributedR to get real-time predictions from a clustering model built with DistributedR?
Use Case:
- We have a model that clusters Twitter data by location and retweet count and stores the results in Vertica.
- The model will be rerun on a weekly basis.
- For tweets coming in real time, we want to predict which cluster is the best fit for each tweet.
- We could be looking at up to 500 tweets per minute at peak times.
What is the recommended way to combine Vertica and DistributedR in production for this use case?
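To make the scoring step concrete, here is a minimal sketch of what "best fit" means here, assuming the model is k-means-style so that the best cluster is simply the nearest centroid. It is plain Python for illustration only and does not use any DistributedR or Vertica API; the centroid values, the feature order (latitude, longitude, retweet count), and the function names are all hypothetical.

```python
import math

# Hypothetical centroids as (latitude, longitude, retweet_count).
# In practice these would be read from the Vertica table that the
# weekly clustering job writes its results to.
CENTROIDS = [
    (40.7, -74.0, 5.0),
    (51.5, -0.1, 50.0),
    (35.7, 139.7, 500.0),
]


def nearest_cluster(tweet, centroids=CENTROIDS):
    """Return the index of the centroid closest to the tweet (Euclidean distance)."""
    distances = [math.dist(tweet, c) for c in centroids]
    return distances.index(min(distances))


def score_batch(tweets, centroids=CENTROIDS):
    """Assign every tweet in a small batch to its nearest cluster."""
    return [nearest_cluster(t, centroids) for t in tweets]


if __name__ == "__main__":
    incoming = [(40.0, -73.0, 4.0), (36.0, 140.0, 480.0)]
    print(score_batch(incoming))
```

At 500 tweets per minute (roughly 8 per second) the distance computation itself is cheap, so the question is mainly about where the scoring should run and how the weekly centroids get there. If the features were scaled before training, the same scaling would have to be applied to incoming tweets, otherwise the retweet count dominates the distance.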