Pulse: User Influence
I have created scripts to ingest followers and following details for all users who've been indexed with Pulse. I'm currently running update_sentiment every 2 minutes. When I have a large number of users, it takes a long time to run all my scripted wgets, etc. Two questions:
1- Has anyone figured out a way to run sentiment analysis as the tweets are being loaded in?
2- Does anyone have a fast way of acquiring followers/following to further enrich the data?
Thanks for your help.
Comments
If you're fine with slight latency (10 seconds), then you can shorten the interval so your script runs every 10 seconds instead of every 2 minutes. You can also use the SocialMediaConnector to load tweets into Vertica. It loads a batch every 10 seconds or when 10,000 tweets have accumulated, whichever happens first, but the tweets don't get analyzed by Pulse until your script runs. No promises of good performance, but this is the easiest fix if 10 seconds of latency is acceptable.
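The connector's flush rule (every 10 seconds or every 10,000 tweets, whichever comes first) is a size-or-timeout batching policy. Here is a minimal Python sketch of that policy. The class `SizeOrTimeBatcher` and its parameters are hypothetical; this is not the SocialMediaConnector's actual code. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, List, Optional


class SizeOrTimeBatcher:
    """Hypothetical size-or-timeout batcher (not the connector's real code).

    Buffers items and hands them to on_flush when max_items is reached or
    when the oldest buffered item has waited max_wait_s, whichever is first.
    """

    def __init__(
        self,
        on_flush: Callable[[List[str]], None],
        max_items: int = 10_000,
        max_wait_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._on_flush = on_flush
        self._max_items = max_items
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._buffer: List[str] = []
        self._batch_started: Optional[float] = None

    def add(self, item: str) -> None:
        # Start the timer when the first item of a new batch arrives.
        if not self._buffer:
            self._batch_started = self._clock()
        self._buffer.append(item)
        if len(self._buffer) >= self._max_items:
            self.flush()
        else:
            self.tick()

    def tick(self) -> None:
        # Call periodically so a partial batch is flushed even when traffic stops.
        if self._buffer and self._clock() - self._batch_started >= self._max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._batch_started = None
            self._on_flush(batch)


# Demo with a fake clock and a tiny max_items so the size rule is visible.
now = [0.0]
batches: List[List[str]] = []
batcher = SizeOrTimeBatcher(batches.append, max_items=3, max_wait_s=10.0,
                            clock=lambda: now[0])
for tweet in ["t1", "t2", "t3", "t4"]:
    batcher.add(tweet)   # t1..t3 flush on size; t4 starts a new batch
now[0] = 10.0
batcher.tick()           # t4 flushes on time
print(batches)           # [['t1', 't2', 't3'], ['t4']]
```

Either way, nothing is scored until the Pulse script runs against the loaded rows, so the batching interval only bounds how fresh the data in Vertica is.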
If you absolutely need Pulse to run on each individual tweet, you have two options, neither of which we would recommend for performance reasons:
1. Stop using the script and modify the VerticaSink code so that it does the following
2. You can set batchSize of the SocialMediaConnector to 1 to force tweets to be loaded into Vertica as they arrive. However, if you're getting hundreds of tweets per second, the script will probably still pick up many tweets per run rather than individual ones, no matter how short you make the interval between runs.
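To see why batchSize=1 alone does not give per-tweet scoring, the sketch below counts how many rows a periodic script would find on each run when every tweet is committed individually. The function name and the 300 tweets/second rate are illustrative assumptions; the model simply has each run pick up everything committed since the previous run.

```python
from typing import List, Sequence


def rows_per_run(arrival_times: Sequence[float], run_times: Sequence[float]) -> List[int]:
    """Count rows a periodic job sees on each run (illustrative model only).

    arrival_times: seconds at which each tweet is committed (batchSize=1 assumed).
    run_times: seconds at which the scoring script runs, in ascending order.
    A run collects every row committed after the previous run, up to and
    including its own start time.
    """
    counts = []
    previous = float("-inf")
    for run in run_times:
        counts.append(sum(1 for t in arrival_times if previous < t <= run))
        previous = run
    return counts


# 300 tweets/second for 30 seconds, each committed on its own.
arrivals = [i / 300 for i in range(1, 9001)]
print(rows_per_run(arrivals, [10, 20, 30]))        # [3000, 3000, 3000]
print(rows_per_run(arrivals, range(1, 31))[:3])    # [300, 300, 300]
```

Even with the script running every second, each run still sees a few hundred tweets, so per-tweet loading does not by itself translate into per-tweet analysis.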
As for your second question: the SocialMediaConnector fetches the entire tweet, so you can define your table schema to extract all the fields that are relevant to your use case.
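As a sketch of keeping only the relevant fields, the Python below flattens selected fields of one tweet into a row. It assumes the stored tweet is Twitter API v1.1 style JSON; the column names, the `COLUMNS` mapping and `extract_row` are hypothetical. In that format the embedded `user` object carries `followers_count` and `friends_count` (the number of accounts the user follows), which may cover part of the follower/following enrichment without extra wget calls, though it provides counts rather than the actual follower lists.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical mapping: target column -> key path into a v1.1-style tweet object.
COLUMNS: Dict[str, Tuple[str, ...]] = {
    "tweet_id": ("id_str",),
    "created_at": ("created_at",),
    "body": ("text",),
    "screen_name": ("user", "screen_name"),
    "followers_count": ("user", "followers_count"),
    "following_count": ("user", "friends_count"),
}


def extract_row(raw_json: str) -> Dict[str, Any]:
    """Flatten the selected fields of one tweet; missing fields become None."""
    tweet = json.loads(raw_json)
    row: Dict[str, Any] = {}
    for column, path in COLUMNS.items():
        value: Any = tweet
        for key in path:
            # Walk one level down; stop yielding values once a level is missing.
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row


# Made-up sample tweet for illustration only.
sample = json.dumps({
    "id_str": "1",
    "created_at": "Mon Jan 06 12:00:00 +0000 2014",
    "text": "Loving the new release",
    "user": {"screen_name": "example_user", "followers_count": 120, "friends_count": 75},
})
print(extract_row(sample))
```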
Pratibha