Pulse - User Influence

I have created scripts to ingest follower and following details for all users who've been indexed with Pulse. I'm currently running update_sentiment every 2 minutes. When I have a large number of users, it takes a long time to do all my scripted wgets, etc. Two questions:

1- Has anyone figured out a way to run sentiment analysis as the tweets are being loaded in?
2- Does anyone have a fast way of acquiring followers/following to further enrich the data?

Thanks for your help.

Comments

  • PRana Employee
    Hi Jeromie,

    If you're fine with slight latency (10 seconds), then you can shorten the script's interval so that it runs every 10 seconds instead of every 2 minutes. You can also use the SocialMediaConnector to load tweets into Vertica. It runs every 10 seconds or when we hit 10000 tweets, whichever happens first, but the tweets don't get analyzed by Pulse until your script runs. No promises of good performance, but this is the easiest fix if 10 seconds of latency is acceptable.
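    The "every 10 seconds or 10000 tweets, whichever happens first" rule can be pictured with a small sketch. This is not the SocialMediaConnector's actual code; `TweetBatcher`, `flush`, `max_size` and `max_age_s` are hypothetical names used only to illustrate a size-or-age flush policy.

```python
import time


class TweetBatcher:
    """Buffers tweets and flushes on size or age, whichever comes first.

    Illustrative only: the class and parameter names are assumptions,
    not part of the SocialMediaConnector.
    """

    def __init__(self, flush, max_size=10000, max_age_s=10.0, clock=time.monotonic):
        self.flush = flush          # callable that receives a list of tweets
        self.max_size = max_size    # flush when this many tweets are buffered
        self.max_age_s = max_age_s  # flush when the oldest buffered tweet is this old
        self.clock = clock          # injectable clock, handy for testing
        self.buffer = []
        self.started = None         # arrival time of the first tweet in the batch

    def add(self, tweet):
        if not self.buffer:
            self.started = self.clock()
        self.buffer.append(tweet)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or too old; otherwise do nothing."""
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_size
        too_old = self.clock() - self.started >= self.max_age_s
        if too_big or too_old:
            batch, self.buffer = self.buffer, []
            self.flush(batch)
```

    In this sketch the age check only runs when a tweet arrives or when `maybe_flush()` is called explicitly, so a real loader would also need a timer to call it during quiet periods.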

    If you absolutely need Pulse to run on each individual tweet, you have two options, neither of which we would recommend for performance reasons:

    1. Stop using the script and modify the VerticaSink code so that it runs the following for each tweet:
    • "insert into tweet_sentiment select " + id_of_tweet + ", " + username_of_tweets + ", sentimentanalysis(" + text_of_tweet + ") over();"
    • This is literally running Pulse on each individual tweet. However, because the JSON file with a bunch of tweets is only loaded into Vertica periodically, the analyzed tweet will sometimes get put into Vertica before the actual text of the tweet does.

    2. You can set batchSize of the SocialMediaConnector to 1 to force tweets to be loaded into Vertica as they arrive. However, if you're getting hundreds of tweets per second, the script will probably still be grabbing a bunch of tweets at a time rather than individual ones, no matter how short you make the interval between script runs.
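    For option 1, the concatenated statement needs the tweet text (and username) quoted as SQL string literals, and tweet text often contains apostrophes. Below is a minimal sketch of building that statement safely. It is written in Python for brevity rather than in the sink's own language, the helper names are hypothetical, and it assumes standard SQL quoting where a single quote inside a literal is doubled. If your driver supports bound parameters, that is generally the safer route.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes.

    Assumes standard SQL string quoting; hypothetical helper for illustration.
    """
    return "'" + value.replace("'", "''") + "'"


def build_sentiment_insert(tweet_id: int, username: str, text: str) -> str:
    """Build the per-tweet insert from option 1 with quoted string values."""
    return (
        "insert into tweet_sentiment select "
        f"{int(tweet_id)}, {sql_literal(username)}, "
        f"sentimentanalysis({sql_literal(text)}) over();"
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(build_sentiment_insert(42, "jeromie", "it's great"))
```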


    Getting to your second question: the SocialMediaConnector fetches the entire tweet, so you can define your table schema to extract all the fields that are relevant to your use case.
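    As a rough illustration of which fields might be worth mapping into the schema, the sketch below pulls follower and following counts out of a raw tweet. It assumes the classic Twitter tweet JSON layout, where the embedded `user` object carries `followers_count` and `friends_count`. The function name and the sample payload are made up for the example.

```python
import json


def user_influence_fields(raw_tweet: str) -> dict:
    """Extract follower/following counts from one raw tweet JSON string.

    Assumes the classic Twitter payload layout (a nested "user" object);
    missing fields come back as None.
    """
    tweet = json.loads(raw_tweet)
    user = tweet.get("user", {})
    return {
        "tweet_id": tweet.get("id"),
        "screen_name": user.get("screen_name"),
        "followers_count": user.get("followers_count"),
        "friends_count": user.get("friends_count"),  # accounts the user follows
    }


if __name__ == "__main__":
    # Made-up sample payload for illustration.
    sample = json.dumps({
        "id": 1,
        "text": "hello",
        "user": {"screen_name": "example", "followers_count": 120, "friends_count": 80},
    })
    print(user_influence_fields(sample))
```

    Note that this only yields the counts. The actual lists of followers and followed accounts are not part of the tweet payload.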


    Pratibha
