Understand workload analyzer recommendations

Hi, I have a 4 nodes cluster, being loaded at around 20GB/day, and executing analytics queries regularly, every mn, hour, day, ... Overal activity reported by MC is: CPU from 10 to 50%, memory around 5%,. I recently decided to start working on improving execution times. I went in MC, looked at all tuning recommendations, and started by applying ALL analyze_statistics(), will play with Db Designer later. Then I asked for a new workload analyzer run, and it keeps me giving the exact same recommendations on same columns and tables. Can someone explain why I get same recommendations, again and again ? Thanks

Comments

  • Hi Luc, I'm not sure the answer. But I have a question: Could you describe (more precisely) what are the exact recommendations that you are being given, and what you've done to address those recommendations (if anything other than running analyze_statistics())? Also, if you're trying to tune query performance, how fast are your queries running currently? (Roughly speaking -- under a second?, minutes?, hours?) Thanks, Adam
  • I have filtered all analyze_statistics () recommendations I got from wla (such as : analyze statistics on table column xxx.YYY.ZZZ), and I have executed all of them: select analyze_statistics('xxx.YYY.ZZZ') I did strictly nothing else, and then hit the Analyze button. But I still see the same recommendations listed, absolutely no changes, that I do not understand. Currently, the set of queries that I would like to run every mn is taking about 1mn, so I can only run it every 2mn if I want to be safe. If, right after doing the analyze_statistics, I get the same recommendation, how can I know if it was usefull or not, and why the WLA ask me to do it again ?
  • Hm... Well, I don't know why the Workload Analyzer is still indicating that you should run analyze_statistics(). Hopefully someone else here can speak to that. In terms of knowing whether any given change was useful or not, well, did your queries get faster? :-) At the end of the day, that's presumably what really matters. Statistics are fickle -- some queries get dramatically faster; others, not at all. And it's hard to tell in advance. (The way Vertica normally judges this is, well, using statistics.) For the DB Designer, what that does is create optimized projections. Have you already created projections by hand, or are you just using Vertica's defaults? If the latter, especially if you're doing any big JOINs or your data has any often-used columns with lots of duplicate values, I would expect that to make a bigger difference towards your goals. (Though, of course, it also takes much more time to run :-) )
  • Some queries are indeed running 2 to 3 times faster, good ! But the thing is that I would like to understand if I am supposed to rerun the analyze_statistics ('') every 10mn, hour, day, week, month .... If the WLA keeps asking for the same recommendations day after day, when will I know that I am OK ...? For the moment, I am using Vertica defaults, no custom projections yet, will start soon
  • So, the short answer to when to run analyze_statistics() is, "whenever the distribution of values in the data in your tables has changed significantly." So not just when you add data; but when the numbers in the data look different, for example new/different dates, a re-numbering/re-pricing of products, that sort of thing. I don't think you need to re-analyze every 10min; that would be very unusual. (Though, I suppose, not impossible if you keep truncating and rebuilding the table or something.) I can't speak for the WLA; again hopefully someone else here can comment. But I would suggest that you just keep an eye on query runtimes over the next few days or weeks. If they get slower, try running analyze_statistics() again.
  • thread necromancy time...

    Workload Analyzer pulls statistics recommendations from the query profiles and looks for queries that were ran and produced "events" of "NO HISTOGRAM" or "PREDICATE OUT OF RANGE" events. 

    Updating the statistics on a table in the recommendation list won't remove the "event" of the previous query's no statistics message, but as the OP reported, it will improve query performance. The good news is, as the event data is moved along in the course of doing business, older data will get purged to make room for newer stuff, and thus will no longer be reported.... eventually.

    It might be worth modifying the rule to check against the actual table definition to verify that the table has statistics in order to no longer report on it.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file