Performance of a simple query on a from-scratch database varies 2x

Hi Folks,

I've started an experiment on our small 4-node cluster to simulate the possible performance gain from adding another node. (A good excuse to learn Vertica along the way. :-) To do this, I'm running our join-heavy benchmark query on 3 nodes and then again after adding the fourth node.

I don't yet understand why running DBD makes the query twice as slow, but while investigating that I discovered another oddity: when I create my test database from scratch and then run my query (no DBD, just default partitions), I get one of two (seemingly random?) execution times: 6 minutes (2 of 10 runs) or 11.5 minutes (8 of 10 runs). Weird! I haven't been able to reproduce the 'fast' result in the last few trials, so I don't have EXPLAIN or PROFILE output to share, but I'm hoping you can suggest a few possible explanations. Note that I'm using the same three nodes each time, and that no non-OS processes are running on them.

Thanks in advance!


    I made a beginner mistake: I thought I had run the query multiple times to warm the caches, but I had not. Runs 2 and later are consistently faster. Sorry about that. -- matt
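One way to avoid this mistake in later trials is to time the query with a small harness that separates the first (cold-cache) run from the measured (warm) runs and reports the median of the warm runs. The sketch below is generic Python, not a Vertica tool; `benchmark` and its `warmup`/`repeats` parameters are hypothetical names chosen for illustration, and `run_query` is assumed to be any zero-argument callable that executes the benchmark query to completion.

```python
import statistics
import time
from typing import Callable, Dict, List


def _timed(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def benchmark(run_query: Callable[[], object], warmup: int = 1,
              repeats: int = 5) -> Dict[str, object]:
    """Run `warmup` cold-cache runs, then `repeats` measured warm runs."""
    if warmup < 0 or repeats < 1:
        raise ValueError("need warmup >= 0 and repeats >= 1")
    # Cold runs fill the OS page cache and any database-side caches; their
    # timings are kept for reference but excluded from the summary stats.
    cold: List[float] = [_timed(run_query) for _ in range(warmup)]
    warm: List[float] = [_timed(run_query) for _ in range(repeats)]
    return {
        "cold": cold,
        "warm": warm,
        "median": statistics.median(warm),
        "min": min(warm),
        "max": max(warm),
    }
```

With a DB-API cursor `cur` (for example one from `vertica_python`) and a hypothetical `QUERY` string, `run_query` could be a small function that calls `cur.execute(QUERY)` followed by `cur.fetchall()`, so that the timing includes fetching every row. When comparing 3 nodes against 4, comparing warm medians (or deliberately cold runs on both sides) keeps the comparison like-for-like.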

     Adding a new node will increase your overall performance, because Vertica distributes query execution across all nodes in the cluster.
    Here are a few steps to realize that gain after adding a new node to your Vertica cluster:
     1- Rebalance your data once the new node is up (this can be quite resource-intensive).
     2- Run the DBD (query-specific or comprehensive).
     3- Review the projections' segmentation and replication according to your needs.
     4- Collect statistics on the new design layout.

    These are the necessary steps; any further actions depend on your application's needs.
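Steps 1 and 4 above can be scripted; the sketch below assumes the Vertica meta-functions `REBALANCE_CLUSTER()` and `ANALYZE_STATISTICS('')` (where the empty string means all tables), so check both against the documentation for your Vertica version. Steps 2 and 3 are left to you: `post_expansion_maintenance` and its `run_dbd` hook are hypothetical names, with `run_dbd` standing in for however you normally run the Database Designer (for example interactively from admintools). Reviewing segmentation and replication remains a manual step.

```python
from typing import Callable, List, Optional, Protocol


class Cursor(Protocol):
    """The one DB-API 2.0 cursor method this sketch needs."""

    def execute(self, operation: str) -> object: ...


# Step 1: redistribute existing data across all nodes, including the new one.
REBALANCE_SQL = "SELECT REBALANCE_CLUSTER();"
# Step 4: refresh optimizer statistics; an empty string means "all tables".
ANALYZE_ALL_SQL = "SELECT ANALYZE_STATISTICS('');"


def post_expansion_maintenance(
    cursor: Cursor, run_dbd: Optional[Callable[[], None]] = None
) -> List[str]:
    """Run the scripted steps in order and return the SQL that was executed.

    `run_dbd` (hypothetical) is called between the rebalance and the
    statistics collection, matching the order of the steps above.
    """
    executed: List[str] = []

    cursor.execute(REBALANCE_SQL)  # can be slow and resource-intensive
    executed.append(REBALANCE_SQL)

    if run_dbd is not None:
        run_dbd()  # step 2: Database Designer, however you run it

    cursor.execute(ANALYZE_ALL_SQL)  # step 4: stats on the new layout
    executed.append(ANALYZE_ALL_SQL)
    return executed
```

In practice the cursor would come from something like `vertica_python.connect(**conn_info).cursor()`, where `conn_info` is your own connection dictionary. Passing the DBD step in as a callback keeps the required ordering (rebalance, then design, then statistics) in one place without pretending to automate the design itself.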
    Thanks, Adrian. I believe I did those, but I'll have a look. I appreciate it.
