Performance of a simple query on a from-scratch database varies 2x
Hi Folks,
I've started an experiment on our small 4-node cluster to estimate the performance gain from adding another node. (A good excuse to learn Vertica along the way :-) ) To do this I'm running our join-heavy benchmark query on 3 nodes and then again after adding the fourth node. I don't yet understand why running DBD makes the query twice as slow, but while working to understand this I discovered another oddity: when I create my test database from scratch and then run my query (no DBD, just default partitions), I get one of two (seemingly random?) execution times: 6 minutes (2/10 runs) or 11.5 minutes (8/10 runs). Weird! I haven't been able to recreate the 'fast' result in the last few trials, so I don't have EXPLAIN or PROFILE results to share. But I'm hoping you can share a few possible explanations. Note that I'm using the same three nodes each time, and that there are no non-OS processes running on them.
Thanks in advance!
Comments
There are a few steps to follow after adding a new node to your Vertica cluster:
1- Rebalance your data once the new node is up (this can be quite resource-intensive).
2- Run the DBD (query-specific or comprehensive).
3- Review the projections' segmentation and replication according to your needs.
4- Collect statistics on the new design layout.
- These are the required steps; any further actions depend on your application's needs.
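As a rough sketch of how the steps above might look in practice, the snippet below builds the SQL for steps 1, 3 and 4 so it can be pasted into vsql. The helper name `post_add_node_statements` and the example table names are made up for illustration. Step 2 (DBD) is left as a comment because it is usually run from the Administration Tools or Management Console, and the column list used to review projections is an assumption to verify against your Vertica version.

```python
def post_add_node_statements(table_names):
    """Return, in order, the SQL for the steps that are plain SQL calls.

    table_names: schema-qualified tables to analyze (hypothetical examples).
    """
    statements = [
        # Step 1: redistribute existing data across all nodes.
        # This can take a long time and use a lot of resources.
        "SELECT REBALANCE_CLUSTER();",
        # Step 2: run the Database Designer here (done outside this script).
        # Step 3: list projections to review segmentation vs. replication.
        # Column names are assumed; check v_catalog.projections on your system.
        "SELECT projection_name, anchor_table_name, is_segmented "
        "FROM v_catalog.projections;",
    ]
    # Step 4: refresh optimizer statistics for each table in the new layout.
    for name in table_names:
        statements.append(f"SELECT ANALYZE_STATISTICS('{name}');")
    return statements


if __name__ == "__main__":
    # Example table names are placeholders, not taken from the thread.
    for sql in post_add_node_statements(["public.fact_sales", "public.dim_customer"]):
        print(sql)
```

Printing the statements rather than executing them keeps the sketch independent of any particular client driver; you can review the output and run each statement by hand.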