elastic_cluster: good values and clarification

Hi, I have a 5-node cluster and wonder what good values would be for scaling_factor and is_local_segment_enabled. The defaults (or at least my current values) are 4 and false, respectively. If I understand correctly, the scaling_factor does not matter much if the number of nodes does not change often. Is that right? And regarding is_local_segment_enabled, what impact/consequences would enabling it have?

We have an (apparently) very busy Vertica instance, even if we are not yet sure where the apparent slowness comes from. One of our tables in particular contains about 8 billion rows, is partitioned on one column, and has a few tens of millions of rows added and removed every day. My guess is that with local segments enabled, the rows would be spread more evenly over the cluster, leading to better performance? Thanks,
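For reference, here is roughly how I am reading the current values. I am assuming the ELASTIC_CLUSTER system table and the SET_SCALING_FACTOR / ENABLE_LOCAL_SEGMENTS meta-functions here; names and columns may differ between Vertica versions:

    -- Show the current elastic cluster settings (scaling factor, local segments, ...)
    SELECT * FROM v_catalog.elastic_cluster;

    -- Meta-functions for changing the settings, kept commented out for now:
    -- SELECT SET_SCALING_FACTOR(8);      -- change the scaling factor
    -- SELECT ENABLE_LOCAL_SEGMENTS();    -- turn local segmentation on
    -- SELECT DISABLE_LOCAL_SEGMENTS();   -- turn local segmentation off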

Comments

  • Hi Guillaume, The answer to these questions is, of course, "it depends" :-) "scaling_factor" says how much you can grow your cluster without a very-expensive resegment operation to re-split the data. "is_local_segment_enabled" will not actually have any impact on how data is distributed around the cluster. That would be cluster segmentation. Local segmentation actually splits the data up further, into multiple segments within each node. In the same way that cluster segmentation allows multiple nodes to work on different pieces of the data at once, local segmentation allows multiple threads within each node to (more easily) work on different pieces of the data at once. So each query can use more CPU cores. But re-splitting the data into local segments is quite expensive; I wouldn't bother enabling this if you're seeing good CPU utilization already. Regarding your workload, are you using DELETE to delete all of those records? Or are you using DROP PARTITION? If DELETE, you might want to take a look at how large your delete vectors are; that might be the cause of your reduced performance. (See the DELETE_VECTORS system table; a query for this is sketched below these comments.) Adam
  • Thanks for this, Adam. From what I understand, this means those values are good enough for now. For the workload, we use a mix of MERGE statements and are working on using partitions. I am already cleaning up the delete vectors once a week (roughly along the lines sketched below these comments), because I noticed early on that it was a big issue. The thing is, our performance issues seem global: a simple select count(*) from nodes; via vsql takes anywhere between 0.03 and 100 seconds, with no obvious connection to the rest of the workload. That is why I was trying to find a global fix. Cheers,
  • Hi Guillaume, For the workload, ah, interesting. With a 5-node cluster, I assume you're an Enterprise customer? In which case, I'd encourage you to open a support case, if you haven't already; support has a whole process for tracking down potential performance issues. One possibly non-obvious thing to check: Are you seeing any dropped UDP network packets? Vertica uses UDP for its control layer; some fancy switches see all of Vertica's UDP traffic and think we're a video-streaming program that's misbehaving or something, and start dropping our UDP packets. This can cause all kinds of weird issues with the cluster appearing idle but not responsive... (There are, of course, a bunch of more-common issues, the usual suspects regarding locking/concurrency/etc., that they can help with. For Community members who don't have a support contract, the first thing I'd recommend is to make sure you're running the latest version of Vertica.) Adam
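A rough version of the delete-vector check Adam suggests above, summarizing how much deleted data each projection is carrying per node (the column names below are assumptions and may differ between versions):

    -- Total delete-vector rows and bytes per node and projection
    SELECT node_name,
           schema_name,
           projection_name,
           SUM(deleted_row_count) AS deleted_rows,
           SUM(used_bytes)        AS delete_vector_bytes
    FROM   v_monitor.delete_vectors
    GROUP  BY node_name, schema_name, projection_name
    ORDER  BY deleted_rows DESC;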
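And a sketch of the weekly delete-vector cleanup mentioned above, assuming the PURGE_TABLE / PURGE_PARTITION meta-functions are the ones used for it (the table name and partition key below are placeholders):

    -- Reclaim the space held by delete vectors for the large table
    SELECT PURGE_TABLE('myschema.big_table');

    -- Or purge only one partition of the table
    -- SELECT PURGE_PARTITION('myschema.big_table', '2015-01-01');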
