How does CompressNetworkData work?
To improve performance across the country, we are interested in using the CompressNetworkData configuration parameter to reduce payload size and speed up queries.
But simply enabling CompressNetworkData in a cluster has no effect. According to tcpdump analysis, the payload is exactly the same size whether the parameter is enabled or not.
Are other factors involved in making CompressNetworkData work? Are there client side dependencies?
Thanks!
Comments
There are, unfortunately, some significant limitations in the current design and implementation of this parameter. For some workloads, it may simply not work as desired.
We hope to address this in a future release. So, keep an eye out.
One potential limitation: I would expect you to find that the limiting factor in this type of setup is actually not bandwidth; it's latency. Vertica is extremely sensitive to the latency between nodes. An additional 100ms of latency between just two nodes in the cluster can, in some cases, add seconds to the runtime of transactions. Additionally, long-distance links often become saturated (from Vertica itself or from other applications sharing the same fiber); this can greatly slow and even de-stabilize the cluster, as nodes fail to respond to heartbeats. Some of this can be addressed by configuration; I don't believe we have any whitepapers/etc on how to do it, though, and it's not yet automated.
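To make the latency-versus-bandwidth point concrete, here is a rough back-of-envelope sketch. It is not a model of Vertica's actual network protocol: the function name, the round-trip count, the payload size, and the compression ratio are all made-up assumptions chosen for illustration. It only shows that when an exchange involves many round trips, shrinking the payload barely changes the total time.

```python
def transfer_time_s(payload_bytes, round_trips, rtt_s, bandwidth_bps,
                    compression_ratio=1.0):
    """Estimate transfer time as a latency term plus a bandwidth term.

    Hypothetical model: ignores CPU cost of compression, TCP slow start,
    congestion, and anything specific to Vertica.
    """
    latency_term = round_trips * rtt_s              # time spent waiting on round trips
    wire_bytes = payload_bytes / compression_ratio  # bytes actually sent on the wire
    bandwidth_term = wire_bytes * 8 / bandwidth_bps # time spent pushing bits
    return latency_term + bandwidth_term


# Illustrative numbers only: 10 MB payload, 50 round trips,
# 100 ms round-trip time, 1 Gbps link.
plain = transfer_time_s(10_000_000, 50, 0.1, 1e9)
packed = transfer_time_s(10_000_000, 50, 0.1, 1e9, compression_ratio=2.0)
print(f"uncompressed: {plain:.2f}s, 2x compressed: {packed:.2f}s")
```

With these assumed numbers the round trips account for about 5 seconds and the payload for well under a tenth of a second, so halving the payload saves almost nothing. Compression only pays off when the bandwidth term is the larger one.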
You're of course welcome to experiment with this sort of configuration. If you do, feel free to post your results; I think various people here would be curious to hear more. For what it's worth, the official Vertica recommendation is, I believe, to run one cluster per data center.
Adam
CompressNetworkData applies to data sent between Vertica nodes, not between a Vertica node and the application. From what I understand of your question, the computer in the other zone is the one consuming the data, not one of the Vertica nodes. Vertica is not supported with nodes in different zones, and Adam explained perfectly why.
Hope this helps
Eugenia
HP Vertica Employee
In addition to Eugenia's and Adam's comments, a typical use case for CompressNetworkData is exporting or copying data between Vertica clusters. If the additional CPU load is acceptable, the benefit is there. In one case, a 20 TB copy improved by 30% with compression (your mileage will vary).
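The CPU-for-bytes trade-off mentioned above can be tried on a sample of your own data before changing any cluster setting. The sketch below uses Python's standard zlib module purely as a stand-in; this thread does not say which algorithm Vertica uses internally, and the sample rows and the helper name are invented for illustration.

```python
import time
import zlib


def measure_compression(data: bytes, level: int = 6):
    """Return (raw size, compressed size, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(data), len(compressed), elapsed


# Made-up, highly repetitive sample rows; real data will compress differently.
sample = b"2016-01-01,store_42,widget,19.99\n" * 100_000
raw, packed_size, cpu_s = measure_compression(sample)
print(f"raw={raw} bytes, compressed={packed_size} bytes, took {cpu_s:.4f}s")
```

Repetitive data like this shrinks dramatically, while already-compressed or random-looking data will not, which is one reason results vary so much between workloads.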
I also ran across the CompressNetworkData configuration parameter while trying to improve my cluster's networking performance. We have just one cluster of 50 nodes. Our 1 Gbps private network tends to lag, and when we were on 7.1.12 the nodes were constantly flapping. After upgrading to 7.2.3-1, the flapping stopped, but logging into MC takes a long time and queries time out.
I know this thread is over a year old, but has anything about this setting changed? Based on the documentation for 7.2.x, it seems it might help our situation.
https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/AdministratorsGuide/ConfiguringTheDB/GeneralParameters.htm
"Compresses all data sent over the internal network when enabled (value set to 1). This compression speeds up network traffic at the expense of added CPU load. If the network is throttling database performance, enable compression to correct the issue.
Default Value: 0"
Thanks