Compress result set to reduce network time
My client is selecting a large data set from vertica ( vsql -h ... -c "sql query" ) and expects a faster completion time by compressing the network traffic to the client. Is there an option on the driver ( any client driver) to compress client traffic and decompress it on the client side ?
Thanks.
0
Comments
POST REMOVED
Check out the "CompressNetworkData" database parameter. It will run LZO compression before sending data.
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AdministratorsGuide/ConfiguringTheDB/GeneralParameters.htm
It can only be enabled at the database level...
Thanks Jim for the response.
I read about the parameter in the docs and it states that
"Compresses all data sent over the internal network when enabled (value set to 1). This compression speeds up network traffic at the expense of added CPU load. If the network is throttling database performance, enable compression to correct the issue."
Since it is the internal network, not the external ( client connection) this would only be compressing traffic across the vertica nodes. We need traffic between sql client and host endpoint to be compressed. I believe this parameter would not do that.
Yeah, it compresses data on the data channel (TCP). So you may see increased performance in general as query results from executor nodes are passed back to the initiator node.
I do not believe there is a way to tell the initiator node to compress data and pass it back to the client, or have the client (i.e. vsql) to ask for compressed data.
A basic compress send decompress approach suggested by one of the engineers involved goes like this :
time (ssh user@verticanode 'vsql -f query.sql | gzip -c -f' | gunzip > ~/tmp/out6.txt)
and the time to execute drops in half.
Hopefully useful to anyone having the same need.
Hi All -
If I'm not completely mistaken, the Vertica JDBC and ODBC drivers do exactly what seems to be required here: When you open a cursor from a SELECT in your program, the Vertica server side compresses the data before putting it onto the network, and the driver on the client side decompresses the incoming data before handing it on to the process that called the API.
Is it really true that vsql behaves differently?
Thanks @marcothesane
I was originally testing with the open source vertica-python driver -- https://github.com/uber/vertica-python -- which I believe does not support compression of results over the wire. As a comparison, we tried the technique @colin_loghin mentioned above.
A crude benchmark showed the query with vertica-python took about 6 minutes, while this method of gzipping remotely and gunzipping locally took about 30 seconds. A similar test using an ssh tunnel with compression then querying localhost:port (using vertica-python) took about 50 seconds.
Based on your comment, I tried pyodbc and I'm seeing times of about 2 minutes. This is a big improvement! Looking at the traffic on the wire using wireshark, it looks like the results are being compressed. Do you know if packets are compressed individually, or if the entire result set is compressed, then sent over the wire. In the latter case, you would get much better compression. I suspect the packets are compressed, which is the reason using pyodbc still takes about 4 times as long as the ssh/gzip/gunzip method.
Is this client/server compression negotiation documented? It would be great if Vertica could contribute to the open source vertica-python project.
thanks,
Dennis
You can try proxy: do request against a proxy and proxy will compress on the fly(pipeline) a response from vertica.
It will take less than 100 lines of source code on python.
@sKwa I did consider using an ssh tunnel for compression but for security reasons it's not an option. If there is another type of proxy you are suggesting I would be interested to learn more.
@marcothesane For the record, vsql run from the local client has similar performance to using the ODBC driver (via pyodbc). Both are much faster than vertica-python for this use case. If there is any documentation on how the client can tell the server it supports compression, and details on the compression, please share. The vertica-python project has a lot of interest, so improving that would make a big impact.
thanks,
Dennis