Performance issue
Hi,

We have some performance issues using Vertica in production. Basically, Vertica is sometimes very, very slow, and I cannot pinpoint any particular cause.

I have a 'performance monitoring' job running every minute, which simply does a 'select count(*) from nodes' via vsql directly on a node. When all goes well this query takes about 0.02 seconds, but sometimes it goes up to 30 seconds.

I have tried to understand what is going on, but nothing is obvious to me. I monitor a few metrics, and there is no clear correlation:
- not related to high load average
- not related to high iowait
- not related to high network traffic

That said, those metrics tend to be higher when the slowness is worse, but not always. This happens with or without a lot of users connected. It happens even when no big queries are running.

Usually there is no memory issue (no more than 75% in use). Sometimes a query cannot allocate enough memory, but that is not common. The process or thread count goes high (up to 1500 threads for Vertica) but not through the roof. All the ulimits are very high.

There is nothing weird in the logs, except a lot of:
Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 14 due to early read error: EOF
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.122 port=44250 (connCnt 1)
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.123 port=56382 (connCnt 1)
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.121 port=36722 (connCnt 1)
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.128 Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 26 due to early read error: EOF

The very weird thing is that I recently updated Vertica from 6.1.1 to 6.1.2, and for 2-3 hours after the update the speed was just fantastic. Then it went back to its normal slow self. Restarting Vertica on all nodes later on did not give any improvement, not even a temporary one.

I know this is a confused description of my problem (but then I really am confused about it). I would be very grateful for any insight, ideas, or reports of similar issues.

Thanks,
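For anyone who wants to reproduce the measurement, the once-a-minute probe described above could look roughly like the Python sketch below. It is only an illustration under assumptions: the names time_command and probe_line and the slow_after threshold are made up here, and the vsql call assumes vsql is on PATH and can connect with its default settings (host, user, and password options are left out). Logging the load average next to each timing makes it easier to check the "no correlation" observation afterwards.

```python
import os
import subprocess
import time


def time_command(cmd, timeout=60.0):
    """Run cmd and return (elapsed_seconds, exit_code).

    exit_code is None if the command timed out or could not be started.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
        code = proc.returncode
    except (subprocess.TimeoutExpired, OSError):
        code = None
    return time.monotonic() - start, code


def probe_line(cmd, slow_after=1.0):
    """Time one probe and format it with the 1-minute load average."""
    elapsed, code = time_command(cmd)
    load1 = os.getloadavg()[0]
    flag = "SLOW" if elapsed > slow_after else "ok"
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {flag} elapsed={elapsed:.3f}s exit={code} load1={load1:.2f}"


if __name__ == "__main__":
    # Assumption: vsql is on PATH and connects without extra options.
    print(probe_line(["vsql", "-c", "select count(*) from nodes"]))
```

Run from cron every minute and appended to a file, the SLOW lines can then be lined up against the Vertica log timestamps.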
Comments
Parvendra Adhran
Is the slowdown mainly on catalog queries, or does it also affect non-catalog queries?
select 1+1;
also sometimes runs for more than 30 seconds.
Mm....
Vertica relies heavily on the filesystem buffer cache, so I think it is worth checking that side. Please share the output of the Linux “free” command while the problem is occurring. Here is also a reference on how to monitor it: http://lonesysadmin.net/2011/12/04/leave-some-ram-for-filesystem-cache/
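If it is easier to log this than to run “free” by hand, the same numbers can be read from /proc/meminfo. The following is a rough Python sketch, not an official tool: parse_meminfo and cache_summary are names invented for this example, and "used" is approximated as total minus free minus buffers and page cache, which ignores finer categories such as slab memory.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def cache_summary(info):
    """Return free, cache, and used shares of total RAM, in percent."""
    total = info["MemTotal"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    free = info.get("MemFree", 0)
    return {
        "free_pct": 100.0 * free / total,
        "cache_pct": 100.0 * cache / total,
        # Rough figure: memory held by processes and the kernel.
        "used_pct": 100.0 * (total - free - cache) / total,
    }
```

On a node, pass it the contents of /proc/meminfo, for example cache_summary(parse_meminfo(open("/proc/meminfo").read())). A cache_pct that collapses at the moments the probe is slow would point toward the filesystem cache being squeezed.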
Thanks.
https://community.vertica.com/vertica/topics/query_caching_data_caching
I have seen slowness on catalog queries, especially on detailed system tables.
Also, if a query hits Vertica for the first time, it seems to take longer; when the same query then hits Vertica regularly, the time seems to drop drastically.
We can also test this by using the clear_caches() function.
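A small Python sketch of such a test: time one run right after clearing the caches, then a few repeat runs, and compare. The helper cold_vs_warm is hypothetical, and run_query and clear_caches are callables you supply yourself (for example thin wrappers that call vsql); nothing in the sketch talks to Vertica directly. As far as I know, clear_caches() only clears Vertica's internal caches and not the Linux filesystem cache, so treat the "cold" number with that in mind.

```python
import time


def cold_vs_warm(run_query, clear_caches, warm_runs=3):
    """Time one run after clearing caches, then several repeat runs.

    run_query and clear_caches are caller-supplied callables
    (hypothetical hooks, e.g. wrappers around vsql).
    Returns (cold_seconds, [warm_seconds, ...]).
    """
    clear_caches()
    start = time.perf_counter()
    run_query()
    cold = time.perf_counter() - start

    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_query()
        warm.append(time.perf_counter() - start)
    return cold, warm
```

If the cold run is slow but the warm runs are fast, first-hit cost explains it; if warm runs are also occasionally slow, as with the 'select 1+1' case above, caching is probably not the whole story.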
Hope this helps
NC