Performance issue

Guillaume (Registered User)
Hi,

We have some performance issues using Vertica in production. Basically, Vertica is sometimes very slow, and I cannot pinpoint any particular cause.

I have a 'performance monitoring' job running every minute, which just does a 'select count(*) from nodes' via vsql directly on a node. When all goes well this query takes about 0.02 seconds, but sometimes it goes up to 30 seconds.

I have tried to understand what is going on, but nothing is obvious to me. I monitor a few metrics, and there is no clear correlation:
- not related to high load average
- not related to high iowait
- not related to high network traffic

That said, those metrics tend to be higher when the slowness is worse, but not always.

This happens with or without a lot of users connected, and even when no big queries are running. Usually there is no memory issue (no more than 75% in use); sometimes some queries cannot allocate enough memory, but that is not common. The process or thread count goes high (up to 1500 threads for Vertica) but not through the roof. All the ulimits are very high.

There is nothing weird in the logs, except a lot of:
Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 14 due to early read error: EOF
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.122 port=44250 (connCnt 1)
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.123 port=56382 (connCnt 1)
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.121 port=36722 (connCnt 1)
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.128 Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 26 due to early read error: EOF
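To check whether these messages cluster around the slow periods, the log can be tallied per host and per minute. The following is only a rough sketch: the regular expressions are inferred from the few lines quoted above, the function name `summarize` is made up for illustration, and other Vertica versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns inferred from the quoted log excerpt only (an assumption, not a
# documented Vertica log format).
CONN_RE = re.compile(r"Connection received: host=(\S+) port=\d+")
EOF_RE = re.compile(r"Messenger::readcb_r: closing fd \d+ due to early read error: EOF")
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+")

# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 14 due to early read error: EOF",
    "2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.122 port=44250 (connCnt 1)",
    "2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.123 port=56382 (connCnt 1)",
    "2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.121 port=36722 (connCnt 1)",
    "2013-06-27 02:11:04.128 Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 26 due to early read error: EOF",
]


def summarize(lines):
    """Count incoming connections per host and EOF warnings per minute."""
    conns_per_host = Counter()
    eofs_per_minute = Counter()
    for line in lines:
        conn = CONN_RE.search(line)
        if conn:
            conns_per_host[conn.group(1)] += 1
        if EOF_RE.search(line):
            stamp = MINUTE_RE.match(line)
            # Lines without a leading timestamp are grouped under "unknown".
            eofs_per_minute[stamp.group(1) if stamp else "unknown"] += 1
    return conns_per_host, eofs_per_minute
```

Running `summarize` over the full log file (for example `summarize(open(path))`, with `path` pointing at the node's log) and comparing the per-minute counts with the timestamps of the slow probes would show whether the two move together.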
The very weird thing is that I recently updated Vertica from 6.1.1 to 6.1.2, and for 2-3 hours after the update the speed was just fantastic. Then it went back to its normal slow self. Restarting Vertica on all nodes later on did not bring any improvement, not even a temporary one.

I know this is a confused description of my problem (but then I really am confused about it). I would be very grateful for any insights, ideas, or reports of similar issues.

Thanks,
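For anyone who wants to reproduce the once-a-minute probe described above, here is a minimal sketch. It assumes `vsql` is on the PATH and can connect without prompting; the 1-second "slow" cutoff and the names `time_command`, `classify`, and `probe_once` are made up for illustration and are not the actual monitoring script.

```python
import subprocess
import sys
import time

# Assumption: vsql is on PATH and connects without prompting for credentials.
PROBE_CMD = ["vsql", "-c", "select count(*) from nodes"]
SLOW_THRESHOLD_S = 1.0  # hypothetical cutoff; the healthy case is about 0.02 s


def time_command(cmd, timeout_s=60.0):
    """Run cmd once; return (elapsed_seconds, return_code), code None on timeout."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def classify(elapsed, code, threshold_s=SLOW_THRESHOLD_S):
    """Label one sample as 'ok', 'slow', 'failed', or 'timeout'."""
    if code is None:
        return "timeout"
    if code != 0:
        return "failed"
    return "slow" if elapsed > threshold_s else "ok"


def probe_once(cmd=PROBE_CMD, out=sys.stdout):
    """Time one probe and write a timestamped line suitable for later correlation."""
    elapsed, code = time_command(cmd)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    out.write(f"{stamp} elapsed={elapsed:.3f}s status={classify(elapsed, code)}\n")
    return elapsed, code
```

Calling `probe_once()` from cron every minute and appending the output to a file gives a timeline that can be lined up against load average, iowait, and the Vertica log.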
