Performance issue

Hi,

We have some performance issues using Vertica in production. Basically, Vertica is sometimes very, very slow, without me being able to pinpoint any particular cause.

I have a 'performance monitoring' probe running every minute, which just does a 'select count(*) from nodes' via vsql directly on a node. When all goes well this query takes about 0.02 seconds, but sometimes it goes up to 30 seconds.

I have tried to understand what is going on, but nothing obvious stands out. I monitor a few metrics, and the slowness shows no clear correlation with any of them:

- not related to high load average
- not related to high iowait
- not related to high network traffic

That said, those metrics tend to be higher when the slowness is worse, but not always.

This happens with or without a lot of users connected, and even when no big queries are running. Usually there is no memory issue (no more than 75% used); sometimes some queries cannot allocate enough memory, but that is not common. The process or thread count goes high (up to 1500 threads for Vertica) but not through the roof. All the ulimits are very high.

There is nothing weird in the logs, except a lot of:
Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 14 due to early read error: EOF
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.122 port=44250 (connCnt 1)
2013-06-27 02:11:03.475 Init Session:0x7f1860018ea0 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.123 port=56382 (connCnt 1)
2013-06-27 02:11:03.745 Init Session:0x7f1860024a90 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: 00000/2705: Connection received: host=172.16.0.121 port=36722 (connCnt 1)
2013-06-27 02:11:04.097 Init Session:0x7f1860017770 <LOG> @v_spil_dwh_node0001: {SessionRun} 00000: missing error text
2013-06-27 02:11:04.128 Poll dispatch:0x77c5f40 [Dist] <WARNING> Messenger::readcb_r: closing fd 26 due to early read error: EOF
The very weird thing is, I updated Vertica from 6.1.1 to 6.1.2 recently, and for 2-3 hours after the update the speed was just fantastic. Then it went back to its normal slow self. Restarting Vertica later on all nodes did not give any improvement, not even a temporary one. I know this is a confused description of my problem (but then I really am confused about it). I would be very grateful for any insight, ideas, or reports of similar issues. Thanks,
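
The once-a-minute probe described in the post could be sketched in Python roughly as follows. This is only an illustration, not the script actually used by the poster: `time_command`, `format_sample` and the `PROBE` argument list are hypothetical names, and the `vsql` flags shown (`-A`, `-t`, `-c`) are assumed to behave as in a standard install, with connection options left out. Logging each sample with a millisecond timestamp makes it easier to line slow probes up with the Vertica log afterwards.

```python
import subprocess
import time
from datetime import datetime

# Hypothetical probe command; host, user and password options are omitted.
PROBE = ["vsql", "-A", "-t", "-c", "select count(*) from nodes"]


def time_command(argv, timeout=60.0):
    """Run argv once and return (elapsed_seconds, return_code).

    return_code is None when the command did not finish within timeout.
    """
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.perf_counter() - start, code


def format_sample(when, elapsed, code):
    """Format one probe result as a single log line."""
    stamp = when.isoformat(sep=" ", timespec="milliseconds")
    return f"{stamp} elapsed={elapsed:.3f}s rc={code}"
```

Calling `time_command(PROBE)` from cron or a loop and appending `format_sample(datetime.now(), elapsed, code)` to a file would give one line per probe, in the same timestamp layout as the log excerpt.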

Comments

  • Hi, we have a similar issue. Sometimes, when there is no heavy load on Vertica, queries run several times longer than usual, for example 40 seconds instead of 6 seconds. CPU is at 20-30% and memory usage is stable at ~80%. There are not many requests at the same time. Executionparallelism is set to 1 (during heavy load this works better than Auto). We have a small database, running Vertica Analytic Database v6.1.2-0. Any ideas what the cause could be?
  • Hi Amelia, I have a similar problem. Did you find any explanation for this behaviour?
  • Hi, we are also facing the same issue with Vertica 6.1.2. Please share any suggestions.

    Parvendra Adhran 
  • Hi,
    Is the slowdown mainly on catalog queries, or does it also affect non-catalog queries?

  • Hi Eli!

    select 1+1;

    also sometimes takes more than 30 seconds.
  • Hmm...

    Vertica makes heavy use of the filesystem buffer cache, so I think it is worth checking that side. Please share the output of the Linux "free" command, taken while the problem is occurring.

    Here is also a reference on how to monitor it: http://lonesysadmin.net/2011/12/04/leave-some-ram-for-filesystem-cache/
    Thanks.
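
To capture the same information automatically at the moment a slow probe is seen, one option is to read `/proc/meminfo`, which is where `free` gets its numbers on Linux. The sketch below is an assumption-laden illustration: `parse_meminfo`, `cache_share` and `read_meminfo` are hypothetical helper names, and "cache" is taken here to mean `Buffers` plus `Cached`, which is a simplification.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into a dict of integer values.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def cache_share(values):
    """Fraction of total RAM currently used as buffers plus page cache."""
    cached = values.get("Buffers", 0) + values.get("Cached", 0)
    return cached / values["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Logging `cache_share(read_meminfo())` next to each probe timing would show whether the slow periods coincide with the filesystem cache being squeezed.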

  • Navin_C Vertica Customer
    A few thoughts on Vertica and cache:
    https://community.vertica.com/vertica/topics/query_caching_data_caching

    I have seen slowness on catalog queries, especially against detailed system tables.
    Also, if a query is hitting Vertica for the first time, it seems to take more time; when the same query then hits Vertica regularly, the timing seems to decrease drastically.

    We can also test this by using the clear_caches() function.


    Hope this helps
    NC
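
One way to put a number on the first-run versus repeated-run difference is to time the same query several times and compare the first timing with the median of the rest. The sketch below is only an illustration under stated assumptions: `time_runs` and `cold_warm_ratio` are hypothetical helpers, and `run` stands for any callable that executes the query once (for example a wrapper around a vsql call). Running `select clear_caches();` before the first timing is left to the caller, and it may not affect the Linux filesystem cache, which would have to be handled separately.

```python
import statistics
import time


def time_runs(run, repeats=5):
    """Call run() repeats times and return the wall-clock time of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def cold_warm_ratio(timings):
    """Ratio of the first (cold) timing to the median of the later (warm) ones."""
    if len(timings) < 2:
        raise ValueError("need at least two timings")
    warm = statistics.median(timings[1:])
    if warm <= 0:
        raise ValueError("warm timings must be positive")
    return timings[0] / warm
```

A ratio close to 1 after clearing the caches would suggest caching is not the explanation for the slowness; a large ratio would point towards it.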
