Vertica Slowing Down over time?

Hey there

 

We are seeing something odd in our environment: over time, queries against Vertica slow down to the point where the same queries against the same dataset take ~2x or more wall clock time. There is a known workaround for this problem: a full Vertica cluster restart.

 

Do others experience something similar in their environment? We have observed this on Vertica 7.0, 7.1, and now 7.2.

 

Some facts that may help this discussion:

 

- We are running on CentOS 6.5

 

- Our Java version is Java 1.7.0_51

 

- Current Vertica release is 7.2.2-2 (SP2)

 

- We use the HDFS Connector and/or home built Java functions/UDX

 

- Physical stand alone nodes; DAS mode (no SAN)

 

- There are no extra queries or background activity like tuple mover activity/statistics/data loads, etc.

 

- According to our monitoring, the system usage is roughly the same between days when Vertica is "fast" and when Vertica is "slow"

 

Thoughts?

Comments

  • Hi,
    I am personally not familiar with such behavior. One way to troubleshoot it is to compare the profile info of a specific query for the two time periods. You will be able to see which execution operators show degradation in terms of performance.

     

    For example, if you see that the "scan" operator takes more time, or runs with fewer parallel threads, that may indicate a system problem or an I/O problem.

     

    An additional point you can check concerns your server setup: it is very important to disable CPU frequency scaling at the BIOS level. See https://my.vertica.com/docs/6.1.x/PDF/HP_Vertica_6.1.x_InstallGuide.pdf

     

    I hope you find this useful.

     

    Thanks
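
The profile comparison suggested above can be sketched roughly as follows. This is only an illustration, assuming you have already exported per-operator execution times for the same query from a "fast" run and a "slow" run (for example from Vertica's profiling data such as v_monitor.execution_engine_profiles). The function name, the dictionary layout, the 1.5x threshold, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def compare_profiles(
    fast: Dict[str, float],
    slow: Dict[str, float],
    threshold: float = 1.5,
) -> List[Tuple[str, float, float, float]]:
    """Return operators whose time in the slow run is at least
    `threshold` times their time in the fast run, worst first."""
    regressions = []
    for operator, fast_us in fast.items():
        slow_us = slow.get(operator)
        # Skip operators missing from the slow run or with no usable baseline.
        if slow_us is None or fast_us <= 0:
            continue
        ratio = slow_us / fast_us
        if ratio >= threshold:
            regressions.append((operator, fast_us, slow_us, ratio))
    regressions.sort(key=lambda row: row[3], reverse=True)
    return regressions


# Hypothetical execution times in microseconds, keyed by operator name.
fast_run = {"Scan": 1_200_000, "Join": 800_000, "GroupByHash": 400_000}
slow_run = {"Scan": 3_000_000, "Join": 850_000, "GroupByHash": 410_000}

for operator, fast_us, slow_us, ratio in compare_profiles(fast_run, slow_run):
    print(f"{operator}: {fast_us:.0f} us -> {slow_us:.0f} us ({ratio:.2f}x)")
```

With these made-up numbers only the Scan operator is reported, which would point toward storage or I/O rather than the join or aggregation steps.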

  • Hey Eli, thanks for the reply

    We do have CPU frequency scaling disabled at the BIOS level. If memory serves right, Vertica's installer won't let you proceed with installs/upgrades otherwise.

    Any other thoughts/ideas?
  • When you say this:

    - According to our monitoring, the system usage is roughly the same between days when Vertica is "fast" and when Vertica is "slow"

     

     

    Is there any resource cap?

    Do you have resource pool queues?

    Does your data have statistics?

    Do you monitor your query execution times periodically?

    Do you follow up on your query requests?

      - do you use labels?

      - do you keep count of those labeled requests and their times/resource acquisition?

    Is your network working the same as in the past?

       - do you monitor this?

    Do you check for delete vectors?

      - is your purge policy in place?

     

    There are many points you need to look at, and if you understand your workload then make sure you capture it so you can benchmark it against future work.
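
As one way to act on the labeling and benchmarking questions above, here is a minimal sketch that compares per-label query durations between a baseline period and a current period. It assumes you have already collected (label, duration in ms) pairs yourself, for instance from labeled requests in Vertica's query history. The function names, the 2x factor, the labels, and the sample timings are all hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, float]  # (query label, duration in milliseconds)


def median_by_label(samples: Iterable[Sample]) -> Dict[str, float]:
    """Group durations by label and return the median per label."""
    grouped = defaultdict(list)
    for label, duration_ms in samples:
        grouped[label].append(duration_ms)
    return {label: median(durations) for label, durations in grouped.items()}


def find_slowdowns(
    baseline: Iterable[Sample],
    current: Iterable[Sample],
    factor: float = 2.0,
) -> Dict[str, float]:
    """Return {label: slowdown ratio} for labels whose current median
    is at least `factor` times the baseline median."""
    base = median_by_label(baseline)
    cur = median_by_label(current)
    return {
        label: cur[label] / base[label]
        for label in base
        if label in cur and base[label] > 0 and cur[label] >= factor * base[label]
    }


# Hypothetical timings captured shortly after a restart and some days later.
baseline = [("daily_report", 1000), ("daily_report", 1100), ("daily_report", 900),
            ("lookup", 50), ("lookup", 60)]
current = [("daily_report", 2300), ("daily_report", 2100), ("lookup", 55)]

for label, ratio in find_slowdowns(baseline, current).items():
    print(f"{label}: {ratio:.1f}x slower than baseline")
```

Medians are used rather than means so that a single outlier run does not trigger a false alarm. Whether 2x is the right threshold depends on how noisy your workload normally is.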
