Options

High value for open_file_handle_count in resource_acquisition causing slow queries

We have a query that normally runs with thread_count of 65 and open_file_handle_count between 50 and 60 as reported by RESOURCE_ACQUISITIONS, and executes in less than a second. However, this same query on the same table started taking over 10 seconds recently, without any change on the data in the underlying tables or the table structure (including projections). On reviewing the system tables, the open_file_handle_count jumps to more than 700 and the thread count is 146 during this degraded performance. The resource pool, db user, overall system load during query execution etc. haven't changed when the performance dropped, but new schema and tables have been added in the cluster. The open file count in /proc/Vertica Process ID /fd also rises in proportion to the open_file_handle_count during query execution.

What can cause the open_file_handle_count and the thread_count to change for a query? Is it indicative of a large number of ROSes or partitions, and is the open file count for a query impacted by the overall number of tables in the cluster?

Thanks
Sajan


Comments

  • Options
    Hi ,

    You do not mention how much objects /schemas was added since the time your performance was  fine , Vertica keep all the catalog in memory so adding more objects will impact the available memory , if you add many objects  you many now have less memory available which lead to more  physical I/O .

    By the way i see many time high thread_count on RESOURCE_ACQUISITIONS but when check on the level of the OS it was much less , you can check it using this shell script  (12742 = vertica linux process id )  - >

    #/bin/ksh

    while [ 1 -eq 1  ]

    do

    date='date +%s%N'

    T_STAT='ps -C firefox-bin -L -o pid,tid,pcpu,state 12742|awk 'BEGIN{V_S=0;V_R=0}{if ($4 == "S"){V_S=V_S+1}else {V_R=V_R+1}}END{print V_S", "V_R}''

     

    echo  $date","$T_STAT

    done



  • Options
    Thanks Eli for the response. The open_file_handle_count in resource_acquisitions is the counter for the OS level file descriptor count, and i have seen a high ratio between open_file_handle_count and thread_count during poor performance. If you compare the value of open_file_handle_count with /proc/12742/fd, the two should vary proportionately. 

    Regarding your question on how much objects were added, 3 schema with around 50 tables were added. However, the memory usage on the node is around 15% only, and there i dont see any memory, i/o or CPU issue in the usages tables. Also, the structure and data in the tables involved in the degraded queries havent changed at all, so I am still struggling to understand why new objects will impact a query running on existing objects.

    Thanks
    Sajan
  • Options
    Regarding amount of  thread's you see  , vertica is opening all this threads but you can see , using the script i attach that in most of the time they are in sleep . 

    DO you have Vertica  profile data of the before and after ? this can be very useful for you to understand the root cause  .

  • Options
    You should see the change root problem using profile , if you can share the profile output of both befor and after ,we mat see somting

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file