High value for open_file_handle_count in resource_acquisition causing slow queries
We have a query that normally runs with thread_count of 65 and open_file_handle_count between 50 and 60 as reported by RESOURCE_ACQUISITIONS, and executes in less than a second. However, this same query on the same table started taking over 10 seconds recently, without any change on the data in the underlying tables or the table structure (including projections). On reviewing the system tables, the open_file_handle_count jumps to more than 700 and the thread count is 146 during this degraded performance. The resource pool, db user, overall system load during query execution etc. haven't changed when the performance dropped, but new schema and tables have been added in the cluster. The open file count in /proc/Vertica Process ID /fd also rises in proportion to the open_file_handle_count during query execution.
What can cause the open_file_handle_count and the thread_count to change for a query? Is it indicative of a large number of ROSes or partitions, and is the open file count for a query impacted by the overall number of tables in the cluster?
Thanks
Sajan
What can cause the open_file_handle_count and the thread_count to change for a query? Is it indicative of a large number of ROSes or partitions, and is the open file count for a query impacted by the overall number of tables in the cluster?
Thanks
Sajan
0
Comments
You do not mention how much objects /schemas was added since the time your performance was fine , Vertica keep all the catalog in memory so adding more objects will impact the available memory , if you add many objects you many now have less memory available which lead to more physical I/O .
By the way i see many time high thread_count on RESOURCE_ACQUISITIONS but when check on the level of the OS it was much less , you can check it using this shell script (12742 = vertica linux process id ) - >
#/bin/kshwhile [ 1 -eq 1 ]
do
date='date +%s%N'
T_STAT='ps -C firefox-bin -L -o pid,tid,pcpu,state 12742|awk 'BEGIN{V_S=0;V_R=0}{if ($4 == "S"){V_S=V_S+1}else {V_R=V_R+1}}END{print V_S", "V_R}''
echo $date","$T_STAT
done
Regarding your question on how much objects were added, 3 schema with around 50 tables were added. However, the memory usage on the node is around 15% only, and there i dont see any memory, i/o or CPU issue in the usages tables. Also, the structure and data in the tables involved in the degraded queries havent changed at all, so I am still struggling to understand why new objects will impact a query running on existing objects.
Thanks
Sajan
DO you have Vertica profile data of the before and after ? this can be very useful for you to understand the root cause .