How to analyse high query performance variabilty?
We recently observed a drop in query performance, or rather a high performance variabilty.
E.g. a query like:SELECT * FROM table LIMIT 1;
'
on a rather small 677 x 38 table, takes between 8 and 4000 milli-seconds.
Looking at v_monitor.query_requests e.g.
request_duration_ms request
------------------------- ----------
999 SELECT * FROM table LIMIT 1
1260 SELECT * FROM table LIMIT 1
8 SELECT * FROM table LIMIT 1
3918 SELECT * FROM table LIMIT 1
4093 SELECT * FROM table LIMIT 1
946 SELECT * FROMtable LIMIT 1
We are aware that this might have a range of reasons, not only database related. However, it does not seem to be related to memory or cpu usage.
Unfortunately, the table v_monitor.query_requests only holds query information for the current day. Is there a way to increase this history?
How can we find out/investigate where this high variability comes from?
Best Answers
-
mosheg Vertica Employee Administrator
Here is a vsql query to generate statements to collect data for 3 days:
\set DAYS_NO 3 WITH temp_dc_coll_policy AS ( SELECT component, node_name, memory_buffer_size_kb,disk_size_kb,(current_disk_bytes/(1024^2))::NUMERIC(10,3) cur_dsk_util_MB, kb_per_day::NUMERIC(20,3), interval_set, interval_time, TRUNC(first_time,'SS') as first_time, CAST((CURRENT_TIMESTAMP - first_time) AS INTERVAL DAY TO MINUTE) AS "intvl (DD HH:MM)" FROM v_monitor.data_collector WHERE lower(table_name) IN ( 'columns', 'license_audits', 'projection_checkpoint_epochs', 'projection_columns', 'projections', 'resource_pools', 'tables', 'dc_cpu_aggregate_by_hour', 'dc_execution_summaries', 'dc_lock_attempts', 'dc_memory_info_by_hour', 'dc_process_info_by_hour', 'dc_requests_completed', 'dc_requests_issued', 'dc_resource_acquisitions', 'dc_slow_events', 'dc_spread_monitor', 'dc_tuple_mover_events', 'vs_elastic_cluster', 'delete_vectors', 'host_resources', 'projection_storage', 'query_profiles', 'query_requests', 'resource_pool_status', 'storage_containers', 'system', 'dc_requests_issued', 'dc_requests_completed', 'columns','license_audits','projection_checkpoint_epochs','projection_columns','projections','resource_pools','tables','dc_cpu_aggregate_by_hour','dc_execution_summaries','dc_lock_attempts','dc_memory_info_by_hour','dc_process_info_by_hour','dc_requests_completed','dc_requests_issued','dc_resource_acquisitions','dc_slow_events','dc_spread_monitor','dc_tuple_mover_events','vs_elastic_cluster','delete_vectors','host_resources','projection_storage','query_profiles','query_requests','resource_pool_status','storage_containers','system', 'dc_requests_issued', 'dc_requests_completed','dc_session_starts','dc_session_ends') ) SELECT component, max(memory_buffer_size_kb) as max_memory_buf_size_kb, max(cur_dsk_util_MB) as max_cur_dsk_util_MB, max(disk_size_kb) as max_disk_size_kb, max(kb_per_day) as max_kb_per_day, max(interval_set) as interval_set, max (interval_time) as max_interval_time_set, max(first_time) ealierst_data_on_allnodes, max("intvl (DD HH:MM)" ) as "intvl (DD HH:MM)" ,CASE WHEN component ilike '%Ends' OR component ilike '%Completed' THEN 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int + 3)*max(kb_per_day)))::varchar ||', '''||:DAYS_NO+3||' days'');' ELSE 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int) *max(kb_per_day)))::varchar ||', '''||:DAYS_NO||' days'');' END as command FROM temp_dc_coll_policy GROUP BY component ORDER BY component;
Find your query transaction_id and statement_id
by adding the word profile before the query
or just after the query statement add the following:SELECT REQUEST_DURATION_MS, transaction_id, statement_id, LEFT(REQUEST,110) AS Statement FROM QUERY_REQUESTS WHERE transaction_id=current_trans_id() AND (statement_id=current_statement()-1);
When you know what is your query transaction_id and statement_id
use the following example to find which execution step run longer than usual.\set MY_TRANSACTION_ID 45035996273713601 \set MY_STATEMENT_ID 3 SELECT execution_step, MAX(completion_time - time) AS elapsed FROM v_internal.dc_query_executions WHERE transaction_id=:MY_TRANSACTION_ID AND statement_id=:MY_STATEMENT_ID GROUP BY 1 ORDER BY 2 desc ;
0 -
marcothesane - Select Field - Administrator
Dear Joerg -
I can see that you haveCOPY with SOURCE KAFKASOURCE()
, meaning that you have activated at least a Kafka source, if not a Vertica Kafka micro batch scheduler. Best practice, indeed, is to configure a specific resource pool for that, otherwise you risk to step on the toes of the people querying the system.
The answer on how to configure it, is the famous "it depends", of course, and will require a little investigation on the types of workloads currently running on the system.1
Answers
the below knowledge base link contains many useful queries for troubleshooting query performance.
https://www.vertica.com/kb/System-Tables-For-Performance/Content/BestPractices/System-Tables-For-Performance.htm
Sounds a lot like you have queries waiting in queue for resources (memory). That is a prime reason for high variability on tiny queries.
Of course it can be a lot of other reasons.
Yes, that is my suspicion too. There are always some queries in the queue, especially COPY from KAFKASOURCE queries with a duration of 10s. I am thinking about defining a new resource pool for these queries that are run by a special user. I found some documentation how to define new resoures pools and their parameters, but not how to determine the best setting for those parameters.
In the resource_queues table I can see how much memory is requested by those queries. Should I base the respource-pool parameteer on this?
Ok, got it. Thanks!
We now defined a specific resource pool for those queries and it solved the problem.
The following documentation might be useful to configuring your resource pools for copy kafkasource. https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/KafkaIntegrationGuide/SchedulerResourcePool.htm