How to analyse high query performance variabilty?

joergschaber · November 2022

We recently observed a drop in query performance, or rather a high performance variabilty.
E.g. a query like:
SELECT * FROM table LIMIT 1;'
on a rather small 677 x 38 table, takes between 8 and 4000 milli-seconds.
Looking at v_monitor.query_requests e.g.

request_duration_ms request
------------------------- ----------
999 SELECT * FROM table LIMIT 1
1260 SELECT * FROM table LIMIT 1
8 SELECT * FROM table LIMIT 1
3918 SELECT * FROM table LIMIT 1
4093 SELECT * FROM table LIMIT 1
946 SELECT * FROMtable LIMIT 1

We are aware that this might have a range of reasons, not only database related. However, it does not seem to be related to memory or cpu usage.
Unfortunately, the table v_monitor.query_requests only holds query information for the current day. Is there a way to increase this history?

How can we find out/investigate where this high variability comes from?

mosheg · November 2022

Here is a vsql query to generate statements to collect data for 3 days:

\set DAYS_NO 3
WITH temp_dc_coll_policy AS ( SELECT component, node_name, memory_buffer_size_kb,disk_size_kb,(current_disk_bytes/(1024^2))::NUMERIC(10,3) cur_dsk_util_MB, kb_per_day::NUMERIC(20,3), interval_set, interval_time, TRUNC(first_time,'SS') as first_time, CAST((CURRENT_TIMESTAMP - first_time) AS INTERVAL DAY TO MINUTE) AS "intvl (DD HH:MM)" 
FROM v_monitor.data_collector 
WHERE lower(table_name) IN ( 'columns', 'license_audits', 'projection_checkpoint_epochs', 'projection_columns', 'projections', 'resource_pools', 'tables', 'dc_cpu_aggregate_by_hour', 
         'dc_execution_summaries', 'dc_lock_attempts', 'dc_memory_info_by_hour', 'dc_process_info_by_hour', 'dc_requests_completed', 'dc_requests_issued', 'dc_resource_acquisitions', 
         'dc_slow_events', 'dc_spread_monitor', 'dc_tuple_mover_events', 'vs_elastic_cluster', 'delete_vectors', 'host_resources', 'projection_storage', 'query_profiles', 'query_requests', 
         'resource_pool_status', 'storage_containers', 'system', 'dc_requests_issued', 'dc_requests_completed', 'columns','license_audits','projection_checkpoint_epochs','projection_columns','projections','resource_pools','tables','dc_cpu_aggregate_by_hour','dc_execution_summaries','dc_lock_attempts','dc_memory_info_by_hour','dc_process_info_by_hour','dc_requests_completed','dc_requests_issued','dc_resource_acquisitions','dc_slow_events','dc_spread_monitor','dc_tuple_mover_events','vs_elastic_cluster','delete_vectors','host_resources','projection_storage','query_profiles','query_requests','resource_pool_status','storage_containers','system', 'dc_requests_issued', 'dc_requests_completed','dc_session_starts','dc_session_ends')  )
SELECT component, max(memory_buffer_size_kb) as max_memory_buf_size_kb,  max(cur_dsk_util_MB) as max_cur_dsk_util_MB, max(disk_size_kb) as max_disk_size_kb,  max(kb_per_day) as max_kb_per_day, max(interval_set) as interval_set, max (interval_time) as max_interval_time_set, max(first_time) ealierst_data_on_allnodes, max("intvl (DD HH:MM)" ) as "intvl (DD HH:MM)"
,CASE 
     WHEN component ilike '%Ends' OR component ilike '%Completed' THEN 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int + 3)*max(kb_per_day)))::varchar ||', '''||:DAYS_NO+3||' days'');'
     ELSE 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int) *max(kb_per_day)))::varchar ||', '''||:DAYS_NO||' days'');'
END as command 
FROM temp_dc_coll_policy
GROUP BY component
ORDER BY component;

Find your query transaction_id and statement_id
by adding the word profile before the query
or just after the query statement add the following:

SELECT REQUEST_DURATION_MS, transaction_id, statement_id, LEFT(REQUEST,110) AS Statement
FROM QUERY_REQUESTS
WHERE transaction_id=current_trans_id()
  AND (statement_id=current_statement()-1);

When you know what is your query transaction_id and statement_id
use the following example to find which execution step run longer than usual.

\set MY_TRANSACTION_ID 45035996273713601
\set MY_STATEMENT_ID 3

SELECT
        execution_step,
        MAX(completion_time - time) AS elapsed
    FROM
        v_internal.dc_query_executions
    WHERE
        transaction_id=:MY_TRANSACTION_ID AND
        statement_id=:MY_STATEMENT_ID
    GROUP BY 1
    ORDER BY 2 desc ;

marcothesane · November 2022

Dear Joerg -
I can see that you have COPY with SOURCE KAFKASOURCE(), meaning that you have activated at least a Kafka source, if not a Vertica Kafka micro batch scheduler. Best practice, indeed, is to configure a specific resource pool for that, otherwise you risk to step on the toes of the people querying the system.
The answer on how to configure it, is the famous "it depends", of course, and will require a little investigation on the types of workloads currently running on the system.

SruthiA · November 2022

the below knowledge base link contains many useful queries for troubleshooting query performance.

https://www.vertica.com/kb/System-Tables-For-Performance/Content/BestPractices/System-Tables-For-Performance.htm

Sergey_Cherepan_1 · November 2022

Sounds a lot like you have queries waiting in queue for resources (memory). That is a prime reason for high variability on tiny queries.
Of course it can be a lot of other reasons.

joergschaber · November 2022

Yes, that is my suspicion too. There are always some queries in the queue, especially COPY from KAFKASOURCE queries with a duration of 10s. I am thinking about defining a new resource pool for these queries that are run by a special user. I found some documentation how to define new resoures pools and their parameters, but not how to determine the best setting for those parameters.
In the resource_queues table I can see how much memory is requested by those queries. Should I base the respource-pool parameteer on this?

joergschaber · November 2022

Ok, got it. Thanks!

joergschaber · November 2022

We now defined a specific resource pool for those queries and it solved the problem.

SergeB · November 2022

The following documentation might be useful to configuring your resource pools for copy kafkasource. https://www.vertica.com/docs/12.0.x/HTML/Content/Authoring/KafkaIntegrationGuide/SchedulerResourcePool.htm

We're Moving!

Create My New Community Account Now

How to analyse high query performance variabilty?

Best Answers

Answers

Leave a Comment