How to analyse high query performance variabilty?

joergschaberjoergschaber Vertica Customer
edited November 2022 in General Discussion

We recently observed a drop in query performance, or rather a high performance variabilty.
E.g. a query like:
on a rather small 677 x 38 table, takes between 8 and 4000 milli-seconds.
Looking at v_monitor.query_requests e.g.

request_duration_ms request
------------------------- ----------
999 SELECT * FROM table LIMIT 1
1260 SELECT * FROM table LIMIT 1
3918 SELECT * FROM table LIMIT 1
4093 SELECT * FROM table LIMIT 1
946 SELECT * FROMtable LIMIT 1

We are aware that this might have a range of reasons, not only database related. However, it does not seem to be related to memory or cpu usage.
Unfortunately, the table v_monitor.query_requests only holds query information for the current day. Is there a way to increase this history?

How can we find out/investigate where this high variability comes from?

Best Answers

  • moshegmosheg Vertica Employee Administrator
    edited November 2022 Answer ✓

    Here is a vsql query to generate statements to collect data for 3 days:

    \set DAYS_NO 3
    WITH temp_dc_coll_policy AS ( SELECT component, node_name, memory_buffer_size_kb,disk_size_kb,(current_disk_bytes/(1024^2))::NUMERIC(10,3) cur_dsk_util_MB, kb_per_day::NUMERIC(20,3), interval_set, interval_time, TRUNC(first_time,'SS') as first_time, CAST((CURRENT_TIMESTAMP - first_time) AS INTERVAL DAY TO MINUTE) AS "intvl (DD HH:MM)" 
    FROM v_monitor.data_collector 
    WHERE lower(table_name) IN ( 'columns', 'license_audits', 'projection_checkpoint_epochs', 'projection_columns', 'projections', 'resource_pools', 'tables', 'dc_cpu_aggregate_by_hour', 
             'dc_execution_summaries', 'dc_lock_attempts', 'dc_memory_info_by_hour', 'dc_process_info_by_hour', 'dc_requests_completed', 'dc_requests_issued', 'dc_resource_acquisitions', 
             'dc_slow_events', 'dc_spread_monitor', 'dc_tuple_mover_events', 'vs_elastic_cluster', 'delete_vectors', 'host_resources', 'projection_storage', 'query_profiles', 'query_requests', 
             'resource_pool_status', 'storage_containers', 'system', 'dc_requests_issued', 'dc_requests_completed', 'columns','license_audits','projection_checkpoint_epochs','projection_columns','projections','resource_pools','tables','dc_cpu_aggregate_by_hour','dc_execution_summaries','dc_lock_attempts','dc_memory_info_by_hour','dc_process_info_by_hour','dc_requests_completed','dc_requests_issued','dc_resource_acquisitions','dc_slow_events','dc_spread_monitor','dc_tuple_mover_events','vs_elastic_cluster','delete_vectors','host_resources','projection_storage','query_profiles','query_requests','resource_pool_status','storage_containers','system', 'dc_requests_issued', 'dc_requests_completed','dc_session_starts','dc_session_ends')  )
    SELECT component, max(memory_buffer_size_kb) as max_memory_buf_size_kb,  max(cur_dsk_util_MB) as max_cur_dsk_util_MB, max(disk_size_kb) as max_disk_size_kb,  max(kb_per_day) as max_kb_per_day, max(interval_set) as interval_set, max (interval_time) as max_interval_time_set, max(first_time) ealierst_data_on_allnodes, max("intvl (DD HH:MM)" ) as "intvl (DD HH:MM)"
         WHEN component ilike '%Ends' OR component ilike '%Completed' THEN 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int + 3)*max(kb_per_day)))::varchar ||', '''||:DAYS_NO+3||' days'');'
         ELSE 'SELECT SET_DATA_COLLECTOR_POLICY('''||component||''', '|| max(memory_buffer_size_kb)::varchar ||', '|| GREATEST(max(disk_size_kb) ,CEILING ((:DAYS_NO::int) *max(kb_per_day)))::varchar ||', '''||:DAYS_NO||' days'');'
    END as command 
    FROM temp_dc_coll_policy
    GROUP BY component
    ORDER BY component;

    Find your query transaction_id and statement_id
    by adding the word profile before the query
    or just after the query statement add the following:

    SELECT REQUEST_DURATION_MS, transaction_id, statement_id, LEFT(REQUEST,110) AS Statement
    WHERE transaction_id=current_trans_id()
      AND (statement_id=current_statement()-1);

    When you know what is your query transaction_id and statement_id
    use the following example to find which execution step run longer than usual.

    \set MY_TRANSACTION_ID 45035996273713601
    \set MY_STATEMENT_ID 3
            MAX(completion_time - time) AS elapsed
            transaction_id=:MY_TRANSACTION_ID AND
        GROUP BY 1
        ORDER BY 2 desc ;
  • marcothesanemarcothesane - Select Field - Administrator
    Answer ✓

    Dear Joerg -
    I can see that you have COPY with SOURCE KAFKASOURCE(), meaning that you have activated at least a Kafka source, if not a Vertica Kafka micro batch scheduler. Best practice, indeed, is to configure a specific resource pool for that, otherwise you risk to step on the toes of the people querying the system.
    The answer on how to configure it, is the famous "it depends", of course, and will require a little investigation on the types of workloads currently running on the system.


Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file