Large negative interval in query_plan_profiles

Sergey_Cherepan_1 ✭✭✭
edited July 2023 in General Discussion

Hi,

I am looking at query_plan_profiles, trying to find where a query spent most of its time.

It appears there are quite a few large negative values in the running_time column, all slightly greater than -10957 days. You can see them here:

select running_time
from query_plan_profiles
where running_time is not null
and running_time < '-10000'::interval;

-10956 23:51:05.141894
-10956 23:37:53.243389
-10956 23:37:50.560545
-10956 23:51:05.297616
-10956 23:48:04.284543
-10956 23:37:50.513517
-10956 23:51:04.907482

The rest of the values look OK, either NULL or a small interval as they should be.

Can you ask around what is wrong with running_time in query_plan_profiles? I am on v12.0.4.

I resorted to

case when running_time < '-100'::interval then running_time + '10957'::interval else running_time end running_time,

that seems to be working.
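
As a usage sketch, the same workaround applied to the full view looks like this ('10957'::interval is evidently read as 10957 days here, matching the offset above):

select transaction_id, statement_id, path_id,
       case when running_time < '-100'::interval
            then running_time + '10957'::interval
            else running_time
       end as running_time
from query_plan_profiles
where running_time is not null;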

Thank you
Sergey

Answers

  • Bryan_H Vertica Employee Administrator

    It's a system view (see the entry in v_internal.vs_system_views) that appears to compute running_time as shown below. The second COALESCE in this query would return 0 if no "end time" was recorded, leading to (0 - start_time), i.e. a large negative interval. So my guess is that if you look up the transaction and statement ID, you will find that the query didn't complete and no "end time" was recorded; a quick way to check that is sketched after the view code below. Here's the running_time calculation. What is the expected behavior if you can confirm the query failed? (I see some large negatives as well, but haven't yet tracked down the problem transaction and statement, and would appreciate confirmation anyway.)
    select t_st.transaction_id, t_st.statement_id, t_st.path_id, t_st.started, t_ed.completed,
           to_timestamp(t_ed.end_time / 1000000.0) - to_timestamp(t_st.start_time / 1000000.0) as running_time,
           t_st.start_time as start_time
    from
      (select transaction_id, statement_id, path_id,
              min(counter_value) as start_time,
              max(coalesce(counter_value, 0)) > 0 as started
       from v_internal.vs_execution_engine_profiles
       where counter_name = 'start time'
       group by 1, 2, 3
      ) t_st
    join
      (select transaction_id, statement_id, path_id,
              max(counter_value) as end_time,
              min(coalesce(counter_value, 0)) > 0 as completed
       from v_internal.vs_execution_engine_profiles
       where counter_name = 'end time'
       group by 1, 2, 3
      ) t_ed
    using (transaction_id, statement_id, path_id)
    -- (in the full view definition this query is wrapped as a derived table aliased rt_t)
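
    If you want to confirm that for your queries, here's a quick check you could run (a sketch against the public v_monitor.execution_engine_profiles view, which exposes the same counters): list paths that have a 'start time' counter but no usable 'end time' yet.

    select transaction_id, statement_id, path_id,
           max(case when counter_name = 'start time' then counter_value end) as start_time,
           max(case when counter_name = 'end time'   then counter_value end) as end_time
    from v_monitor.execution_engine_profiles
    where counter_name in ('start time', 'end time')
    group by 1, 2, 3
    having max(case when counter_name = 'end time' then counter_value end) is null
        or max(case when counter_name = 'end time' then counter_value end) = 0;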

  • Bryan_H Vertica Employee Administrator

    Well, I was close. The large negative interval occurs on my system because there's no end time yet while the path is executing, so please check whether the same occurs for you:
    d2=> select transaction_id,statement_id,path_id,is_executing,running_time
    d2-> from query_plan_profiles
    d2-> where running_time is not null
    d2-> and running_time < '-10000'::interval;
      transaction_id   | statement_id | path_id | is_executing |      running_time
    -------------------+--------------+---------+--------------+------------------------
     45035996276728334 |            1 |      -1 | t            | -10956 23:59:57.804859
    Correct behavior is probably not to show a time at all while executing.
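
    Until that is changed in the view, a simple workaround sketch on the query side is to mask running_time for paths that are still executing:

    select transaction_id, statement_id, path_id, is_executing,
           case when is_executing then null else running_time end as running_time
    from query_plan_profiles
    where running_time is not null;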

  • Thanks for the hints, looking...
    The general idea behind query plan profiles is to show the most resource-consuming operators. It would be very nice to see total CPU consumption by operator while a query is still executing; at least network and disk reads are real-time counters, so it should be possible.
    Operator duration is a poor indicator of CPU consumption, since an operator can be blocked by other operators further down the query plan.
    If an operator has not yet finished, probably the best approach is to show the interval up to the current time.
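    A rough sketch of that last idea (assuming the 'start time' counter is Unix-epoch microseconds, as the view code above implies, and that execution_engine_profiles exposes the is_executing flag):

    select transaction_id, statement_id, path_id,
           clock_timestamp() - to_timestamp(min(counter_value) / 1000000.0) as elapsed_so_far
    from v_monitor.execution_engine_profiles
    where counter_name = 'start time'
      and is_executing                 -- only paths that are still running
    group by 1, 2, 3;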

  • Yes, there is a bug in the query_plan_profiles view definition (in v_internal):

    case when (completed) then running_time
         else case when (started)
                   then clock_timestamp() - to_timestamp(rt_t.start_time / 1000000.0 + 946684800)
                   else null
              end
    end as running_time

    The constant 946684800 is being added incorrectly. Removing this constant makes the query return the correct interval.

    My best guess is that this constant is a leftover from earlier versions of Vertica, and this system view did not get updated when the execution engine profile counters changed (see the quick arithmetic check at the end of this post).

    Question:
    There is a counter 'execution time (us)'. Is it the CPU consumption of an operator? I am looking to add this counter to query_plan_profiles to get CPU consumption by operator. Can you ask engineering how closely the value of the 'execution time (us)' counter tracks the actual CPU consumption of an operator?
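
    For context on that constant: 946684800 seconds is exactly the offset between the Unix epoch (1970-01-01) and 2000-01-01, i.e. 10957 days, which is why the broken rows are shifted by just under -10957 days:

    select 946684800 / 86400       as offset_days,        -- 10957
           to_timestamp(946684800) as epoch_plus_offset;   -- 2000-01-01 00:00:00 (UTC)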

  • Sergey_Cherepan_1 ✭✭✭
    edited July 2023

    Here is my take on a rewrite of query_plan_profiles.

    Notes:
    1) execution_engine_profiles is a UNION ALL of the vs_ and dc_ execution engine profiles and resources tables. The resources tables have only memory allocation counters; I am not sure how to interpret the memory counters, so I omitted resources entirely.
    2) The vs_ and dc_ execution_engine_profiles tables do not have the counter 'memory allocated (bytes)' that query_plan_profiles refers to, but they do have 'memory reserved (bytes)', so I used that instead. Hopefully it summarises all the counters from execution_engine_resources.
    3) Having multiple joins in query_plan_profiles is kinda clumsy and performs poorly. I replaced the joins with a pivot; it performs much better and scales well with more counters.
    4) I added the counter 'execution time (us)', in the hope that it closely resembles CPU consumption by operator.
    5) There are very interesting new '%rows%' counters, and I added them. They show where the optimiser was wrong in estimating the number of rows processed by an operator versus what was actually processed, which pinpoints where the optimiser produced a suboptimal query plan.

    Enjoy!

    select e.transaction_id, e.statement_id, e.path_id, p.path_line_index,
           e.is_started,
           e.is_running,
           e.is_completed,
           case when path_line_index = 1 then e.running_time else null end runtime,
           case when path_line_index = 1 then to_timestamp(e.sum_execution_time_us / 1000000.0) - to_timestamp(0.0) else null end cpu_time,
           case when path_line_index = 1 then e.max_memory_reserved else null end memory_reserved,
           case when path_line_index = 1 then e.sum_bytes_read_from_disk else null end bytes_read_from_disk,
           case when path_line_index = 1 then e.sum_bytes_received else null end bytes_received,
           case when path_line_index = 1 then e.sum_bytes_sent else null end bytes_sent,
           case when path_line_index = 1 then e.sum_estimated_rows_produced else null end estimated_rows_produced,
           case when path_line_index = 1 then e.sum_rows_produced else null end rows_produced,
           case when path_line_index = 1 then e.sum_rows_processed else null end rows_processed,
           case when path_line_index = 1 then e.operator_name else null end operator,
           p.path_line
    from
      (select transaction_id, statement_id, path_id, max(operator_name) operator_name,
              to_timestamp(min(case when counter_name = 'start time' then counter_value else null::int end) / 1000000.0) min_start_time,
              to_timestamp(max(case when counter_name = 'end time' then counter_value else null::int end) / 1000000.0) max_end_time,
              min_start_time is not null is_started,
              min_start_time is not null and max_end_time is null is_running,
              min_start_time is not null and max_end_time is not null is_completed,
              case when min_start_time is null then null::interval
                   when max_end_time is null then clock_timestamp() - min_start_time
                   else max_end_time - min_start_time
              end running_time,
              sum(case when counter_name = 'execution time (us)' then counter_value else null end) sum_execution_time_us,
              max(case when counter_name = 'memory reserved (bytes)' then counter_value else null end) max_memory_reserved,
              sum(case when counter_name = 'bytes read from disk' then counter_value else null end) sum_bytes_read_from_disk,
              sum(case when counter_name = 'bytes received' then counter_value else null end) sum_bytes_received,
              sum(case when counter_name = 'bytes sent' then counter_value else null end) sum_bytes_sent,
              sum(case when counter_name = 'estimated rows produced' then counter_value else null end) sum_estimated_rows_produced,
              sum(case when counter_name = 'rows produced' then counter_value else null end) sum_rows_produced,
              sum(case when counter_name = 'rows processed' then counter_value else null end) sum_rows_processed
       from
         (select transaction_id, statement_id, path_id, operator_name, counter_name, counter_value
          from vs_execution_engine_profiles
          union all
          select transaction_id, statement_id, path_id, operator_name, counter_name, counter_value
          from dc_execution_engine_profiles
         ) v
       group by transaction_id, statement_id, path_id
      ) e
    join
      (select * from dc_explain_plans) p
    using (transaction_id, statement_id, path_id)
    order by e.transaction_id, e.statement_id, e.path_id, p.path_line_index;
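
    To zoom in on one query in the output above, the transaction_id / statement_id can be looked up first, for example like this (a sketch; v_monitor.query_requests column names assumed, and the text filter is hypothetical):

    select transaction_id, statement_id, start_timestamp, left(request, 80) as request
    from v_monitor.query_requests
    where request ilike '%my_table%'       -- hypothetical filter on the statement text
    order by start_timestamp desc
    limit 10;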
