Profile timing data doesn't match the real SQL elapsed time.

I have profiled one SQL statement, and the result is as below:
HINT: Select * from v_monitor.execution_engine_profiles where transaction_id=1193453901253184496 and statement_id=1;
vsql:new.sql:1789: NOTICE 3557: Initiator memory for query: [on pool merge: 7808279 KB, minimum: 7721225 KB]
vsql:new.sql:1789: NOTICE 5077: Total memory required by query: [20608676 KB]


(1 row)

Time: First fetch (1 row): 1072106.511 ms. All rows formatted: 1072106.570 ms

Here we can see the total elapsed time is about 1,072 seconds. But from the profile table, the sum of execution time or clock time is less than 600 seconds. Why could this happen? (adhoc.profiel_daniel_merge_allen_sql_part1 is a table storing the rows from v_monitor.execution_engine_profiles for this specific query.)

select path_id,operator_name,count(distinct operator_id),
max(case when counter_name='execution time (us)' then (counter_value/1000/1000)::decimal(10,2) else 0 end) execution_second,
max(case when counter_name='clock time (us)' then (counter_value/1000/1000)::decimal(10,2) else 0 end) clock_second
from adhoc.profiel_daniel_merge_allen_sql_part1 where node_name='v_node0256' group by path_id, operator_name order by 1 asc

path_id operator_name count execution_second clock_second
-1 GroupByNothing 1 0.00 0.00
-1 Root 1 0.00 0.00
15 ExprEval 1 0.00 0.00
17 ExprEval 2 0.00 0.00
16 StorageMerge 1 0.01 0.00
16 GroupByPipe 3 0.02 0.02
-1 Copy 448 0.09 0.18
17 Scan 2 0.15 5.05
-1 NewEENode 1 0.73 0.74
2 ExprEval 1 0.91 0.92
19 StorageMerge 1 0.98 0.00
-1 Recv 126 0.99 0.00
-1 Send 63 1.04 0.00
7 ExprEval 1 1.07 1.07
-1 Union 2 1.55 1.62
13 NetworkSend 2 2.33 0.00
9 ExprEval 16 2.81 2.85
12 ExprEval 16 2.89 2.94
13 ExprEval 1 3.12 3.16
19 GroupByPipe 17 3.46 3.47
20 ExprEval 16 3.63 3.66
18 ExprEval 1 4.09 4.10
13 NetworkRecv 2 4.13 0.00
14 Join 1 7.88 8.09
-1 ValExpr 4 7.96 8.05
5 NetworkRecv 2 9.07 0.00
3 ExprEval 1 9.37 9.43
9 Scan 16 10.16 57.66
10 ExprEval 1 10.78 10.86
5 NetworkSend 2 11.57 0.00
11 StorageMerge 1 11.93 0.00
8 GroupByPipe 17 13.04 13.10
11 GroupByPipe 17 13.83 13.90
-1 Router 2 17.72 17.93
12 Scan 16 18.27 66.53
20 Scan 16 18.97 42.79
5 ExprEval 1 21.94 22.20
8 StorageMerge 1 32.33 0.00
6 Join 1 42.49 43.25
1 ExprEval 1 46.68 46.94
4 Join 1 83.22 90.61
-1 DataTarget 2 115.15 52.71
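
One way to cross-check the discrepancy is to compute the wall-clock span the execution engine actually covered and compare it with vsql's elapsed time. A sketch against the same saved profile table, assuming the 'start time' / 'end time' counters hold microseconds since epoch (as they do in the query Sergey posts further down this thread):

```sql
-- Sketch: wall-clock span covered by the execution engine, per node,
-- assuming 'start time' / 'end time' counter values are microseconds
-- since epoch.
select node_name,
       (max(case when counter_name = 'end time'   then counter_value end)
      - min(case when counter_name = 'start time' then counter_value end))
       / 1000000.0 as engine_span_seconds
from adhoc.profiel_daniel_merge_allen_sql_part1
group by node_name
order by engine_span_seconds desc;
```

If this span is much larger than the summed per-operator times, the operators spent most of that span waiting rather than executing; any time before the query was admitted to execution is not covered by these counters at all.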


  • mosheg Vertica Employee Administrator
    edited September 2023

    Waiting times are not always reflected, as queries can be delayed because of resource contention on some of the nodes.

  • Waiting time is not included in the execution time, but it should be included in the clock time, right? If there is resource contention, then the additional time caused by the contention should be reflected in the clock time. Also, the profiling I did was on a dedicated 64-node cluster just started up for testing, with no contention at all.

  • marcothesane - Select Field - Administrator

    No. The clock in execution_engine_profiles starts ticking when the query is released to execution from the queue of the resource pool it runs under. A prepared query can easily estimate a memory amount that exceeds the resource pool's remaining memory budget, because other running queries have already reserved memory for themselves; the resource pool will then queue the new query until enough other queries have finished and given back their reserved memory, and only then release it. Or the pool has a maximum concurrency of 4 queries, 4 queries are running, and the new query will only be released when one of the 4 finishes. The time between submitting the query and its release to execution may very well account for the difference between what vsql reports and what execution_engine_profiles reports.
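
    The queue wait described above can be checked directly. A sketch against v_monitor.resource_acquisitions (queue_entry_timestamp and acquisition_timestamp are its documented columns; the view only retains recent acquisitions, so run it soon after the query):

```sql
-- Sketch: how long the query sat in the resource pool queue before
-- being granted memory, using the transaction from the original post.
select node_name, pool_name,
       acquisition_timestamp - queue_entry_timestamp as queue_wait
from v_monitor.resource_acquisitions
where transaction_id = 1193453901253184496
order by queue_wait desc;
```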

  • Thx. So if the SQL is waiting to run (queued in the resource pool), the clock won't tick; but if the SQL has already begun to run and needs other resources, such as network, I/O, or additional memory, the clock will still tick, right? In my case the SQL is already running: from the snapshot below, this step has "end time" - "start time" > 200 seconds, but both clock time and execution time are less than 10 seconds, which is a huge difference.
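
    To see that gap per operator, the span and the clock time can be put side by side. A sketch against the same saved profile table, again assuming 'start time' / 'end time' hold microseconds since epoch:

```sql
-- Sketch: per-operator elapsed span vs. clock time on one node.
select path_id, operator_name,
       (max(case when counter_name = 'end time'   then counter_value end)
      - min(case when counter_name = 'start time' then counter_value end))
       / 1000000.0 as span_seconds,
       max(case when counter_name = 'clock time (us)' then counter_value end)
       / 1000000.0 as clock_seconds
from adhoc.profiel_daniel_merge_allen_sql_part1
where node_name = 'v_node0256'
group by path_id, operator_name
order by span_seconds desc;
```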

  • It is by far more complicated than you think.
    For starters, the system view execution_engine_profiles is outdated and refers to some counters that no longer exist. You can consider its output mostly wrong (some data is still OK).
    The query plan combines several different operators into one line of the plan.
    There can be many instances of the same operator.
    And so on.

  • Sergey_Cherepan_1 ✭✭✭
    edited September 2023

    Generally speaking, it is very hard to read query_plan_profiles. I recommend users look at it only while the query is still running, and only when they quickly need to find the place that consumes the most resources. Of course, I am providing my own corrected and more detailed version of query_plan_profiles below; it is valid for v12.0.4.
    All other uses of this view are not practical. You can look at it as a learning exercise, no more.

    select
        e.running_time runtime,
        to_timestamp(e.sum_execution_time_us/1000000.0) - to_timestamp(0.0) cpu_time,
        e.max_memory_reserved memory_reserved,
        e.sum_bytes_read_from_disk bytes_read_from_disk,
        e.sum_bytes_received bytes_received,
        e.sum_bytes_sent bytes_sent,
        e.min_estimated_rows_produced estimated_rows_produced,
        e.sum_rows_produced rows_produced,
        e.sum_rows_processed rows_processed,
        e.operator_name operator,
        e.cnt counters,
        p.transaction_id, p.statement_id, p.path_id, p.path_line_index, e.baseplan_id
    from
        (select
            transaction_id, statement_id, path_id, baseplan_id, 1 path_line_index,
            min(operator_name) operator_name, count(*) cnt,
            to_timestamp(min(case when counter_name = 'start time' then counter_value else null::int end)/1000000.0) min_start_time,
            to_timestamp(max(case when counter_name = 'end time' then counter_value else null::int end)/1000000.0) max_end_time,
            min_start_time is not null is_started,
            min_start_time is not null and max_end_time is null is_running,
            min_start_time is not null and max_end_time is not null is_completed,
            case when min_start_time is null then null::interval else case when max_end_time is null then clock_timestamp() - min_start_time else max_end_time - min_start_time end end running_time,
            sum(case when counter_name = 'execution time (us)' then counter_value::float else null::float end) sum_execution_time_us,
            max(case when counter_name = 'memory reserved (bytes)' then counter_value else null end) max_memory_reserved,
            sum(case when counter_name = 'bytes read from disk' then counter_value else null end) sum_bytes_read_from_disk,
            sum(case when counter_name = 'bytes received' then counter_value else null end) sum_bytes_received,
            sum(case when counter_name = 'bytes sent' then counter_value else null end) sum_bytes_sent,
            min(case when counter_name = 'estimated rows produced' then counter_value else null end) min_estimated_rows_produced,
            sum(case when counter_name = 'rows produced' then counter_value else null end) sum_rows_produced,
            sum(case when counter_name = 'rows processed' then counter_value else null end) sum_rows_processed
        from
            (select transaction_id, statement_id, path_id, baseplan_id, operator_name, counter_name, counter_value
            from vs_execution_engine_profiles
            where transaction_id = 'transaction_id'
            and statement_id = 'statement_id'
            union all
            select transaction_id, statement_id, path_id, baseplan_id, operator_name, counter_name, counter_value
            from dc_execution_engine_profiles
            where transaction_id = 'transaction_id'
            and statement_id = 'statement_id'
            ) v
        group by transaction_id, statement_id, path_id, baseplan_id) e
        right outer join
        (select * from dc_explain_plans
        where transaction_id = 'transaction_id'
        and statement_id = 'statement_id'
        ) p
        using(transaction_id, statement_id, path_id, path_line_index)
    order by p.transaction_id, p.statement_id, p.path_id, p.path_line_index, e.baseplan_id;

    The query uses my C++ UDx to_timestamp; it takes a number of microseconds since epoch and returns a Vertica timestamp. You can find it on GitHub: verticaudxutilities.
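
    As a usage sketch, the quoted placeholder literals in the query above would be replaced with the real identifiers; for the query in the original post, for example, each of the WHERE clauses would become:

```sql
-- Substitution using the IDs from the original post:
where transaction_id = 1193453901253184496
and statement_id = 1
```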

  • That's wonderful. Several questions about the stats:
    1: You put min_estimated_rows_produced in the view; what if min_estimated_rows_produced is way larger than sum_rows_produced? In my example the former is 95,096 million rows while the latter is 4,943 million rows, a 20x difference.
    2: The following is from path_id 24:
    a: There are 2 NetworkRecv and 2 NetworkSend; does that mean 4 threads?
    b: The numbers for gb_received (received, in gigabytes) and gb_send are similar, but the m_rows_produced (millions of rows) differ several-fold. Why could this happen?
