PROFILE vs EXPLAIN

elghali · June 2019

Hello,
I'm trying to assess the cost of a huge query, now when I run it with EXPLAIN, it shows
SELECT LIMIT 1K [Cost: 572M, Rows: 1K (NO STATISTICS)] (PATH ID: 30)
but when I run it with PROFILE, it shows
NOTICE 5077: Total memory required by query: [49320002 KB]
as you can see there's a huge difference in memory consumption, why do these commands generate two different outputs and what am I reading wrong here?

LenoyJ · June 2019

The 572M in your explain plan does NOT mean 572 MB of memory. It is a number that is generated based on many factors (row count, lowest I/O, CPU, memory, network etc.) Vertica uses a cost based optimizer to choose the best query plan with (generally) the lowest total cost.
Read more here: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/QueryManagement/QueryPlans/QueryPlanCostEstimation.htm

PS: It is advisable to analyze statistics so that the optimizer can be more accurate in estimating costs and thereby generating better query plans. The plan states that the table has no statistics.

Vertica_Curtis · June 2019

Cost is an interesting metric, but a large query might not be terribly costly, if you only run it one time. Often, the more "expensive" queries are the ones that are medium-sized which run all the time. Those are the ones which are often worth trying to tune, since the total cost savings might be much larger.

I wrote a query to find them:

most common queries in the system, but calculates a "cost" metric for the
-- query, based on threads, memory, duration. Also includes the last transaction_id so you can see the full
-- query if you want, and the range of the times it's ran from and to.
-- Note: This query specifically targets just JOIN and GROUP BY SPILL events, because coincidentally those
-- are the most expensive queries, but if you want to target a broader set of queries, you can remove that
-- line (near the bottom)
select REGEXP_REPLACE(REGEXP_REPLACE(substr(qr.request, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' ) as query
, max(qr.transaction_id) as last_trans_id
, avg(qr.request_duration_ms)/1000/60 as avg_duration_mins
, avg(ra.thread_count) as avg_threads
, to_char(avg(memory_inuse_kb)/1024, '999,999') as avg_mem_mb
, (sum(qr.request_duration_ms) * count(1))/1000 as total_duration_secs
, max(eee.event_type) as event_type
, trunc(min(qr.start_timestamp), 'MI') as first_run
, trunc(max(qr.start_timestamp), 'MI') as last_run
, (sum(qr.request_duration_ms) * count(1)) * sum(ra.thread_count) as total_cost
, count(1) as num_runs
from v_monitor.query_requests qr
join (select transaction_id, statement_id, event_type
from v_internal.dc_execution_engine_events
where node_name = (select min(node_name)
from v_catalog.nodes)
) eee using (transaction_id, statement_id)
left join (select transaction_id, statement_id, thread_count, memory_inuse_kb, duration_ms
from v_monitor.resource_acquisitions
where node_name = (select min(node_name)
from v_catalog.nodes)
) ra using (transaction_id, statement_id)
where start_timestamp::date > current_date -1
and not qr.is_executing
and eee.event_type in ('GROUP_BY_SPILLED', 'JOIN_SPILLED')
group by 1
having count() > 3
order by total_cost desc ;

We're Moving!

Create My New Community Account Now

PROFILE vs EXPLAIN

Comments

Leave a Comment