PROFILE vs EXPLAIN
Hello,
I'm trying to assess the cost of a huge query, now when I run it with EXPLAIN, it shows
SELECT LIMIT 1K [Cost: 572M, Rows: 1K (NO STATISTICS)] (PATH ID: 30)
but when I run it with PROFILE, it shows
NOTICE 5077: Total memory required by query: [49320002 KB]
as you can see there's a huge difference in memory consumption, why do these commands generate two different outputs and what am I reading wrong here?
0
Comments
Read more here: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/QueryManagement/QueryPlans/QueryPlanCostEstimation.htm
PS: It is advisable to analyze statistics so that the optimizer can be more accurate in estimating costs and thereby generating better query plans. The plan states that the table has no statistics.
Cost is an interesting metric, but a large query might not be terribly costly, if you only run it one time. Often, the more "expensive" queries are the ones that are medium-sized which run all the time. Those are the ones which are often worth trying to tune, since the total cost savings might be much larger.
I wrote a query to find them:
most common queries in the system, but calculates a "cost" metric for the
-- query, based on threads, memory, duration. Also includes the last transaction_id so you can see the full
-- query if you want, and the range of the times it's ran from and to.
-- Note: This query specifically targets just JOIN and GROUP BY SPILL events, because coincidentally those
-- are the most expensive queries, but if you want to target a broader set of queries, you can remove that
-- line (near the bottom)
select REGEXP_REPLACE(REGEXP_REPLACE(substr(qr.request, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' ) as query
, max(qr.transaction_id) as last_trans_id
, avg(qr.request_duration_ms)/1000/60 as avg_duration_mins
, avg(ra.thread_count) as avg_threads
, to_char(avg(memory_inuse_kb)/1024, '999,999') as avg_mem_mb
, (sum(qr.request_duration_ms) * count(1))/1000 as total_duration_secs
, max(eee.event_type) as event_type
, trunc(min(qr.start_timestamp), 'MI') as first_run
, trunc(max(qr.start_timestamp), 'MI') as last_run
, (sum(qr.request_duration_ms) * count(1)) * sum(ra.thread_count) as total_cost
, count(1) as num_runs
from v_monitor.query_requests qr
join (select transaction_id, statement_id, event_type
from v_internal.dc_execution_engine_events
where node_name = (select min(node_name)
from v_catalog.nodes)
) eee using (transaction_id, statement_id)
left join (select transaction_id, statement_id, thread_count, memory_inuse_kb, duration_ms
from v_monitor.resource_acquisitions
where node_name = (select min(node_name)
from v_catalog.nodes)
) ra using (transaction_id, statement_id)
where start_timestamp::date > current_date -1
and not qr.is_executing
and eee.event_type in ('GROUP_BY_SPILLED', 'JOIN_SPILLED')
group by 1
having count() > 3
order by total_cost desc ;