PROFILE vs EXPLAIN

elghalielghali Registered User

Hello,
I'm trying to assess the cost of a huge query, now when I run it with EXPLAIN, it shows
SELECT LIMIT 1K [Cost: 572M, Rows: 1K (NO STATISTICS)] (PATH ID: 30)
but when I run it with PROFILE, it shows
NOTICE 5077: Total memory required by query: [49320002 KB]
as you can see there's a huge difference in memory consumption, why do these commands generate two different outputs and what am I reading wrong here?

Comments

  • LenoyJLenoyJ Employee, Registered User
    The 572M in your explain plan does NOT mean 572 MB of memory. It is a number that is generated based on many factors (row count, lowest I/O, CPU, memory, network etc.) Vertica uses a cost based optimizer to choose the best query plan with (generally) the lowest total cost.
    Read more here: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/QueryManagement/QueryPlans/QueryPlanCostEstimation.htm

    PS: It is advisable to analyze statistics so that the optimizer can be more accurate in estimating costs and thereby generating better query plans. The plan states that the table has no statistics.
  • Vertica_CurtisVertica_Curtis Employee, Registered User

    Cost is an interesting metric, but a large query might not be terribly costly, if you only run it one time. Often, the more "expensive" queries are the ones that are medium-sized which run all the time. Those are the ones which are often worth trying to tune, since the total cost savings might be much larger.

    I wrote a query to find them:

    most common queries in the system, but calculates a "cost" metric for the
    -- query, based on threads, memory, duration. Also includes the last transaction_id so you can see the full
    -- query if you want, and the range of the times it's ran from and to.
    -- Note: This query specifically targets just JOIN and GROUP BY SPILL events, because coincidentally those
    -- are the most expensive queries, but if you want to target a broader set of queries, you can remove that
    -- line (near the bottom)
    select REGEXP_REPLACE(REGEXP_REPLACE(substr(qr.request, 1, 500), '\d'), '\''[\S\s]?\''', '''?''' ) as query
    , max(qr.transaction_id) as last_trans_id
    , avg(qr.request_duration_ms)/1000/60 as avg_duration_mins
    , avg(ra.thread_count) as avg_threads
    , to_char(avg(memory_inuse_kb)/1024, '999,999') as avg_mem_mb
    , (sum(qr.request_duration_ms) * count(1))/1000 as total_duration_secs
    , max(eee.event_type) as event_type
    , trunc(min(qr.start_timestamp), 'MI') as first_run
    , trunc(max(qr.start_timestamp), 'MI') as last_run
    , (sum(qr.request_duration_ms) * count(1)) * sum(ra.thread_count) as total_cost
    , count(1) as num_runs
    from v_monitor.query_requests qr
    join (select transaction_id, statement_id, event_type
    from v_internal.dc_execution_engine_events
    where node_name = (select min(node_name)
    from v_catalog.nodes)
    ) eee using (transaction_id, statement_id)
    left join (select transaction_id, statement_id, thread_count, memory_inuse_kb, duration_ms
    from v_monitor.resource_acquisitions
    where node_name = (select min(node_name)
    from v_catalog.nodes)
    ) ra using (transaction_id, statement_id)
    where start_timestamp::date > current_date -1
    and not qr.is_executing
    and eee.event_type in ('GROUP_BY_SPILLED', 'JOIN_SPILLED')
    group by 1
    having count(
    ) > 3
    order by total_cost desc ;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file