Same query executes 2 times in a row
I'm experiencing a strange issue that I can't explain. Sometimes when an INSERT ... SELECT query gets executed from our ETL tool (Pentaho) I can see in v_monitor.query_requests that it for some reason tried to be executed 2 times in a row. First time memory_acquired_mb is usually very large (in this case success=False in v_monitor.query_requests) and the second time memory_acquired_mb is often significantly smaller (in this case success=True in v_monitor.query_requests).
My first guess was that engine tries to execute the query in memory, fails to do so and proceeds with the execution on disk. However, in this case I would expect to see events like GROUP_BY_SPILLED or JOIN_SPILLED, but I don't see these events.
Could there be other reasons for this behaviour?
I can't share an SQL query, because this issue appears for many queries we have in our ETL process - therefore I don't think it's query-specific. What I can say is that we're on Vertica 9.3.1 and we have a cluster of 3 nodes (if this information is useful for this question).
To detect a retry, PROFILE the query or check vertica.log
If hash table cannot fit in memory, join must spill to disk
Both the fact and dimension must spill to disk
Query tries to fit in memory first; if it cannot fit, it retries automatically with join spill enabled
If you know a query will spill, add hint to skip hashing to memory
SELECT /+add_vertica_options(EE,ENABLE_JOIN_SPILL)/ …
Loading into pre-join projections requires ENABLE_JOIN_SPILL set if join hash table exceeds memory.
Try to see how much memory the query requires compared to the RP memory budget.
E.g. SELECT pool_name, thread_count, open_file_handle_count, memory_inuse_kb, queue_entry_timestamp,
acquisition_timestamp FROM V_MONITOR.RESOURCE_ACQUISITIONS WHERE node_name ILIKE '%node0001';
Check how much memory the query really needs..
create resource pool p_pool memorysize '1K' plannedconcurrency 4 maxconcurrency 4;
set session resource_pool to p_pool;
profile ... your query;
If you don't have the memory usage printed (which may happen)
select * from resource_acquisitions where transaction_id=... and statement_id=...;
The trick: p_pool with very small memorysize.
(if you run profile on general pool instead, it will use the query_budget instead on minimal memory needed)
Of course p_pool is only for profiling.
Run the following query to check your Vertica events status.
Recommends actions one can take to resolve event issues are listed here.
select event_type,event_description, count(*)
where ( NOW() - EVENT_TIMESTAMP) < '24 hour'
group by 1,2
order by 3 desc;
Thank you, mosheg!
This is a very detailed and useful answer. There's only one thing I don't understand in this context. In case of retry I expect to see some "SPILL" event (like GROUP_BY_SPILLED or JOIN_SPILLED) in v_monitor.query_events. However, I don't see any events for affected transactions.
You should be able to check the DC_REQUESTS_ISSUED and DC_REQUESTS_RETRIED Data Collector tables to find out which queries retried and why.
Thanks @Jim_Knicely , I didn't know about dc_requests_issued and dc_requests_retried - that is a useful piece of information.