Query Tuning Question
I have a query that attempts to use 78 GB, fails (inner join did not fit into memory) and then re-runs using 3 GB. Below are the 2 query plans from dc_explain_plans (first one fails, second succeeds) .
The only difference is the materialize columns get moved from the join hash to the outer join. (the join hash cost also goes way up).
I then added ENABLE_JOIN_SPILL but this time it tried to use 152 GB! It failed and then re-ran again using 3 GB. The odd thing is both plans from dc_explain_plans using join spill are exactly the same! (and match exactly the second one shown).
Suggestions?
*table names changed to protect the innocent
+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| +---> JOIN HASH [Cost: 331M, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Materialize at Output: b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 16M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes
+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| +---> JOIN HASH [Cost: 1B, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 80M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR, b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes
INSERT /*+ DIRECT , set_vertica_options(EE, ENABLE_JOIN_SPILL), LABEL(EDP) */
INTO
schema.table1
SELECT
a.SpendProfileId ,
a.InstitutionId ,
a.SpendProfileId % 10000000000 AS CustomerId ,
b.PartnerId ,
b.PartnerUserId ,
CASE
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind = b.CNX_IND
THEN 'I'
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind != b.CNX_IND
THEN 'H'
ELSE 'A'
END AS MatchLevel ,
b.Attributes
FROM
schema.table3 a
JOIN
schema.table2 b
ON
a.CNX_Addr = b.CNX_ADDR
WHERE
a.SpendProfileId IS NOT NULL ;
Best Answer
-
Thank you! I changed the segmentation and order by for both tables (as well as the data types on a few columns) and the query now runs in 1 min and uses 3 GB of memory (vs 11 min and 78 GB) Success.
1
Answers
Hi Scott,
If I were you, the first thing I'd do is avoiding the join spill caused by the hash join by ordering both Table3 and Table2 projections' on CNX_ADDR
Fantastic!
Hi Scott,
Since the query ran with 3GB on the retry, this means that the resource pool's "query budget" is 3GB, which is on the low side. For some queries, this may mean that some query operators end up running without full parallelism (not reaching the pool's EXECUTIONPARALLELISM). You may want increase the query budget by adjusting PLANNEDCONCURRENCY or MAXMEMORYSIZE/MEMORYSIZE. See https://www.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/AdministratorsGuide/ManageWorkloads/QueryBudgeting.htm.
--Sharon