Query Tuning Question

I have a query that attempts to use 78 GB, fails (inner join did not fit into memory) and then re-runs using 3 GB. Below are the 2 query plans from dc_explain_plans (first one fails, second succeeds) .

The only difference is the materialize columns get moved from the join hash to the outer join. (the join hash cost also goes way up).

I then added ENABLE_JOIN_SPILL but this time it tried to use 152 GB! It failed and then re-ran again using 3 GB. The odd thing is both plans from dc_explain_plans using join spill are exactly the same! (and match exactly the second one shown).

Suggestions?

*table names changed to protect the innocent

+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)

| +---> JOIN HASH [Cost: 331M, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Materialize at Output: b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 16M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes

+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)

| +---> JOIN HASH [Cost: 1B, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 80M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR, b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes

INSERT /*+ DIRECT , set_vertica_options(EE, ENABLE_JOIN_SPILL), LABEL(EDP) */
INTO
schema.table1
SELECT
a.SpendProfileId ,
a.InstitutionId ,
a.SpendProfileId % 10000000000 AS CustomerId ,
b.PartnerId ,
b.PartnerUserId ,
CASE
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind = b.CNX_IND
THEN 'I'
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind != b.CNX_IND
THEN 'H'
ELSE 'A'
END AS MatchLevel ,
b.Attributes
FROM
schema.table3 a
JOIN
schema.table2 b
ON
a.CNX_Addr = b.CNX_ADDR
WHERE
a.SpendProfileId IS NOT NULL ;

Best Answer

  • Answer ✓

    Thank you! I changed the segmentation and order by for both tables (as well as the data types on a few columns) and the query now runs in 1 min and uses 3 GB of memory (vs 11 min and 78 GB) Success.

Answers

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file