We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Query Tuning Question — Vertica Forum

Query Tuning Question

I have a query that attempts to use 78 GB, fails (inner join did not fit into memory) and then re-runs using 3 GB. Below are the 2 query plans from dc_explain_plans (first one fails, second succeeds) .

The only difference is the materialize columns get moved from the join hash to the outer join. (the join hash cost also goes way up).

I then added ENABLE_JOIN_SPILL but this time it tried to use 152 GB! It failed and then re-ran again using 3 GB. The odd thing is both plans from dc_explain_plans using join spill are exactly the same! (and match exactly the second one shown).

Suggestions?

*table names changed to protect the innocent

+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)

| +---> JOIN HASH [Cost: 331M, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Materialize at Output: b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 16M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes

+-DML INSERT [Cost: 0, Rows: 0]
| Target Projection: schema.table1_b1 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)
| Target Projection: schema.table1_b0 (SORT BY PROJECTION SORT ORDER) (RESEGMENT)

| +---> JOIN HASH [Cost: 1B, Rows: 214M] (PATH ID: 3) Inner (BROADCAST)
| | Join Cond: (a.CNX_ADDR = b.CNX_ADDR)
| | Execute on: All Nodes
| | +-- Outer -> STORAGE ACCESS for b [Cost: 80M, Rows: 214M] (PATH ID: 4)
| | | Projection: schema.table2_b0
| | | Materialize: b.CNX_ADDR, b.CNX_HH, b.CNX_IND, b.PartnerId, b.PartnerUserId, b.Attributes
| | | Execute on: All Nodes
| | | Runtime Filter: (SIP1(HashJoin): b.CNX_ADDR)
| | +-- Inner -> STORAGE ACCESS for a [Cost: 833K, Rows: 36M] (PATH ID: 5)
| | | Projection: schema.table3_b0
| | | Materialize: a.SpendProfileId, a.INSTITUTIONID, a.CNX_ADDR, a.CNX_HH, a.CNX_IND
| | | Filter: (a.SpendProfileId IS NOT NULL)
| | | Execute on: All Nodes

INSERT /*+ DIRECT , set_vertica_options(EE, ENABLE_JOIN_SPILL), LABEL(EDP) */
INTO
schema.table1
SELECT
a.SpendProfileId ,
a.InstitutionId ,
a.SpendProfileId % 10000000000 AS CustomerId ,
b.PartnerId ,
b.PartnerUserId ,
CASE
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind = b.CNX_IND
THEN 'I'
WHEN a.CNX_HH = b.CNX_HH
AND a.CNX_Ind != b.CNX_IND
THEN 'H'
ELSE 'A'
END AS MatchLevel ,
b.Attributes
FROM
schema.table3 a
JOIN
schema.table2 b
ON
a.CNX_Addr = b.CNX_ADDR
WHERE
a.SpendProfileId IS NOT NULL ;

Best Answer

  • scottpedersoliscottpedersoli Vertica Customer
    Answer ✓

    Thank you! I changed the segmentation and order by for both tables (as well as the data types on a few columns) and the query now runs in 1 min and uses 3 GB of memory (vs 11 min and 78 GB) Success.

Answers

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file