Forcing a Merge Join

Jim_KnicelyJim_Knicely - Select Field - Administrator
edited April 2019 in Tips from the Team

The Vertica optimizer implements a join with one of the following algorithms:

  • Merge join is used when projections of the joined tables are sorted on the join columns. Merge joins are faster and uses less memory than hash joins.
  • Hash join is used when projections of the joined tables are not already sorted on the join columns. In this case, the optimizer builds an in-memory hash table on the inner table's join column. The optimizer then scans the outer table for matches to the hash table, and joins data from the two tables accordingly. The cost of performing a hash join is low if the entire hash table can fit in memory. Cost rises significantly if the hash table must be written to disk.

If you have limited resources (i.e. memory) perhaps some of your queries might not be able to perform a Hash Join and Vertica will let you know with the “Join inner did not fit in memory” error. Typically you can get around this error by optimizing your table projections or telling Vertica you want to enable join spill. However, what neither of those solutions are an option?

No problem. Here are two other methods that you can employ to force a Merge Join over a Hash Join so that your query can complete.

Example:

If I create these two simple tables:

dbadmin=> create table f (c1 int, c2 varchar(10));
CREATE TABLE

dbadmin=> create table d (c1 int, c2 varchar(10));
CREATE TABLE

A join of the 2 tables will use a Hash Join:

explain select f.* from f left join d on d.c1 = f.c1;

Access Path:
+-JOIN HASH [LeftOuter] [Cost: 0, Rows: 0] (PATH ID: 1) Outer (LOCAL ROUND ROBIN)
|  Join Cond: (d.c1 = f.c1)
| +-- Outer -> SELECT [Cost: 0, Rows: 0] (PATH ID: 2)
| +-- Inner -> SELECT [Cost: 0, Rows: 0] (PATH ID: 3

But I can force a Merge Join by ordering each join input by the join columns:

explain select f.* from (select * from f order by 1) f left join (select * from d order by 1) d on d.c1 = f.c1;

Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 0, Rows: 0] (PATH ID: 1)
|  Join Cond: (d.c1 = f.c1)
| +-- Outer -> SELECT [Cost: 0, Rows: 0] (PATH ID: 2)
| | +---> SORT [Cost: 0, Rows: 0] (PATH ID: 3)
| | |      Order: f.c1 ASC
| +-- Inner -> SELECT [Cost: 0, Rows: 0] (PATH ID: 5)
| | +---> SORT [Cost: 0, Rows: 0] (PATH ID: 6)
| | |      Order: d.c1 ASC

I can also use Vertica hints to force a Merge Join:

explain select /* + SYNTACTIC_JOIN*/ f.* from f left join /*+JTYPE(FM)*/ d on d.c1 = f.c1;

Access Path:
+-JOIN MERGEJOIN(inputs presorted) [LeftOuter] [Cost: 0, Rows: 0] (PATH ID: 1) Outer (SORT ON JOIN KEY) Inner (SORT ON JOIN KEY)
|  Join Cond: (d.c1 = f.c1)
| +-- Outer -> SELECT [Cost: 0, Rows: 0] (PATH ID: 2)
| +-- Inner -> SELECT [Cost: 0, Rows: 0] (PATH ID: 3)

Helpful Link:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/Optimizations/HashJoinsVs.MergeJoins.htm

Have fun!

Sign In or Register to comment.