MERGEJOIN not used where expected
Admittedly, this is not the cleanest data model. However, I'm struggling to understand why HASH join is used rather than MERGEJOIN here.
CREATE TABLE T (COUNTRY, LOCALE, ...)
ORDER BY COUNTRY, LOCALE_ID
SEGMENTED BY HASH(COUNTRY, LOCALE_ID);
CREATE TABLE US_ZIPS (ZIP PRIMARY KEY, STATE, ....)
ORDER BY ZIP
UNSEGMENTED ALL NODES;
SELECT T.*, R.STATE
FROM T
LEFT JOIN R ON T.COUNTRY = 'USA' AND T.LOCALE=R.ZIP;
This variation does give desired results using MERGEJOIN:
SELECT T.*, R.STATE
FROM T
LEFT JOIN R ON AND T.LOCALE=R.ZIP
WHERE T.COUNTRY = 'USA'
UNION ALL
SELECT T.*, NULL
FROM T
WHERE T.COUNTRY<>'USA';
I feel that understanding why the former doesn't use MERGEJOIN would be good to know for future reference (assuming that's by design).
0
Comments
Hi,
Both are different queries.
Vertica's optimizer will comeup with various plans and decide which is best cost efficient plan to execute for a query.
In former case, it thought MERGE JOIN would be good, but in later case a HASH join was considered.
BTW, you can even force queries to go for a join as per your wish
.
https://my.vertica.com/docs/8.0.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Hints/SyntacticJoin.htm
A better pitcure of Vertica joins can be found here :
https://community.dev.hpe.com/t5/Vertica-Blog/Vertica-Joins-A-Refresher/ba-p/234901