MERGEJOIN not used where expected

KKirkpatrick · January 2017

Admittedly, this is not the cleanest data model. However, I'm struggling to understand why HASH join is used rather than MERGEJOIN here.

CREATE TABLE T (COUNTRY, LOCALE, ...)

ORDER BY COUNTRY, LOCALE_ID

SEGMENTED BY HASH(COUNTRY, LOCALE_ID);

CREATE TABLE US_ZIPS (ZIP PRIMARY KEY, STATE, ....)

ORDER BY ZIP

UNSEGMENTED ALL NODES;

SELECT T.*, R.STATE

FROM T

LEFT JOIN R ON T.COUNTRY = 'USA' AND T.LOCALE=R.ZIP;

This variation does give desired results using MERGEJOIN:

SELECT T.*, R.STATE

FROM T

LEFT JOIN R ON AND T.LOCALE=R.ZIP

WHERE T.COUNTRY = 'USA'

UNION ALL

SELECT T.*, NULL

FROM T

WHERE T.COUNTRY<>'USA';

I feel that understanding why the former doesn't use MERGEJOIN would be good to know for future reference (assuming that's by design).

Navin_C · January 2017

Hi,

Both are different queries.

Vertica's optimizer will comeup with various plans and decide which is best cost efficient plan to execute for a query.

In former case, it thought MERGE JOIN would be good, but in later case a HASH join was considered.

BTW, you can even force queries to go for a join as per your wish

.

A better pitcure of Vertica joins can be found here :

We're Moving!