We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Watch Out for Data Induced “Cross” Joins — Vertica Forum

Watch Out for Data Induced “Cross” Joins

Jim_KnicelyJim_Knicely - Select Field - Administrator

I recently worked with a client that reported that a query in Vertica would not return data. That is, the query ran forever. The query was pretty simple, joining two 100 billion+ tables together. After some investigation we realized that the query could potential return trillions and trillions of records! Why? Because the join keys actually performed a COSS JOIN. In this case it turned out to be bad data in one of the tables. I call this a data induced cross join, but it was actually just an unexpected many to many join. The point is, be careful with your join conditions!

Example:

dbadmin=> CREATE TABLE t1 (c INT);
CREATE TABLE

dbadmin=> CREATE TABLE t2 (c INT);
CREATE TABLE

dbadmin=> INSERT INTO t1 SELECT 1;
 OUTPUT
--------
   1
(1 row)

dbadmin=> INSERT INTO t1 SELECT 1;
 OUTPUT
--------
   1
(1 row)

dbadmin=> INSERT INTO t2 SELECT 1;
 OUTPUT
--------
   1
(1 row)

dbadmin=> INSERT INTO t2 SELECT 1;
 OUTPUT
--------
   1
(1 row)

dbadmin=> SELECT t1.*, t2.* FROM t1 JOIN t2 USING (c);
 c | c
---+---
 1 | 1
 1 | 1
 1 | 1
 1 | 1
(4 rows)

dbadmin=> SELECT t1.*, t2.* FROM t1 CROSS JOIN t2;
 c | c
---+---
 1 | 1
 1 | 1
 1 | 1
 1 | 1
(4 rows)

See how we are getting a cross join from the inner join? Imagine the result set of joining two 100+ billion records tables like this!

Helpful Link:
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AnalyzingData/Queries/Joins/CrossJoins.htm

Have fun!

Sign In or Register to comment.