Optimization for joins vs analytics

Aaron_Roe · October 2014

We have several fact tables with approximately 5 billion rows each. There are 650 columns in the biggest table and more like 30-40 columns in the other tables.

They are joined together using a single column primary key but this unique key is rarely used in analytics. The data is segmented on 10 nodes using modularhash of this unique key.

In the projections should this unique key be first in the sort order? Should it be near the top but not first? It will almost never be used in a group by or predicate but it will often be used as a join. I am mostly seeing HASH joins between our tables instead of MERGE joins which I don't think is good. Can you give me any pointers for optimizing the tables for joins without degrading performance for analytics?

This type of join is running out of memory for us and I'd like to get it optimized.

| +---> JOIN HASH [Cost: 101M, Rows: 3B] (PATH ID: 2) Inner (BROADCAST)

Thanks.

We're Moving!

Create My New Community Account Now

Optimization for joins vs analytics

Leave a Comment