Is segmenting on FK of a table helpful sometimes?

Charles_Yu · June 2014

For segmenting on big tables it is common practice and recommended to use primary key as hashing key.

Now for two big tables A, B for which a join query on condition A.PK=B.FK_A is frequently performed (where B.FK_A is B’s FK to A), if A and B are both segmented on their own PKs, since A.PK and B.PK are different, there is big chance that the matched records of the given query are not in the same nodes.

If instead, let A be segmented on A.PK, and B be segmented on B.FK_A (instead of on B.PK), then all matched records for the given query in question will be in the same respective nodes; and so the performance of the query may be better. In general, using FK as segment key is sometimes helpful, e.g., in the above scenario.

(Here we may think of A, B as for transaction and transaction detail respectively, for example)

Is above thinking in right track, and comments?

[Deleted User] · June 2014

Hi Charles,
Yes, this is the right track, for query optimization you always need to look for ISP ( Identically Segemented Projections) so the Join happens in the local node and you avoid a re segmentation that will just increase network traffic and affect the query performance.
However, when segment please check that the data is not skewed. You can look at the projection_storage table to see that the data is equally (similar) distributed in all the nodes.
Hope this answer your question.
Regards,
Eugenia

Charles_Yu · June 2014

Thanks for reply!
Then a next point to realize is that one table can only have one segment key, so if A1 and A2 have different ISP, table B can only follow one of the two.

[Deleted User] · June 2014

You can always create query specific projections. If you are just joining to get a couple columns, you could create a separated projection with different segmentation.

Sajan_Sangraula · June 2014

To add to Eugenia'a point, if one of the two tables doesn't have a large number of rows (e.g. Fact and Dimension table joins), you can also replicate the smaller table. This will assure joins can happen locally.

/Sajan

We're Moving!

Create My New Community Account Now

Is segmenting on FK of a table helpful sometimes?

Comments

Leave a Comment