Is segmenting on FK of a table helpful sometimes?
For segmenting on big tables it is common practice and recommended to use primary key as hashing key.
Now for two big tables A, B for which a join query on condition A.PK=B.FK_A is frequently performed (where B.FK_A is B’s FK to A), if A and B are both segmented on their own PKs, since A.PK and B.PK are different, there is big chance that the matched records of the given query are not in the same nodes.
If instead, let A be segmented on A.PK, and B be segmented on B.FK_A (instead of on B.PK), then all matched records for the given query in question will be in the same respective nodes; and so the performance of the query may be better. In general, using FK as segment key is sometimes helpful, e.g., in the above scenario.
(Here we may think of A, B as for transaction and transaction detail respectively, for example)
Is above thinking in right track, and comments?
Comments
Yes, this is the right track, for query optimization you always need to look for ISP ( Identically Segemented Projections) so the Join happens in the local node and you avoid a re segmentation that will just increase network traffic and affect the query performance.
However, when segment please check that the data is not skewed. You can look at the projection_storage table to see that the data is equally (similar) distributed in all the nodes.
Hope this answer your question.
Regards,
Eugenia
Then a next point to realize is that one table can only have one segment key, so if A1 and A2 have different ISP, table B can only follow one of the two.
/Sajan