Understanding the explain plan

Arvind_Kumar · April 2017

Can anyone help me understand the explain plan?
What is global resegmentation vs. local resegmentation vs. broadcast? Why do they happen, and how can we avoid them?

swalkaus · April 2017

Data is resegmented to get it to where it needs to be processed. For example, consider a query that computes the number of trades for each stock symbol. All trades for any one symbol must eventually be processed by a single operator, and this can be done in multiple steps. The execution engine can first resegment trades locally, so that on each node all trades of stock 'XYZ' are processed by one operator. Then, to get the final count, the partially aggregated trade data can be resegmented globally, across nodes, so that all trades of stock 'XYZ' are processed by exactly one operator in the entire cluster. Of course, if the data were already segmented on stock symbol, the global resegmentation wouldn't be necessary.
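
To make the two steps concrete, here is a rough Python simulation of the trade count example. It is only a sketch of the idea, not how the execution engine is actually implemented: the function names, the CRC32 bucketing, and the two operators per node are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def bucket(symbol, n):
    """Stable hash bucket for a symbol (CRC32 is an assumption for this sketch)."""
    return zlib.crc32(symbol.encode()) % n


def local_resegment_count(node_rows, operators_per_node=2):
    """Step 1: inside ONE node, route each symbol to a single operator and count."""
    operators = [Counter() for _ in range(operators_per_node)]
    for symbol in node_rows:
        operators[bucket(symbol, operators_per_node)][symbol] += 1
    # Each symbol lives in exactly one operator, so merging is a plain union.
    partial = Counter()
    for op in operators:
        partial.update(op)
    return partial


def global_resegment_count(partials):
    """Step 2: send each symbol's partial counts to exactly one node and sum them."""
    node_count = len(partials)
    owners = [Counter() for _ in range(node_count)]
    for partial in partials:
        for symbol, count in partial.items():
            owners[bucket(symbol, node_count)][symbol] += count
    return owners


# Hypothetical trades, one list of stock symbols per node.
cluster = [
    ["XYZ", "ABC", "XYZ"],
    ["XYZ", "ABC"],
    ["DEF", "XYZ"],
]

partials = [local_resegment_count(rows) for rows in cluster]
owners = global_resegment_count(partials)

for node_id, counts in enumerate(owners):
    print(node_id, dict(counts))
```

Note that only the small partial counts cross the network in step 2, not the raw trades. If the trades were already segmented on symbol, every symbol would sit on one node after step 1 and step 2 would have nothing to move.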

When joining two tables, data must sometimes be resegmented for the same reason. Here the join keys of both tables must be taken into consideration, but the concept is similar to that of group by: rows with matching keys have to end up on the same node. There is, however, another scenario for joins. Let's say one table is much larger than the other. In that case it might be better to broadcast (replicate) the data from the smaller table to all nodes in the cluster so that the larger table does not need to be resegmented.
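
The trade-off between the two join strategies can be illustrated with a small Python simulation that counts how many rows have to move between nodes. Again this is a sketch under assumptions: the tables, the function names, and the CRC32 bucketing are made up for illustration, and the real optimizer bases its choice on its own cost estimates.

```python
import zlib
from collections import defaultdict


def bucket(key, n):
    """Stable hash bucket for a join key (CRC32 is an assumption for this sketch)."""
    return zlib.crc32(str(key).encode()) % n


def local_hash_join(large_rows, small_rows):
    """Join the rows that are present on a single node."""
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)
    return [(key, lval, sval)
            for key, lval in large_rows
            for sval in lookup.get(key, [])]


def broadcast_join(large_by_node, small_by_node):
    """Copy every small-table row to every node; the large table stays put."""
    n = len(large_by_node)
    small_all = [row for rows in small_by_node for row in rows]
    moved = len(small_all) * (n - 1)  # each small row is sent to the other nodes
    result = []
    for large_rows in large_by_node:
        result.extend(local_hash_join(large_rows, small_all))
    return sorted(result), moved


def resegment_join(large_by_node, small_by_node):
    """Redistribute BOTH tables on the join key, then join locally."""
    n = len(large_by_node)
    moved = 0
    large_new = [[] for _ in range(n)]
    small_new = [[] for _ in range(n)]
    for src, dst in ((large_by_node, large_new), (small_by_node, small_new)):
        for node_id, rows in enumerate(src):
            for row in rows:
                target = bucket(row[0], n)
                moved += target != node_id  # count only rows that change node
                dst[target].append(row)
    result = []
    for large_rows, small_rows in zip(large_new, small_new):
        result.extend(local_hash_join(large_rows, small_rows))
    return sorted(result), moved


# Hypothetical data: a large trades table and a tiny symbols table on 3 nodes.
symbols = ["XYZ", "ABC"]
trades = [[(symbols[i % 2], f"trade-{node}-{i}") for i in range(100)]
          for node in range(3)]
companies = [[("XYZ", "XYZ Corp")], [("ABC", "ABC Inc")], []]

broadcast_result, broadcast_moved = broadcast_join(trades, companies)
reseg_result, reseg_moved = resegment_join(trades, companies)
print("rows moved with broadcast:", broadcast_moved)
print("rows moved with resegmentation:", reseg_moved)
```

Both strategies return the same joined rows. The difference is in what gets shipped: broadcasting moves only copies of the small table, while resegmenting moves rows of the large table too whenever they are not already on the node that owns their key.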

The documentation describes projection optimization tricks to avoid resegmentation for group by and joins. Here are some links:

https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm?Highlight=local resegment

https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringJoins.htm?TocPath=Analyzing%20Data|Query%20Optimization|JOIN%20Queries|_____2