explain plan understanding
Arvind_Kumar
Community Edition User ✭
Can anyone help me understand the explain plan?
What is the difference between global resegmentation, local resegmentation, and broadcast? Why do they happen, and how can we avoid them?
Comments
Data is resegmented to get it where it needs to be. For example, consider a query that computes the number of trades for each stock symbol. The trades for any one company must eventually be processed by a single operator. This can be done in multiple steps. The execution engine can first resegment trades locally, so that on each node all trades of stock 'XYZ' are processed by one operator on that node. Then, to get a final count, the partially aggregated trade data can be resegmented among nodes so that all trades of stock 'XYZ' are processed by exactly one operator in the entire cluster. Of course, if the data were already segmented on stock symbol, the global resegmentation wouldn't be necessary.
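To make the two-step aggregation concrete, here is a minimal Python simulation (not Vertica code). It assumes a hypothetical cluster of 3 nodes with 2 operator threads each, and uses a toy byte-sum hash in place of Vertica's real segmentation hash; the names `bucket` and `count_trades` are made up for illustration.

```python
from collections import Counter

NUM_NODES = 3         # hypothetical cluster size
THREADS_PER_NODE = 2  # hypothetical operators (threads) per node


def bucket(symbol: str, n: int) -> int:
    """Toy stand-in for a segmentation hash (Python's str hash is salted per run)."""
    return sum(symbol.encode()) % n


def count_trades(trades_per_node, segmented_on_symbol=False):
    """Count trades per symbol; trades_per_node[i] lists the symbols stored on node i."""
    # Local resegmentation: on each node, route every trade to one thread by symbol,
    # so all of that node's 'XYZ' trades reach a single operator, which pre-aggregates them.
    partials = []
    for node_trades in trades_per_node:
        threads = [Counter() for _ in range(THREADS_PER_NODE)]
        for symbol in node_trades:
            threads[bucket(symbol, THREADS_PER_NODE)][symbol] += 1
        partials.extend(threads)

    if segmented_on_symbol:
        # Data already segmented on symbol: each symbol lives on one node only,
        # so the partial counts never overlap and no global resegmentation is needed.
        finals = partials
    else:
        # Global resegmentation: send each partial count to the single node that owns
        # the symbol, so exactly one operator in the cluster produces the final count.
        finals = [Counter() for _ in range(NUM_NODES)]
        for partial in partials:
            for symbol, n in partial.items():
                finals[bucket(symbol, NUM_NODES)][symbol] += n

    result = {}
    for counts in finals:
        result.update(counts)  # key sets are disjoint, so this is a plain union
    return result
```

For example, `count_trades([["XYZ", "AAPL", "XYZ"], ["XYZ", "MSFT"], ["AAPL", "XYZ", "MSFT"]])` gives XYZ: 4, AAPL: 2, MSFT: 2. Because each thread pre-aggregates before the global step, the network carries at most one partial row per symbol per thread rather than every raw trade.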
When joining two tables, data must sometimes be resegmented to get it where it needs to be. In this case the join keys of both tables must be taken into consideration, but the concept is similar to that of GROUP BY. There is another scenario for joins, though. Let's say one table is much larger than the other. In that case it might be better to broadcast (replicate) the data from the smaller table to all nodes in the cluster, so that the larger table does not need to be resegmented.
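Here is a similarly hypothetical sketch of the two join strategies. It assumes a 4-node cluster, tables stored as per-node lists of `(key, value)` rows, and `key % NUM_NODES` as the segmentation function. It only counts rows that cross the network; it is not how Vertica's optimizer actually costs a plan.

```python
NUM_NODES = 4  # hypothetical cluster size


def node_for(key: int) -> int:
    """Hypothetical segmentation function: which node owns a join-key value."""
    return key % NUM_NODES


def local_join(left, right):
    """Inner-join two lists of (key, value) rows that sit on the same node."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


def resegment(rows_per_node):
    """Move every row to the node that owns its key; return (new layout, rows moved)."""
    out = [[] for _ in range(NUM_NODES)]
    moved = 0
    for src, rows in enumerate(rows_per_node):
        for k, v in rows:
            dst = node_for(k)
            if dst != src:
                moved += 1
            out[dst].append((k, v))
    return out, moved


def join_by_resegmenting(big, small):
    # Both sides are redistributed on the join key, then each node joins locally.
    big2, moved_big = resegment(big)
    small2, moved_small = resegment(small)
    rows = [r for n in range(NUM_NODES) for r in local_join(big2[n], small2[n])]
    return rows, moved_big + moved_small


def join_by_broadcast(big, small):
    # Every node receives a full copy of the small table; the big table stays put.
    full_small = [r for rows in small for r in rows]
    moved = sum(len(rows) * (NUM_NODES - 1) for rows in small)
    rows = [r for n in range(NUM_NODES) for r in local_join(big[n], full_small)]
    return rows, moved


# 1,000-row table NOT segmented on the join key, plus a 50-row lookup table.
big = [[(i % 50, i) for i in range(1000) if i // 250 == n] for n in range(NUM_NODES)]
small = [[(k, f"s{k}") for k in range(50) if k // 13 == n] for n in range(NUM_NODES)]
rows_a, moved_a = join_by_resegmenting(big, small)
rows_b, moved_b = join_by_broadcast(big, small)
print(moved_a, moved_b)  # 785 150
```

Both strategies return the same 1,000 joined rows, but broadcasting the small table moves far fewer rows because the large table never leaves its nodes.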
The documentation describes projection design techniques for avoiding resegmentation in GROUP BY and join queries. Here are some links:
https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringGROUPBYOptimizationWithProjectionDesign.htm?Highlight=local resegment
https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/AvoidingResegmentationDuringJoins.htm?TocPath=Analyzing%20Data|Query%20Optimization|JOIN%20Queries|_____2
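Very roughly, the projection-design advice in those pages comes down to a check like the one below, sketched as a rule of thumb under stated assumptions: a GROUP BY can stay local when every segmentation column is also a GROUP BY column, and a join can stay local when both projections are segmented on corresponding join columns, or when one side is unsegmented (replicated on every node). The function and column names are hypothetical, and the sketch ignores details such as node lists, offsets and the exact segmentation expressions.

```python
from typing import Optional, Sequence, Tuple


def group_by_is_local(segmented_by: Sequence[str], group_by: Sequence[str]) -> bool:
    """True when one group can never span nodes: every segmentation column is
    also a GROUP BY column, so equal group values imply equal hash inputs."""
    return set(segmented_by) <= set(group_by)


def join_is_local(left_seg: Optional[Sequence[str]],
                  right_seg: Optional[Sequence[str]],
                  join_on: Sequence[Tuple[str, str]]) -> bool:
    """None means an unsegmented (replicated) projection: every node has all its rows."""
    if left_seg is None or right_seg is None:
        return True
    pairs = dict(join_on)  # left join column -> right join column
    # Identically segmented: both sides hashed on matching join columns, in the same order.
    return (len(left_seg) == len(right_seg)
            and all(pairs.get(l) == r for l, r in zip(left_seg, right_seg)))
```

For example, a projection segmented on `symbol` can answer `GROUP BY symbol, trade_date` without global resegmentation, while one segmented on `trade_id` cannot.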
FYI, here is a link to a great blog post, "Reading Query Plans":
https://my.vertica.com/kb/Reading-Query-Plans/Content/BestPractices/Reading-Query-Plans.htm
Also, "local resegmentation" is a good thing: it takes a data stream and redistributes it across multiple threads within a node for the purpose of parallelism.
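Local resegmentation within a single node can be sketched with a thread pool. This is a toy Python model that assumes a CRC32 hash to route keys to threads (`thread_for` is a hypothetical name); because of Python's GIL it illustrates the data routing rather than a real speedup. The key property is that hash routing gives every thread a disjoint set of keys, so the per-thread results need no further merging.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def thread_for(key: str, n_threads: int) -> int:
    """Hypothetical routing function: pick a thread by hashing the key."""
    return zlib.crc32(key.encode()) % n_threads


def locally_resegment_and_count(stream, n_threads=4):
    """Split one node's stream across threads by key, then count in parallel."""
    # Local resegmentation: every occurrence of a key lands in the same thread's queue.
    queues = [[] for _ in range(n_threads)]
    for key in stream:
        queues[thread_for(key, n_threads)].append(key)
    # Each thread aggregates only its own queue, so no locking is needed.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(Counter, queues))


stream = ["XYZ", "AAPL", "XYZ", "MSFT", "IBM", "XYZ", "AAPL"] * 100
per_thread = locally_resegment_and_count(stream, n_threads=3)
totals = {}
for counts in per_thread:
    totals.update(counts)  # key sets are disjoint, so a plain union is enough
print(totals)
```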