Understanding GraphViz Query Plan
The output of EXPLAIN consists of two parts: the text query plan and the GraphViz plan.
The Vertica Administrator's Guide has a lot of information about the text query
plan, but very little about the GraphViz plan. I am having trouble understanding
it and would appreciate some help. (Sorry for the formatting problems in this post; escape characters in the GraphViz plan part may be confusing the HTML rendering.)
For example, I run the query
EXPLAIN SELECT customer_name, customer_state FROM customer_dimension WHERE customer_state in ('MA','NH') AND customer_gender = 'Male' ORDER BY customer_name LIMIT 10;
and get the output below:
QUERY PLAN DESCRIPTION:
------------------------------
EXPLAIN SELECT customer_name, customer_state FROM customer_dimension
WHERE customer_state in ('MA','NH') AND customer_gender = 'Male'
ORDER BY customer_name LIMIT 10;
Access Path:
+-SELECT LIMIT 10 [Cost: 6K, Rows: 10 (NO STATISTICS)] (PATH ID: 0)
| Output Only: 10 tuples
| +---> SORT [TOPK] [Cost: 6K, Rows: 50K (NO STATISTICS)] (PATH ID: 1)
| | Order: customer_dimension.customer_name ASC
| | Output Only: 10 tuples
| | +---> STORAGE ACCESS for customer_dimension [Cost: 746, Rows: 50K (NO STATISTICS)] (PATH ID: 2)
| | | Projection: public.customer_dimension_super
| | | Materialize: customer_dimension.customer_name, customer_dimension.customer_state
| | | Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))
| | | Filter: (customer_dimension.customer_gender = 'Male')
------------------------------
-----------------------------------------------
PLAN: BASE QUERY PLAN (GraphViz Format)
-----------------------------------------------
digraph G {
graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: EXPLAIN SELECT customer_name, customer_state FROM customer_dimensio
\nORDER BY customer_name LIMIT 10;\n\nAll Nodes Vector: \n\n node[0]=v_vmart_node0001 (initiator) Up\n", labelloc=t, labeljust=l ordering=out]
0[label = "Root \nOutBlk=[UncTuple(2)]", color = "green", shape = "house"];
1[label = "NewEENode \nOutBlk=[UncTuple(2)]", color = "green", shape = "box"];
2[label = "TopK[PIPELINED]\n Sort: (keys = )\nLIMIT: 10\nOffset: 0\nUnc: Varchar(256)\nUnc: Char(2)", color = "green", shape = "box"];
3[label = "TopK[HEAP]\n Sort: (keys = A,N)\nLIMIT: 10\nUnc: Varchar(256)\nUnc: Char(2)", color = "green", shape = "box"];
4[label = "StorageUnionStep: customer_dimension_super\nUnc: Varchar(256)\nUnc: Char(2)", color = "purple", shape = "box"];
5[label = "ScanStep: customer_dimension_super\n(customer_dimension.customer_state = ANY (ARRAY[\'MA\', \'NH\']))\n(customer_dimension.customer_gender = \'Male\')\ncustomer_name\ncustomer_gender (not emitted)\ncustomer_state\nUnc: Varchar(256)\nUnc: Char(2)", color = "brown", shape = "box"];
1->0 [label = "V[0] C=2",color = "black",style="bold", arrowtail="inv"];
2->1 [label = "0",color = "blue"];
3->2 [label = "0",color = "blue"];
4->3 [label = "0",color = "blue"];
5->4 [label = "0",color = "blue"];
}
I have several questions:
- Question 1: What do Root and NewEENode mean? I found a post in this forum (https://community.vertica.com/vertica/topics/neweenode_counter_in_execution_engine_profiles) saying that NewEENode is a Vertica implementation detail, but I still don't understand it. And why does almost every GraphViz plan contain Root and NewEENode nodes?
- Question 2: Where can I find more information about GraphViz plan terminology?
- Question 3: There is no one-to-one mapping between the text plan and the GraphViz plan. For example, the text plan has a single SORT [TOPK] node, while the GraphViz plan has two sort nodes (nodes 2 and 3: TopK[PIPELINED] and TopK[HEAP]). Why is the mapping not one-to-one, and how can I correlate the text plan with the GraphViz plan?
- Question 4: Where does the GraphViz plan come from? Is there a system table, for example, that I can query?
Thank you very much for your help!
Comments
The GraphViz query plan is currently intentionally undocumented. It is there to help Vertica employees know what's going on within query execution; it contains various information that's not useful. (And some that's misleading -- "we know what that means; that's good enough, right?")
(Given that, "why do we display it by default?" is a good question.)
That said:
"Root node": Every query is a tree of operators. (Except for queries that aren't. But we won't go there.) Every tree has a root; a node that starts things off, gets results back, etc. That's the Root node.
"NewEENode": This used to be much more meaningful than it is today. NewEENode enables certain types of low-level operations as part of the rest of the plan. Nowadays, if a plan needs that functionality, the plan gets a NewEENode. In early versions of Vertica, some plans couldn't use NewEENodes; this led to various issues. (When you're a new startup with a new product, sometimes you haven't implemented all the edge cases yet...) But that's long since resolved. Why does a plan-node have this effect? Implementation detail :-)
"one to one mapping": The EXPLAIN plan is a purely logical plan. The GraphViz plan is closer to a physical plan, though still relatively high-level; it gets into how we actually read the data out of multiple different storages (ROS vs WOS, etc), how we parallelize some operations, etc.
"system table": No; tables describe things that are running or that have been run, EXPLAIN describes queries that haven't been run yet. EXECUTION_ENGINE_PROFILES has some (much-more-detailed; also undocumented) related information for long-running queries.
Adam
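For anyone who wants to line the two plans up by hand, here is a rough Python sketch that pulls the operator names and edges out of the GraphViz text and prints them as a tree, starting from the Root node. It is not a Vertica tool. The helper names `parse_plan` and `render` are made up for illustration, the labels in `PLAN` are shortened copies of the ones in the question, and the regular expressions assume the exact layout shown in this post (one node or edge per line). It also assumes that an edge `a->b` means node `a` feeds node `b`, which is inferred from Root appearing only as a target.

```python
import re

# Abbreviated copy of the GraphViz plan from the question (labels shortened).
PLAN = r'''
digraph G {
0[label = "Root \nOutBlk=[UncTuple(2)]", color = "green", shape = "house"];
1[label = "NewEENode \nOutBlk=[UncTuple(2)]", color = "green", shape = "box"];
2[label = "TopK[PIPELINED]\n Sort: (keys = )\nLIMIT: 10", color = "green", shape = "box"];
3[label = "TopK[HEAP]\n Sort: (keys = A,N)\nLIMIT: 10", color = "green", shape = "box"];
4[label = "StorageUnionStep: customer_dimension_super", color = "purple", shape = "box"];
5[label = "ScanStep: customer_dimension_super\ncustomer_name", color = "brown", shape = "box"];
1->0 [label = "V[0] C=2",color = "black",style="bold", arrowtail="inv"];
2->1 [label = "0",color = "blue"];
3->2 [label = "0",color = "blue"];
4->3 [label = "0",color = "blue"];
5->4 [label = "0",color = "blue"];
}
'''

# A node line looks like: 3[label = "...", ...]; the label may hold escaped quotes.
NODE_RE = re.compile(r'^(\d+)\[label = "((?:[^"\\]|\\.)*)"', re.MULTILINE)
# An edge line looks like: 3->2 [...]
EDGE_RE = re.compile(r'^(\d+)->(\d+)', re.MULTILINE)


def parse_plan(plan_text):
    """Return (names, children, roots) for a plan in the layout shown above."""
    # The operator name is the part of the label before the first literal "\n".
    names = {
        int(node_id): label.split("\\n")[0].strip()
        for node_id, label in NODE_RE.findall(plan_text)
    }
    children = {}
    has_parent = set()
    for src, dst in EDGE_RE.findall(plan_text):
        # Assumption: "src->dst" means src produces rows consumed by dst.
        children.setdefault(int(dst), []).append(int(src))
        has_parent.add(int(src))
    # A root is any node that never feeds another node.
    roots = [n for n in names if n not in has_parent]
    return names, children, roots


def render(names, children, node, depth=0):
    """Return the subtree under `node` as a list of indented lines."""
    lines = ["  " * depth + f"{node}: {names[node]}"]
    for child in children.get(node, []):
        lines.extend(render(names, children, child, depth + 1))
    return lines


if __name__ == "__main__":
    names, children, roots = parse_plan(PLAN)
    for root in roots:
        print("\n".join(render(names, children, root)))
```

Run on the plan above, this prints Root at the top, then NewEENode, the two TopK nodes, StorageUnionStep, and ScanStep at the bottom, each indented one level deeper. Reading that next to the text plan makes it easier to see that the single SORT [TOPK] path corresponds to more than one GraphViz node.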
Happy new year!
This answer gives me useful information for understanding the text plan and the GraphViz plan. I will keep learning, and if I have more questions I will post them here. Thanks again!