Understanding GraphViz Query Plan

We use EXPLAIN [SQL] to get the query plan, and the output is composed
of two parts: the text query plan and GraphViz plan. 

In vertica administrator's guide, there's a lot of information about text query
plan. But for GraphViz plan, there is little information about that. And I have
problem understanding it and please help. (Sorry the post has some format problems, maybe because there is some escape characters in graphViz plan part which causes the html control misinterpretation. )

For example, I use the query 
EXPLAIN SELECT customer_name, customer_state FROM  customer_dimension   WHERE customer_state in ('MA','NH') AND  customer_gender = 'Male'   ORDER BY customer_name LIMIT 10;
and get the output like below,

 QUERY PLAN DESCRIPTION:

 ------------------------------

 

 EXPLAIN SELECT customer_name, customer_state FROM customer_dimension

 WHERE customer_state in ('MA','NH') AND customer_gender = 'Male'

 ORDER BY customer_name LIMIT 10;

 

 Access Path:

 +-SELECT  LIMIT 10 [Cost: 6K, Rows: 10 (NO STATISTICS)] (PATH ID: 0)

 |  Output Only: 10 tuples

 | +---> SORT [TOPK] [Cost: 6K, Rows: 50K (NO STATISTICS)] (PATH ID: 1)

 | |      Order: customer_dimension.customer_name ASC

 | |      Output Only: 10 tuples

 | | +---> STORAGE ACCESS for customer_dimension [Cost: 746, Rows: 50K (NO STATISTICS)] (PATH ID: 2)

 | | |      Projection: public.customer_dimension_super

 | | |      Materialize: customer_dimension.customer_name, customer_dimension.customer_state

 | | |      Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH']))

 | | |      Filter: (customer_dimension.customer_gender = 'Male') 

 

 ------------------------------

 -----------------------------------------------

 PLAN: BASE QUERY PLAN (GraphViz Format)

 -----------------------------------------------

 digraph G {

 graph [rankdir=BT, label = "BASE QUERY PLAN\nQuery: EXPLAIN SELECT customer_name, customer_state FROM customer_dimensio

\nORDER BY customer_name LIMIT 10;\n\nAll Nodes Vector: \n\n  node[0]=v_vmart_node0001 (initiator) Up\n", labelloc=t, la

beljust=l ordering=out]

 0[label = "Root \nOutBlk=[UncTuple(2)]", color = "green", shape = "house"];

 1[label = "NewEENode \nOutBlk=[UncTuple(2)]", color = "green", shape = "box"];

 2[label = "TopK[PIPELINED]\n Sort: (keys = )\nLIMIT: 10\nOffset: 0\nUnc: Varchar(256)\nUnc: Char(2)", color = "green",

shape = "box"];

 3[label = "TopK[HEAP]\n Sort: (keys = A,N)\nLIMIT: 10\nUnc: Varchar(256)\nUnc: Char(2)", color = "green", shape = "box"

];

 4[label = "StorageUnionStep: customer_dimension_super\nUnc: Varchar(256)\nUnc: Char(2)", color = "purple", shape = "box

"];

 5[label = "ScanStep: customer_dimension_super\n(customer_dimension.customer_state = ANY (ARRAY[\'MA\', \'NH\']))\n(cust

omer_dimension.customer_gender = \'Male\')\ncustomer_name\ncustomer_gender (not emitted)\ncustomer_state\nUnc: Varchar(2

56)\nUnc: Char(2)", color = "brown", shape = "box"];

 1->0 [label = "V[0] C=2",color = "black",style="bold", arrowtail="inv"];

 2->1 [label = "0",color = "blue"];

 3->2 [label = "0",color = "blue"];

 4->3 [label = "0",color = "blue"];

 5->4 [label = "0",color = "blue"];

 }

I have several questions and please kindly help:
  • Question 2: Where can I find more information about GraphViz plan terminologies?
  • Question 3: It not a one to one mapping between text plan and graphviz plan. For example, I can see there is a node on SORT [TOPK], and in graphViz plan we have two sort (Node 2 and 3: TopK[PIPELINED and TopK[HEAP]]. Why it's not one to one mapping and how can I correlate the text plan with graphViz plan?
  • Question 4: May I know where the GraphViz plan come from? I mean there is a system table for example, that I can query from? 


Thank you very much for help!

Comments

  • Hi,

    The GraphViz query plan is currently intentionally undocumented.  It is there to help Vertica employees know what's going on within query execution; it contains various information that's not useful.  (And some that's misleading -- "we know what that means; that's good enough, right?")

    (Given that, "why do we display it by default?" is a good question.)

    That said:

    "Root node":  Every query is a tree of operators.  (Except for queries that aren't.  But we won't go there.)  Every tree has a root; a node that starts things off, gets results back, etc.  That's the Root node.

    "NewEENode":  This used to be much more meaningful than it is today.  NewEENode enables certain types of low-level operations as part of the rest of the plan.  Nowadays, if a plan needs that functionality, the plan gets a NewEENode.  In early versions of Vertica, some plans couldn't use NewEENodes; this led to various issues.  (When you're a new startup with a new product, sometimes you haven't implemented all the edge cases yet...)  But that's long since resolved.  Why does a plan-node have this effect?  Implementation detail :-)

    "one to one mapping":  The EXPLAIN plan is a purely logical plan.  The GraphViz plan is closer to a physical plan, though still relatively high-level; it gets into how we actually read the data out of multiple different storages (ROS vs WOS, etc), how we parallelize some operations, etc.

    "system table":  No; tables describe things that are running or that have been run, EXPLAIN describes queries that haven't been run yet.  EXECUTION_ENGINE_PROFILES has some (much-more-detailed; also undocumented) related information for long-running queries.

    Adam
  • Thank you Adam for your help. 
    Happy new year!

    This answer gives me useful information on how to understand text plan and graphViz plan info. I will keep learning that and if I have questions I will post it here. Thanks again!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file