Query distribution & understanding explain command output

Antoine_Niek · May 2013

Hi all, We are trying to understand and ideally optimize some queries. We use a common db layout which consist in one big fact table and multiple small dimension tables to join against. The fact table is segmented on all nodes (using its composite key fields for the hashing as suggested by the dbd) and the dimension tables are unsegmented on all nodes (as suggested by dbd as well). My understanding is that the segmentation of the fact table allow our reporting queries hitting the fact table to be distributed on every node so that the joining with the dimension tables can occur on every node if necessary. However when I look at the projections fired by the query using the EXPLAIN command I can see that only the unsegmented *node0001 projections are being used when it comes to joining (which is the node on which I fired the query). However when the query is ran the IO rising on all our nodes suggest they are all being used to answer. What is happening ? Does the joining only occur on node 1 ?

[Deleted User] · May 2013

Hi Antoine, I'm curious, what do you mean by the "unsegmented *node0001 projections"? You indicated that the unsegmented projections were on all nodes. If you take a look at the following example EXPLAIN output (borrowed from the documentation, the "Running the EXPLAIN command" page): ------------------------------ QUERY PLAN DESCRIPTION: ------------------------------ EXPLAIN SELECT customer_name, customer_state FROM customer_dimension WHERE customer_state in ('MA','NH') AND customer_gender = 'Male' ORDER BY customer_name LIMIT 10; Access Path: +-SELECT LIMIT 10 [Cost: 370, Rows: 10] (PATH ID: 0) | Output Only: 10 tuples | Execute on: Query Initiator | +---> SORT [Cost: 370, Rows: 544] (PATH ID: 1) | | Order: customer_dimension.customer_name ASC | | Output Only: 10 tuples | | Execute on: Query Initiator | | +---> STORAGE ACCESS for customer_dimension [Cost: 331, Rows: 544] (PATH ID: 2) | | | Projection: public.customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001 | | | Materialize: customer_dimension.customer_state, customer_dimension.customer_name | | | Filter: (customer_dimension.customer_gender = 'Male') | | | Filter: (customer_dimension.customer_state = ANY (ARRAY['MA', 'NH'])) | | | Execute on: Query Initiator You'll see that every operator lists "Execute on:" as a field. That field can be "All Nodes" if the step is executed on all nodes; "Query Initiator" if it's only run on the node that you're connected to when you initiate the query; etc. If you're doing a simple SELECT and JOIN on the schema that you describe, I would expect most or all operators in the plan to say "Execute on: All Nodes", and for the query to in fact run in parallel with all the nodes doing a roughly-equal share of the work.

Antoine_Niek · May 2013

Hi Adam, thanks a lot for your answer, I can confirm all operators are saying "Execute on: All Nodes" for my simple query. In the EXPLAIN output above, by "unsegmented *node0001 projections" I would be referring to the "customer_dimension_DBD_1_rep_vmartdb_design_vmartdb_design_node0001" projection. I guess I was mislead by the projection naming. Thank you again, Antoine

Query distribution & understanding explain command output

Comments

Leave a Comment

Query distribution &amp; understanding explain command output

Comments