Query execution on multiple nodes

siva · May 2014

I have few questions regarding resource consumption for my query. I have a 3 node cluster.

1. select node_name from projection_usage where transaction_id= 45234523452345234
- This query shows all the projections are from same node.
- Does this mean complete data is drawn for same node
- Query plan has both segmented and unsegmented projections
- Query plan also shows ' Execute on: All Nodes ' for all joins.

2. Vertica requests 4 GB of memory for my query. Is this entire memory drawn from same node. Since computation will be distributed does each node in the resource poos needs have this much of memory ?

Navin_C · May 2014

Hello Siva,

1. select node_name from projection_usage where transaction_id= 45234523452345234

--This query shows all the projections are from same node.
The query made by you above is on the initiator node or it is becoming the initiator node at that instant, due to which the table projection_usage is showing you all projections residing on this node . The column node_name will show only the projections residing on initiator node and are participating in Query execution


select node_name,transaction_id,client_type,client_host from sessions where transaction_id=45234523452345234

--- Does this mean complete data is drawn for same node
This does not mean the complete data is drawn from same node ( this can be only possible if all data is located on same initiator node )

--Query plan has both segmented and unsegmented projections
When you see all projection names in the below query, are all the projections dimension tables or they are mixed dimension and fact.....to be precise are they only unsegmented or are they mixed segmented as well unsegmented in the list

select node_name,projection_name from projection_usage where transaction_id= 45234523452345234

If the above command shows the 6 projections out of which 5 are unsegmented and 1 is unsegmented then it is valid that the Explain plan is showing both segmented and unsegmented projections

--Query plan also shows ' Execute on: All Nodes ' for all joins.
Yes, that's correct, The query is executing on all nodes as all the data is not residing on the same initiator node. That means the projections from the projections list that the transaction is using is segmented across nodes.

2. Vertica requests 4 GB of memory for my query. Is this entire memory drawn from same node. Since computation will be distributed does each node in the resource poos needs have this much of memory ?
If you found out the memory using profile command then 4GB will be required by every node participating and fetching resultset in the query plan. So each node has to have 4GB of memory at that instant in the directed resource pools

Adrian_Oprea_1 · May 2014

1 - Take a look at your projection definition to see how they are distributed(if they are created on only one node).
2 - In your cluster all nodes can become initiator node(executor), this node will consume a bit more of resource in order to create and deploy the mini-plans across the other cluster nodes ,the nodes of the cluster share most of the work of executing expensive queries.
This is why is a good idea to let DBD do its work, as he will chose the right projections and their nodes for the task.
The resources consumption are divided between the participating nodes(not necessarily all of the cluster nodes).
Task such as Result Set formatting, combining results it will be on initiator expense.

Navin_C · May 2014

Hi Adrian,

This is why is a good idea to let DBD do its work, as he will chose the right projections and their nodes for the task.

Are you referring to query planner in Vertica and not the DBD in this scenario ?

Adrian_Oprea_1 · May 2014

The DBD remark was an off topic, since bad projection layout might lead to a poor query planner.

We're Moving!

Create My New Community Account Now

Query execution on multiple nodes

Comments

Leave a Comment