Query execution on multiple nodes
I have few questions regarding resource consumption for my query. I have a 3 node cluster.
1. select node_name from projection_usage where transaction_id= 45234523452345234
- This query shows all the projections are from same node.
- Does this mean complete data is drawn for same node
- Query plan has both segmented and unsegmented projections
- Query plan also shows ' Execute on: All Nodes ' for all joins.
2. Vertica requests 4 GB of memory for my query. Is this entire memory drawn from same node. Since computation will be distributed does each node in the resource poos needs have this much of memory ?
1. select node_name from projection_usage where transaction_id= 45234523452345234
- This query shows all the projections are from same node.
- Does this mean complete data is drawn for same node
- Query plan has both segmented and unsegmented projections
- Query plan also shows ' Execute on: All Nodes ' for all joins.
2. Vertica requests 4 GB of memory for my query. Is this entire memory drawn from same node. Since computation will be distributed does each node in the resource poos needs have this much of memory ?
0
Comments
1. select node_name from projection_usage where transaction_id= 45234523452345234
--This query shows all the projections are from same node.
The query made by you above is on the initiator node or it is becoming the initiator node at that instant, due to which the table projection_usage is showing you all projections residing on this node . The column node_name will show only the projections residing on initiator node and are participating in Query execution --- Does this mean complete data is drawn for same node
This does not mean the complete data is drawn from same node ( this can be only possible if all data is located on same initiator node )
--Query plan has both segmented and unsegmented projections
When you see all projection names in the below query, are all the projections dimension tables or they are mixed dimension and fact.....to be precise are they only unsegmented or are they mixed segmented as well unsegmented in the list If the above command shows the 6 projections out of which 5 are unsegmented and 1 is unsegmented then it is valid that the Explain plan is showing both segmented and unsegmented projections
--Query plan also shows ' Execute on: All Nodes ' for all joins.
Yes, that's correct, The query is executing on all nodes as all the data is not residing on the same initiator node. That means the projections from the projections list that the transaction is using is segmented across nodes.
2. Vertica requests 4 GB of memory for my query. Is this entire memory drawn from same node. Since computation will be distributed does each node in the resource poos needs have this much of memory ?
If you found out the memory using profile command then 4GB will be required by every node participating and fetching resultset in the query plan. So each node has to have 4GB of memory at that instant in the directed resource pools
1 - Take a look at your projection definition to see how they are distributed(if they are created on only one node).
2 - In your cluster all nodes can become initiator node(executor), this node will consume a bit more of resource in order to create and deploy the mini-plans across the other cluster nodes ,the nodes of the cluster share most of the work of executing expensive queries.
This is why is a good idea to let DBD do its work, as he will chose the right projections and their nodes for the task.
The resources consumption are divided between the participating nodes(not necessarily all of the cluster nodes).
Task such as Result Set formatting, combining results it will be on initiator expense.