Avoid Query going to other Nodes if segment upper and lower bound is provided.
Hi,
I'm trying to use projection_segmnets for exporting out data in parallel.
SELECT get_projection_segments('my_table_b0') shows below segment ranges
v_node0001 v_node0004 v_lnode0002 v_node0003 v_node0005
858993458 1717986917 2576980376 3435973835 4294967294
4294967295 858993459 1717986918 2576980377 3435973836
--> my_table is segmented and ordered on column User_id.
--> User_id (123) resides in v_node0004
case 1:
Explain Select * FROM my_table where User_id = 123;
QUERY PLAN DESCRIPTION:
explain
select * FROM my_table where User_id = 123;
Access Path:
+-STORAGE ACCESS for my_table [Cost: 411, Rows: 789K] (PATH ID: 1)
| Projection: my_table_b0
| Materialize: my_table.User_id
| Filter: (my_table.User_id = (123))
| Execute on: v_node0004
case 2:
explain select person_id FROM my_table where person_id in (123,123) ;
QUERY PLAN DESCRIPTION:
Access Path:
+-STORAGE ACCESS for my_table [Cost: 504K, Rows: 248M] (PATH ID: 1)
| Projection: my_table_b0
| Materialize: my_table.User_id
| Filter: (my_table.User_id = ANY (ARRAY[(123), (123)]))
| Execute on: All Nodes
case 3:
explain select count(*) from my_table WHERE (0xffffffff & hash(User_id)) BETWEEN 858993459 AND 1717986917 ;
QUERY PLAN DESCRIPTION:
Access Path:
+-GROUPBY NOTHING [Cost: 504K, Rows: 1] (PATH ID: 1)
| Aggregates: count(*)
| Execute on: All Nodes
| +---> STORAGE ACCESS for my_table [Cost: 504K, Rows: 248M] (PATH ID: 2)
| | Projection: my_table_b0
| | Filter: (((4294967295 & hash(my_table.User_id)) >= 858993459) AND ((4294967295 & hash(my_table.User_id)) <= 1717986917))
| | Execute on: All Nodes
case 1: When provided single user_id it executes on one node i.e on the Node where the data resides
Even if we have logged in through NODE 1 in VSQL, The query get's executed in NODE 4 only.
Kindly help me with below queries:
Case 2: Why the query is executing on All Nodes, though the user_id are same and belongs to NODE 4, shouldn't it execute only on NODE 4 as in Case 1.
Case 3: If my filter condition is only looking at segment range of NODE 4, why the query is getting distributed on ALL Nodes.
Though Query is getting distributed to ALL NODES, Is it that only NODE 4 is contributing to the query and other don't do any thing (i.e there is no processing cost for this on other nodes except NODE 4).
Is there a system table from where this could be confirmed for a run like this.
If I look into execution_engine_profiles, all Nodes are going through same steps as NODE 4 is.
I'm trying to avoid query going to other nodes if it belongs to Node-4.
Tried case 2 & case 3 by logging in from Node-4 but the execution plan remained same as earlier.
Also please suggest if there is any other way to achieve this use case. (Please Note: I may join this table to another table , but those tables are also segmented and ordered by user_id )
Comments
That's because at now, our optimizer can only generate single node plan with equal predicate on segment columns of segmented projections , or unsegmented projections.
Thanks.
I may be repeating the question, but please help me with this.
Case 3: If my filter condition is only looking at segment range of NODE 4, why the query is getting distributed on ALL Nodes.
Though Query is getting distributed to ALL NODES, Is it that only NODE 4 is contributing to the query and other do Nothing (i.e there is no processing cost for this on other nodes except NODE 4).
Is there a system table from where this could be confirmed for a run like this.
If I look into execution_engine_profiles, all Nodes are going through same steps as NODE 4 is.
Maybe LOCAL_NODE_NAME() can help?
Thanks for replying.
It still goes to all nodes.
Although all nodes participate, only the local node specified does the bulk of the work.
If you want only one node to participate, you'll have to replicate the table projection.
Thanks Jim,
Hi,
Question 2:
If we only look at "Scanned" data we see that all rows do come from a single node:
In my example, all of the data did come from node 3. There are other operators that produce rows (i.e. GroupByPipe, StorageUnion, etc.).
Question 3:
For a segmented projection I think the best you can do is limit the computation to 2 nodes (buddy projections) because Vertica is going to look for the buddy segments.
For question 3:
the tables created in my case has one buddy projection (ie _b0, _b1) and each of them are replicated in all nodes. So for 5 nodes all of them have (_b0, _b1).
It seems we can't do anything with this.
Please suggest.