Partition Elimination testing
Hi there
I'm trying to evaluate whether partition elimination is occurring effectively with the following example, and looking at the documentation, cant see any way of deternmining this on this page:
https://my.vertica.com/docs/CE/6.0.1/HTML/index.htm#13731.htm
We store our facts with an idDate, which links to a date dimension with a business date, and wanted to make a partition strategy by month / year, as follows:
DDL
CREATE TABLE rl.FactTest
( idDate INT NOT NULL PRIMARY KEY
,idDim1 INT NOT NULL
,idDim2 INT NOT NULL
,value DECIMAL(22,6)
)
ORDER BY idDate
PARTITION BY ( FLOOR(idDate / 100)::INT );
INSERT INTO rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120101, 1, 2, 300.67);
INSERT INTO rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120201, 3, 4, 300.67);
explain
SELECT * FROM rl.FactTest
WHERE idDate = 20120201
Output
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain SELECT * FROM rl.FactTest WHERE idDate = 20120201
Access Path:
+-STORAGE ACCESS for FactTest [Cost: 8, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Projection: rl.FactTest_b0
| Materialize: FactTest.idDate, FactTest.idDim1, FactTest.idDim2, FactTest.value
| Filter: (FactTest.idDate = 20120201)
| Execute on: db_node0002
How is it possible to determine partition elimination in Vertica?
Thanks
Mike
I'm trying to evaluate whether partition elimination is occurring effectively with the following example, and looking at the documentation, cant see any way of deternmining this on this page:
https://my.vertica.com/docs/CE/6.0.1/HTML/index.htm#13731.htm
We store our facts with an idDate, which links to a date dimension with a business date, and wanted to make a partition strategy by month / year, as follows:
DDL
CREATE TABLE rl.FactTest
( idDate INT NOT NULL PRIMARY KEY
,idDim1 INT NOT NULL
,idDim2 INT NOT NULL
,value DECIMAL(22,6)
)
ORDER BY idDate
PARTITION BY ( FLOOR(idDate / 100)::INT );
INSERT INTO rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120101, 1, 2, 300.67);
INSERT INTO rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120201, 3, 4, 300.67);
explain
SELECT * FROM rl.FactTest
WHERE idDate = 20120201
Output
------------------------------
QUERY PLAN DESCRIPTION:
------------------------------
explain SELECT * FROM rl.FactTest WHERE idDate = 20120201
Access Path:
+-STORAGE ACCESS for FactTest [Cost: 8, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
| Projection: rl.FactTest_b0
| Materialize: FactTest.idDate, FactTest.idDim1, FactTest.idDim2, FactTest.value
| Filter: (FactTest.idDate = 20120201)
| Execute on: db_node0002
How is it possible to determine partition elimination in Vertica?
Thanks
Mike
0
Comments
Vertica developers from optimizer team claims it works.
--Sharon
Thank you for a hint, I can see now how partition is eliminated.
And it raises related question - with table being concurrently loaded by several processes, partition pruning show all WOS included in number of containers that will be accessed. It looks like there is no information about partition key attached to WOS, and it is not possible to prune WOS from query execution.
It can be a performance hit on our new cluster with much larger memory, with WOS resouirce pool at 25% of memory.
I remember adding "at epoch latest" used to be eliminate WOS data from query consideration. But I do not see difference now, when query run with or without "at epoch latest" clause WOS segments still included.
Do you know how to remove WOS data from query results?
Thank you
Sergey
So I have done a couple of tests, and I can only get partition elimination to work, if the criteria in a query is on a fact table, and not a dimension table, here is an example:
CREATE TABLE rl.FactTest2
( idDate INT NOT NULL
,businessDate date not null
,idDim1 INT NOT NULL
,idDim2 INT NOT NULL
,value DECIMAL(22,6)
)
ORDER BY idDate
PARTITION BY ( EXTRACT(year FROM businessDate)*100 + EXTRACT(month FROM businessDate) );
INSERT INTO rlFactTest2 (idDate, businessDate, idDim1, idDim2, value) VALUES (20120101, '2012-01-01', 1, 2, 300.67);
INSERT INTO rl.FactTest2 (idDate, businessDate, idDim1, idDim2, value) VALUES (20120201, '2012-02-01', 3, 4, 300.67);
CREATE TABLE rl.DimensionDate
(idDate INT NOT NULL PRIMARY KEY
,businessDate date not null
,isCurrentDay boolean not null)
INSERT INTO rl.DimensionDate (idDate, businessDate, isCurrentDay) VALUES (20120101, '2012-01-01', true);
INSERT INTO rl.DimensionDate (idDate, businessDate, isCurrentDay) VALUES (20120201, '2012-02-01', false);
Running the following query:
explain local verbose
SELECT f.*, d.isCurrentDay FROM rl.FactTest2 f
INNER JOIN rl.DimensionDate d ON d.idDate = f.idDate
WHERE f.businessDate In ('2012-01-01')
I get the following output (filtered for 'prun') in which I can see the pruning of 1 storage, 1 row (highlighted)
5[label = "StorageUnionStep: DimensionDate_b0\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.businessDate[1,2] - Date(8)\nUnc: f.idDate[1,1] - Integer(8)\nUnc: f.idDim1[1,3] - Integer(8)\nUnc: f.idDim2[1,4] - Integer(8)\nUnc: f.value[1,5] - Numeric(22,6)\nUnc: isCurrentDay[2,3] - Boolean(1)", color = "purple", shape = "box"];
8[label = "StorageUnionStep: FactTest2_b0\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows) \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)", color = "purple", shape = "box"];
2[label = "StorageUnionStep: DimensionDate_b0\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.businessDate[1,2] - Date(8)\nUnc: f.idDate[1,1] - Integer(8)\nUnc: f.idDim1[1,3] - Integer(8)\nUnc: f.idDim2[1,4] - Integer(8)\nUnc: f.value[1,5] - Numeric(22,6)\nUnc: isCurrentDay[2,3] - Boolean(1)", color = "purple", shape = "box"];
5[label = "StorageMergeStep: FactTest2_b0; 2 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 1 (1 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)", color = "purple", shape = "box"];
However if I run the following query, using the date dimension businessDate as the criteria:
explain local verbose
SELECT f.*, d.isAmericasCurrentDay FROM rl.FactTest2 f
INNER JOIN rl.DimDate d ON d.idDate = f.idDate
WHERE d.businessDate = '2012-01-01'
I get the following output (filtered for 'prun') in which I can see the pruning of 0 storage, 0 row (highlighted) on FactTest2
7[label = "StorageMergeStep: DimDate_b1; 1 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 1 (11 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
2[label = "StorageUnionStep: FactTest2_b1\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)\nUnc: d.isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
5[label = "StorageUnionStep: DimDate_b1\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 1 (6 rows) \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
2[label = "StorageUnionStep: FactTest2_b1\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)\nUnc: d.isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
5[label = "StorageMergeStep: DimDate_b1; 1 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 2 (16 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
Is it possible for criteria specified on a dimension, to affect the partition pruning on a partitioned fact table?
businessdate between <> and <>;
that will prune the ros that have data from those partitions.
Otherwise, how Vertica will know that what you are filtering is in fact the same data than the dimension? You may need to add a constraint to show the relation ship.
Eugenia
Vertica can prune ROS containers based on any column in the projection - it stores min and max values for every column as metadata for every ROS container. You could denormalize your tables so that the fact table includes the date dimension or a subset of the date dimension to allow for pruning. Or you can use a prejoin projection which will have the same effect of bringing the data into a single projection and thus be prunable. Both have the benefit of avoiding a run-time join, so better query performane. Since the time dimension values will correlate closely to the businessDate values, you'll get effective pruning and good compression on the columns.
--Sharon
I've denormalised the businessDate column down to the fact table, and in OBI configuration have linked the Date dimension and the Facts with both idDate and businessDate columns. This generates SQL which eliminates partitions based on the business date passed in as a predicate on the Date dimension, as this seems to be picked up by the optimiser as implied predicate on the partition column in Fact.
However the OBI time series functionality does not eliminate partitions. It generates two SQL common table expressions which evaluate the date difference, from a supplied date predicate. The date selected is linked through the common table expressions, the attached diagram may explain things a little better than just words.
Following this, I tried creating a prejoin projection between salient fields in the Date dimension and the Fact, but this did not succeed in eliminating partitions either. I can supply full DDL and full OBI select statements if needed. Any ideas?
ideas?
Regards
Mike
I created a simple test case using a pre-join projection, emulating loosely the data in your example, and I did get partition elimination. Perhaps there's some subtle difference between my simplified test case and your more complicated scenario.
--Sharon
Regards
Mike
Here's my simple script showing that partition elimination does work on a pre-join projection. In my test case I have partitioning on the fact table but not the dimension table. The queries use a predicate on a date column in the dimension table.
--Sharon
I tried this, and there were no query events returned at all.
I also tried using "explain local verbose" and that showed no pruning at all.
2[label = "StorageUnionStep: fact_super\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 2556\nNo of storages: 3\nPruned storages: 0 (0 rows)\nnodeSet: [0] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: fact.fk[1,1] - Integer(8)\nUnc: fact.i[1,2] - Integer(8)\nUnc: dim.pk[2,1] - Integer(8)\nUnc: dim.dt[2,2] - Date(8)", color = "purple", shape = "box"];
2[label = "StorageUnionStep: fact_super\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 2556\nNo of storages: 3\nPruned storages: 0 (0 rows)\nnodeSet: [0] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: fact.fk[1,1] - Integer(8)\nUnc: fact.i[1,2] - Integer(8)\nUnc: dim.pk[2,1] - Integer(8)\nUnc: dim.dt[2,2] - Date(8)", color = "purple", shape = "box"];
We are using Vertica Analytic Database v6.1.3-0
Regards
Mike
That's curious - the script works for me on a one-node 6.1.3 db and a multi-node 6.1.2 db.
--Sharon
I was using DBVisualizer and the following criteria was not matching the events in the query_events table: Using vsql, this works as you suggested. Thanks for your help
Regards
Mike