Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Partition Elimination testing

Hi there

I'm trying to evaluate whether partition elimination is occurring effectively with the following example, and looking at the documentation, cant see any way of deternmining this on this page:
https://my.vertica.com/docs/CE/6.0.1/HTML/index.htm#13731.htm

We store our facts with an idDate, which links to a date dimension with a business date, and wanted to make a partition strategy by month / year, as follows:

DDL

CREATE TABLE rl.FactTest
( idDate INT NOT NULL PRIMARY KEY
 ,idDim1 INT NOT NULL
 ,idDim2 INT NOT NULL
 ,value DECIMAL(22,6)
)
ORDER BY idDate
PARTITION BY ( FLOOR(idDate / 100)::INT );

INSERT INTO  rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120101, 1, 2, 300.67);
INSERT INTO  rl.FactTest (idDate, idDim1, idDim2, value) VALUES (20120201, 3, 4, 300.67);

explain
SELECT * FROM rl.FactTest
WHERE idDate = 20120201

Output

------------------------------
QUERY PLAN DESCRIPTION:
------------------------------

explain SELECT * FROM rl.FactTest WHERE idDate = 20120201

Access Path:
+-STORAGE ACCESS for FactTest [Cost: 8, Rows: 1 (NO STATISTICS)] (PATH ID: 1)
|  Projection: rl.FactTest_b0
|  Materialize: FactTest.idDate, FactTest.idDim1, FactTest.idDim2, FactTest.value
|  Filter: (FactTest.idDate = 20120201)
|  Execute on: db_node0002

How is it possible to determine partition elimination in Vertica?

Thanks
Mike

Comments

  • In very early Vertica releases, explain statement produced by far more detailed and at same time by far more complex to understand execution plan. Then explain statement was rewritten, output simplified and get easier to understand by untrained human eye. At same time partition pruning information was dropped from output as unnecessary detail. And now it is a great mistery if partition pruning works. Even if you will dig into all documented system views, you will not find it.
    Vertica developers from optimizer team claims it works.
  • You can verify that partition elimination occurred in 6.x by using the QUERY_EVENTS table.  See the PARTITIONS_ELIMINATED event type.

      --Sharon

  • Thanks this gave me exactly what I needed!
  • Sharon,
    Thank you for a hint, I can see now how partition is eliminated.
    And it raises related question - with table being concurrently loaded by several processes, partition pruning show all WOS included in number of containers that will be accessed. It looks like there is no information about partition key attached to WOS, and it is not possible to prune WOS from query execution.
    It can be a performance hit on our new cluster with much larger memory, with WOS resouirce pool at 25% of memory.
    I remember adding "at epoch latest" used to be eliminate WOS data from query consideration. But I do not see difference now, when query run with or without "at epoch latest" clause WOS segments still included.
    Do you know how to remove WOS data from query results?
    Thank you
    Sergey
  • Hi there

    So I have done a couple of tests, and I can only get partition elimination to work, if the criteria in a query is on a fact table, and not a dimension table, here is an example:

    CREATE TABLE rl.FactTest2
    ( idDate INT NOT NULL
     ,businessDate date not null
     ,idDim1 INT NOT NULL
     ,idDim2 INT NOT NULL
     ,value DECIMAL(22,6)
    )
    ORDER BY idDate
    PARTITION BY ( EXTRACT(year FROM businessDate)*100 + EXTRACT(month FROM businessDate) );

    INSERT INTO  rlFactTest2 (idDate, businessDate, idDim1, idDim2, value) VALUES (20120101, '2012-01-01', 1, 2, 300.67);
    INSERT INTO  rl.FactTest2 (idDate, businessDate, idDim1, idDim2, value) VALUES (20120201, '2012-02-01', 3, 4, 300.67);

    CREATE TABLE rl.DimensionDate
    (idDate INT NOT NULL PRIMARY KEY
     ,businessDate date not null
     ,isCurrentDay boolean not null)

    INSERT INTO  rl.DimensionDate (idDate, businessDate, isCurrentDay) VALUES (20120101, '2012-01-01', true);
    INSERT INTO  rl.DimensionDate (idDate, businessDate, isCurrentDay) VALUES (20120201, '2012-02-01', false);

    Running the following query:

    explain local verbose
    SELECT f.*, d.isCurrentDay FROM rl.FactTest2 f
    INNER JOIN rl.DimensionDate d ON d.idDate = f.idDate
    WHERE f.businessDate In ('2012-01-01')

    I get the following output (filtered for 'prun') in which I can see the pruning of 1 storage, 1 row (highlighted)

    5[label = "StorageUnionStep: DimensionDate_b0\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.businessDate[1,2] - Date(8)\nUnc: f.idDate[1,1] - Integer(8)\nUnc: f.idDim1[1,3] - Integer(8)\nUnc: f.idDim2[1,4] - Integer(8)\nUnc: f.value[1,5] - Numeric(22,6)\nUnc: isCurrentDay[2,3] - Boolean(1)", color = "purple", shape = "box"];
    8[label = "StorageUnionStep: FactTest2_b0\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows) \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)", color = "purple", shape = "box"];
    2[label = "StorageUnionStep: DimensionDate_b0\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.businessDate[1,2] - Date(8)\nUnc: f.idDate[1,1] - Integer(8)\nUnc: f.idDim1[1,3] - Integer(8)\nUnc: f.idDim2[1,4] - Integer(8)\nUnc: f.value[1,5] - Numeric(22,6)\nUnc: isCurrentDay[2,3] - Boolean(1)", color = "purple", shape = "box"];
    5[label = "StorageMergeStep: FactTest2_b0; 2 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 1 (1 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)", color = "purple", shape = "box"];

    However if I run the following query, using the date dimension businessDate as the criteria:

    explain local verbose
    SELECT f.*, d.isAmericasCurrentDay FROM rl.FactTest2 f
    INNER JOIN rl.DimDate d ON d.idDate = f.idDate
    WHERE d.businessDate = '2012-01-01'

    I get the following output (filtered for 'prun') in which I can see the pruning of 0 storage, 0 row (highlighted) on FactTest2

    7[label = "StorageMergeStep: DimDate_b1; 1 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 1 (11 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
    2[label = "StorageUnionStep: FactTest2_b1\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)\nUnc: d.isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
    5[label = "StorageUnionStep: DimDate_b1\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 1 (6 rows) \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
    2[label = "StorageUnionStep: FactTest2_b1\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nNo of storages: 1\nPruned storages: 0 (0 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: f.idDate[1,1] - Integer(8)\nUnc: businessDate[1,2] - Date(8)\nUnc: idDim1[1,3] - Integer(8)\nUnc: idDim2[1,4] - Integer(8)\nUnc: value[1,5] - Numeric(22,6)\nUnc: d.isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];
    5[label = "StorageMergeStep: DimDate_b1; 1 sorted\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 6422\nPruned storages: 2 (16 rows)\nnodeSet: [0, 1, 2] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: idDate[2,1] - Integer(8)\nUnc: isAmericasCurrentDay[2,6] - Boolean(1)", color = "purple", shape = "box"];

    Is it possible for criteria specified on a dimension, to affect the partition pruning on a partitioned fact table?

  • In order to have partition pruning you have to put the predicated on the partition key field.  

  • Hmm, we have a BI tool which uses time series functionality based on the date dimension, bearing in mind we intend to store 8 years worth of data in monthly partitions, millions of rows per day,  this could pose a big problem for us.
  • If you know what are the date that you plan to add you just need to also include the partition field such as :

    businessdate between <> and <>; 

    that will prune the ros that have data from those partitions. 

    Otherwise, how Vertica will know that what you are filtering is in fact the same data than the dimension? You may need to add a constraint to show the relation ship. 

    Eugenia

  • Mike,

    Vertica can prune ROS containers based on any column in the projection - it stores min and max values for every column as metadata for every ROS container.  You could denormalize your tables so that the fact table includes the date dimension or a subset of the date dimension to allow for pruning.  Or you can use a prejoin projection which will have the same effect of bringing the data into a single projection and thus be prunable.  Both have the benefit of avoiding a run-time join, so better query performane.   Since the time dimension values will correlate closely to the businessDate values, you'll get effective pruning and good compression on the columns.   

      --Sharon

  • Hi Sharon

    I've denormalised the businessDate column down to the fact table, and in OBI configuration have linked the Date dimension and the Facts with both idDate and businessDate columns. This generates SQL which eliminates partitions based on the business date passed in as a predicate on the Date dimension, as this seems to be picked up by the optimiser as implied predicate on the partition column in Fact.

    However the OBI time series functionality does not eliminate partitions. It generates two SQL common table expressions which evaluate the date difference, from a supplied date predicate. The date selected is linked through the common table expressions, the attached diagram may explain things a little better than just words.

    Following this, I tried creating a prejoin projection between salient fields in the Date dimension and the Fact, but this did not succeed in eliminating partitions either. I can supply full DDL and full OBI select statements if needed. Any ideas?


     imageideas?

    Regards
    Mike



  • Hi Mike,

    I created a simple test case using a pre-join projection, emulating loosely the data in your example, and I did get partition elimination.   Perhaps there's some subtle difference between my simplified test case and your more complicated scenario. 

      --Sharon

  • Hi Sharon, please could you post the simplified DDL, so I can try and evaluate the differences?

    Regards
    Mike
  • Hi Mike,

    Here's my simple script showing that partition elimination does work on a pre-join projection.  In my test case I have partitioning on the fact table but not the dimension table.  The queries use a predicate on a date column in the dimension table.

      --Sharon
    drop table if exists dim cascade;
    create table dim(pk integer not null primary key, dt date);
    drop table if exists fact cascade;
    create table fact(fk integer not null, i integer, foreign key (fk) references dim(pk))
    partition by fk;
    create projection prejoin(pk, fk, i, dt)
    as select pk, fk, i, dt
    from fact join dim on pk = fk
    segmented by hash(pk) all nodes ksafe;
    insert /*+direct*/ into dim values(1, '2013-11-01');
    insert /*+direct*/ into dim values(2, '2013-11-02');
    insert /*+direct*/ into dim values(3, '2013-11-03');
    insert /*+direct*/ into fact values (1,1);
    insert /*+direct*/ into fact values (2,2);
    insert /*+direct*/ into fact values (3,3);
    commit;
    -- We have three rows with different dates, each in their own partitions.  This should use one storage container.
    select * from fact join dim on pk=fk
    where dt = '2013-11-03';
    select event_details from query_events where event_type='PARTITIONS_ELIMINATED'
    and (transaction_id, statement_id) in (select transaction_id, statement_id-1 from current_session);
     
    -- We have three rows with different dates, each in their own partitions.  This should use two storage containers.
    select * from fact join dim on pk=fk
    where dt >= '2013-11-02';
    select event_details from query_events where event_type='PARTITIONS_ELIMINATED'
    and (transaction_id, statement_id) in (select transaction_id, statement_id-1 from current_session);
  • Hi there

    I tried this, and there were no query events returned at all.
    I also tried using "explain local verbose" and that showed no pruning at all.

    2[label = "StorageUnionStep: fact_super\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 2556\nNo of storages: 3\nPruned storages: 0 (0 rows)\nnodeSet: [0] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: fact.fk[1,1] - Integer(8)\nUnc: fact.i[1,2] - Integer(8)\nUnc: dim.pk[2,1] - Integer(8)\nUnc: dim.dt[2,2] - Date(8)", color = "purple", shape = "box"];

    2[label = "StorageUnionStep: fact_super\nsubdivision: 1048576\nstartEpoch 0\nendEpoch 2556\nNo of storages: 3\nPruned storages: 0 (0 rows)\nnodeSet: [0] \n[Local Segmented Properties: UNSEGMENTED]\nUnc: fact.fk[1,1] - Integer(8)\nUnc: fact.i[1,2] - Integer(8)\nUnc: dim.pk[2,1] - Integer(8)\nUnc: dim.dt[2,2] - Date(8)", color = "purple", shape = "box"];

    We are using Vertica Analytic Database v6.1.3-0

    Regards
    Mike

  • Hi Mike,

    That's curious - the script works for me on a one-node 6.1.3 db and a multi-node 6.1.2 db.

      --Sharon

  • Hi Sharon

    I was using DBVisualizer and the following criteria was not matching the events in the query_events table:
    and (transaction_id, statement_id) in (select transaction_id, statement_id-1 from current_session)
    Using vsql, this works as you suggested. Thanks for your help

    Regards
    Mike

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.