Customized Projections

I have run DBD and deployed it's recommendations . However, i'm working on creating customized projections to see if my queries could get extra performance gain. I find customized projections to be ultrafast provided i access them directly (not referring to base table)but i don't know how to force optimizer to access customized projections created by me when i directly refer to base table. Any pointers on tips to make optimizer use customized projections? thanks for your help in advance.

Comments

  • Hi ,

    Below is an  undocumented method  , do not use it in production without support guideline :

    SELECT set_optimizer_directives('AvoidUsingProjections=proj1,proj2'); < disable projection

    SELECT set_optimizer_directives('AvoidUsingProjections='); <-Enable it back 

    Thanks .

  • You can also check the use of UseOnlyProjections=<your projection name>
    like
    SELECT set_optimizer_directives('UseOnlyProjections=<your projection name>'); 
  • Thanks Eli for the response.  (Official disclaimer -- use of these functions is only officially supported if it is under official Support guidance through MyVertica.)

    A request, if I may:  We are constantly trying to improve our DBD to create optimal projections for your queries, and our Optimizer to always choose the right projection.  Usually both do.  If you are consistently able to do much better by hand, we would really appreciate it if you're willing to post an example so that we can figure out what our implementation could be doing better.

    Adam
  • Thanks Eli & Adam for responding. First of all, using set_optimizer_directive is nothing different than querying custom projection directly. Does Vertica recommend using custom projection in query directly? [My intention is not to override Optimizers plan, just trying to understand better]. What is the flip side of it?
    @Adam: I'm sure DBD is the best bet to create optimal projections . However, as mentioned in documentation, we can create optimized design by modifying or creating the design created by DBD. As i review DBD projections, i thought i should try applying some of the basic tuning principles such as GROUP HASH --> GROUP PIPELINED , Avoiding Data Re-segmentation for GROUP BY queries & Sorting by Low Cardinality to High Cardinality columns etc.


    -- Sample SQL query:

    Select Prod_Tier_Cd As Prod_Tier_Cd ,
    Sum(Total_Store_Inventory_Qty) As Total_Store_Inventory_Qty ,
    Sum(Total_Hub_Inventory_Qty) As Total_Hub_Inventory_Qty  ,
    Sum(In_Transit_Hub_Unit_Qty) As In_Transit_Hub_Unit_Qty  ,
    Sum(In_Transit_Store_Unit_Qty) As In_Transit_Store_Unit_Qty ,
    Sum(Avg_Units_On_Hand_Qty) As Avg_Units_On_Hand_Qty ,
    Sum(Prod_For_Sale_Prev_2_Week_Days) As Prod_For_Sale_Prev_2_Week_Days,
    Sum(Day_1_To_7_Sales_Qty) As Day_1_To_7_Sales_Qty
    FROM TABLEA
    WHERE
    Rptg_Dt='2013-05-13' AND
    Region_Name='APAC' AND
    Country_Name='APAC' AND
    Prod_Categ_Cd IN  ('XYZ','ABC')
    GROUP BY Prod_Tier_Cd

    --DBD Recommended Projection:

    CREATE PROJECTION TABLEA /*+basename(TABLEA),createtype(L)*/ 
    (
     Rptg_Dt ENCODING RLE, 
     Country_Name ENCODING RLE, 
     Prod_Id ENCODING AUTO, 
     Prod_Categ_Cd ENCODING RLE, 
     Region_Name ENCODING RLE, 
     Metric_Categ_Name ENCODING AUTO, 
     Prod_Class_Name ENCODING AUTO, 
     Prod_Tier_Cd ENCODING AUTO, 
     Planner_Cd ENCODING AUTO, 
     Prod_Row_Id ENCODING AUTO, 
     Day_1_To_7_Sales_Qty ENCODING DELTARANGE_COMP 
    )
    AS
     SELECT TABLEA.Rptg_Dt,
            TABLEA.Country_Name,
            TABLEA.Prod_Id,
            TABLEA.Prod_Categ_Cd,
            TABLEA.Region_Name,
            TABLEA.Metric_Categ_Name,
            TABLEA.Prod_Class_Name,
            TABLEA.Prod_Tier_Cd,
            TABLEA.Planner_Cd,
            TABLEA.Prod_Row_Id,
            TABLEA.Day_1_To_7_Sales_Qty
     FROM retail_ops.TABLEA
     ORDER BY Rptg_Dt,
              Country_Name,
              Prod_Categ_Cd,
              Prod_Class_Nr,
              Prod_Row_Id,
              Row_Order_Id,
              Prod_Tier_Cd,
              Prod_Id
    SEGMENTED BY hash(TABLEA.Metric_Categ_Name) ALL NODES
    KSAFE 1 ;


    --Custom Projection with consideration of below tuning principles:

    -- ALL GROUP COLUMNS IN ORDER BY CLAUSE (GROUP PIPELINED INSTEAD OF GROUP BY HASH )
    -- ALL SEGMENTED COLUMNS ARE INCLUDED IN GROUP BY (TO AVOID RUN TIME RE-SEGMENTATION)
    -- STATS Collected.

    CREATE PROJECTION TABLEA_t /*+basename(TABLEA),createtype(L)*/ 
    (
     Rptg_Dt ENCODING RLE, 
     Country_Name ENCODING RLE, 
     Prod_Id ENCODING AUTO, 
     Prod_Categ_Cd ENCODING RLE, 
     Region_Name ENCODING RLE, 
     Metric_Categ_Name ENCODING AUTO, 
     Prod_Class_Name ENCODING AUTO, 
     Prod_Tier_Cd ENCODING AUTO, 
     Planner_Cd ENCODING AUTO, 
     Prod_Row_Id ENCODING AUTO, 
     Day_1_To_7_Sales_Qty ENCODING DELTARANGE_COMP 
    )
    AS
     SELECT TABLEA.Rptg_Dt,
            TABLEA.Country_Name,
            TABLEA.Prod_Id,
            TABLEA.Prod_Categ_Cd,
            TABLEA.Region_Name,
            TABLEA.Metric_Categ_Name,
            TABLEA.Prod_Class_Name,
            TABLEA.Prod_Tier_Cd,
            TABLEA.Planner_Cd,
            TABLEA.Prod_Row_Id,
            TABLEA.Day_1_To_7_Sales_Qty
     FROM retail_ops.TABLEA
     ORDER BY TABLEA.Prod_Tier_Cd,
              TABLEA.Prod_Categ_Cd,
              TABLEA.Region_Name,
              TABLEA.Country_Name,
     TABLEA.Rptg_Dt
    SEGMENTED BY hash(TABLEA.Prod_Tier_Cd) ALL NODES
    KSAFE 1 ;



    -- Trace back in execution_engine_profiles

    -- CLOCK TIME

    --DBD Recommended

    App_owner=> SELECT operator_name,path_id,SUM(counter_value) FROM V_monitor.execution_engine_profiles where transaction_id=63050394795945858 and statement_id=9 and counter_name='clock time (us)' GROUP BY 1,2 ORDER BY path_id ;
     operator_name | path_id |  SUM   
    ---------------+---------+--------
     NewEENode     |      -1 |     93
     ExprEval      |      -1 |    100
     Root          |      -1 |    144
     ExprEval      |       1 |    151
     Sort          |       1 |    476
     ExprEval      |       3 |   1591
     GroupByHash   |       3 |  17451
     ParallelUnion |       3 | 566618
     GroupByPipe   |       3 |   1219
     ExprEval      |       4 |    313
     Scan          |       4 |  12623
    (11 rows)

    -- CUSTOM

    Time: First fetch (11 rows): 1027.724 ms. All rows formatted: 1027.777 ms
    App_owner=> SELECT operator_name,path_id,SUM(counter_value) FROM V_monitor.execution_engine_profiles where transaction_id=63050394795945858 and statement_id=21 and counter_name='clock time (us)' GROUP BY 1,2 ORDER BY path_id ;
     operator_name | path_id |  SUM   
    ---------------+---------+--------
     NewEENode     |      -1 |     67
     ExprEval      |      -1 |     90
     Root          |      -1 |    107
     ExprEval      |       1 |    161
     Sort          |       1 |    444
     ExprEval      |       3 |    111
     GroupByPipe   |       3 |   1127
     ExprEval      |       4 |   1200
     Scan          |       4 | 104084
    (9 rows)


    --EXECUTION TIME (CPU)

    --DBD Recommended

    App_owner=> SELECT operator_name,path_id,SUM(counter_value) FROM V_monitor.execution_engine_profiles where transaction_id=63050394795945858 and statement_id=9 and counter_name='execution time (us)' GROUP BY 1,2 ORDER BY path_id ;
     operator_name | path_id |  SUM  
    ---------------+---------+-------
     NewEENode     |      -1 |    71
     ExprEval      |      -1 |    86
     Root          |      -1 |   140
     ExprEval      |       1 |   147
     NetworkSend   |       1 |    58
     Sort          |       1 |   442
     NetworkRecv   |       1 |   244
     ExprEval      |       3 |   577
     GroupByHash   |       3 | 13809
     NetworkRecv   |       3 |  1297
     ParallelUnion |       3 |  5797
     GroupByPipe   |       3 |  1154
     StorageUnion  |       3 |  3337
     NetworkSend   |       3 |   187
     ExprEval      |       4 |   264
     Scan          |       4 | 12505
    (16 rows)


    -- CUSTOM

    Time: First fetch (16 rows): 972.249 ms. All rows formatted: 972.314 ms
    App_owner=> SELECT operator_name,path_id,SUM(counter_value) FROM V_monitor.execution_engine_profiles where transaction_id=63050394795945858 and statement_id=21 and counter_name='execution time (us)' GROUP BY 1,2 ORDER BY path_id ;
     operator_name | path_id |  SUM  
    ---------------+---------+-------
     NewEENode     |      -1 |    58
     ExprEval      |      -1 |    77
     Root          |      -1 |   104
     ExprEval      |       1 |   159
     NetworkSend   |       1 |    47
     Sort          |       1 |   399
     NetworkRecv   |       1 |   256
     ExprEval      |       3 |    93
     StorageMerge  |       3 |   547
     GroupByPipe   |       3 |   536
     ExprEval      |       4 |   470
     Scan          |       4 | 72454
    (12 rows)

    --OBSERVATIONS:
    CUSTOM has better stats than DBD recommended . However, Optimizer is showing DBD recommended/Super projection in Explain plan.




  • Hi Mahesh,

    Thanks for the detailed reply.  Hm...  When you ran the DBD, did you provide it with this as one of its example queries?  That doesn't look to me like a design that we would produce for that query.

    Thanks,
    Adam
  • Did you ensure there were no tuple mover operations before executing each test? I would strongly encourage getting at least 5 trials per projection.

    Adding up the counter values, the DBD recommended profile appears to be 35,000 counters less than your custom projection.

    What we really need to see is the explain plan for the query, using the DBD projection, and your custom projection.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file