Options

Can I use any of Vertica's analytic functions to improve this query?

I'm trying to do a little bake off between our existing Exadata installation and Vertica. I was wondering if there are Vertica specific analytic functions that would improve this query?

select s.REGION, s.FUND_CODE, sum(s.transaction_amount)  from TRANSACTIONS s where s.TRANSACTION_TYPE = 'P'
and YEAR (s.TRANSACTION_DATE) = YEAR (current_date)
group by s.REGION, s.FUND_CODE
order by s.REGION, sum(s.transaction_amount) desc
limit 10;

It's runs fine in Oracle, but in Vertica this query takes close to a minute with the YEAR function. If I remove the YEAR function it only takes 8 seconds. 

I looked at windowing but I'm not sure I have the correct use case. I'm simply looking to run an aggregation query. I have passed this through the database designer. 

Here is the explain plan

Access Path:                                                                                                                                                                                                                                                                                                                                                                                                                                                        +-SELECT  LIMIT 10 [Cost: 409K, Rows: 10] (PATH ID: 0)                                                                
|  Output Only: 10 tuples                                                                                                          
|  Execute on: Query Initiator                                                               
| +---> SORT [TOPK] [Cost: 409K, Rows: 2] (PATH ID: 1)
| |      Order: s.REPID ASC, sum(s.transaction_amount) DESC
| |      Output Only: 10 tuples                                                                                                        
| |      Execute on: All Nodes 
| | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 409K, Rows: 2] (PATH ID: 2)
| | |      Aggregates: sum(s.transaction_amount)
| | |      Group By: s.REPID                                                                                                         
| | |      Execute on: All Nodes                           
| | | +---> STORAGE ACCESS for s [Cost: 369K, Rows: 1M] (PATH ID: 3)                                     
| | | |      Projection: transaction_schema.transactions_DBD_1_seg_trans_proj_b0
| | | |      Materialize: s.REPID, s.transaction_amount 
| | | |      Filter: (s.TRANSACTION_TYPE = 'P') 
| | | |      Filter: (s.REPID = '000049')
| | | |      Execute on: All Nodes    

Comments

  • Options
    1. Can you post the plan without the year function when it take 8 seconds
    2. what is  s.REPID = '000049/Group By: s.REPID , i dont see this col in the query. is the plan accurate?
  • Options
    I don't think analytics help here.

    In your plan, I don't see anything related to your YEAR filter. Is the plan you posted correct? I suspect that since you are using the YEAR function around transaction_date, it causes Vertica to materialize that column before evaluating the filter. If you changed your query to something like

    transaction_date between timestamp_trunc(current_date, 'YEAR') and timestamp_trunc(current_date, 'YEAR')+364+23/24 

    it might help.

    Also, another option is if you actually had a YEAR column in your table, it would be low cardinality and a filter on that column could be very fast, especially if it was early in the sort order of a projection.
  • Options
    marcothesanemarcothesane - Select Field - Administrator

    Just try:

    and s.TRANSACTION_DATE >= trunc(current_date,'year')
    and s.TRANSACTION_DATE < add_months(trunc(current_date,'year'),12)

    Comparing the unmodified column's content with a value derived once will be much faster, and will not force Vertica to materialize the TRANSACTION_DATE column ...

    Good luck ....

  • Options
    Thanks for the tips. I gave these all a shot and it turned out that the biggest improvement was refactoring it into nested subqueries. 

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file