We're Moving!

The Vertica Forum is moving to a new OpenText Analytics Database (Vertica) Community.

Join us there to post discussion topics, learn about

product releases, share tips, access the blog, and much more.

Create My New Community Account Now


Can I use any of Vertica's analytic functions to improve this query? — Vertica Forum

Can I use any of Vertica's analytic functions to improve this query?

I'm trying to do a little bake off between our existing Exadata installation and Vertica. I was wondering if there are Vertica specific analytic functions that would improve this query?

select s.REGION, s.FUND_CODE, sum(s.transaction_amount)  from TRANSACTIONS s where s.TRANSACTION_TYPE = 'P'
and YEAR (s.TRANSACTION_DATE) = YEAR (current_date)
group by s.REGION, s.FUND_CODE
order by s.REGION, sum(s.transaction_amount) desc
limit 10;

It's runs fine in Oracle, but in Vertica this query takes close to a minute with the YEAR function. If I remove the YEAR function it only takes 8 seconds. 

I looked at windowing but I'm not sure I have the correct use case. I'm simply looking to run an aggregation query. I have passed this through the database designer. 

Here is the explain plan

Access Path:                                                                                                                                                                                                                                                                                                                                                                                                                                                        +-SELECT  LIMIT 10 [Cost: 409K, Rows: 10] (PATH ID: 0)                                                                
|  Output Only: 10 tuples                                                                                                          
|  Execute on: Query Initiator                                                               
| +---> SORT [TOPK] [Cost: 409K, Rows: 2] (PATH ID: 1)
| |      Order: s.REPID ASC, sum(s.transaction_amount) DESC
| |      Output Only: 10 tuples                                                                                                        
| |      Execute on: All Nodes 
| | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 409K, Rows: 2] (PATH ID: 2)
| | |      Aggregates: sum(s.transaction_amount)
| | |      Group By: s.REPID                                                                                                         
| | |      Execute on: All Nodes                           
| | | +---> STORAGE ACCESS for s [Cost: 369K, Rows: 1M] (PATH ID: 3)                                     
| | | |      Projection: transaction_schema.transactions_DBD_1_seg_trans_proj_b0
| | | |      Materialize: s.REPID, s.transaction_amount 
| | | |      Filter: (s.TRANSACTION_TYPE = 'P') 
| | | |      Filter: (s.REPID = '000049')
| | | |      Execute on: All Nodes    

Comments

  • 1. Can you post the plan without the year function when it take 8 seconds
    2. what is  s.REPID = '000049/Group By: s.REPID , i dont see this col in the query. is the plan accurate?
  • I don't think analytics help here.

    In your plan, I don't see anything related to your YEAR filter. Is the plan you posted correct? I suspect that since you are using the YEAR function around transaction_date, it causes Vertica to materialize that column before evaluating the filter. If you changed your query to something like

    transaction_date between timestamp_trunc(current_date, 'YEAR') and timestamp_trunc(current_date, 'YEAR')+364+23/24 

    it might help.

    Also, another option is if you actually had a YEAR column in your table, it would be low cardinality and a filter on that column could be very fast, especially if it was early in the sort order of a projection.
  • marcothesanemarcothesane - Select Field - Administrator

    Just try:

    and s.TRANSACTION_DATE >= trunc(current_date,'year')
    and s.TRANSACTION_DATE < add_months(trunc(current_date,'year'),12)

    Comparing the unmodified column's content with a value derived once will be much faster, and will not force Vertica to materialize the TRANSACTION_DATE column ...

    Good luck ....

  • Thanks for the tips. I gave these all a shot and it turned out that the biggest improvement was refactoring it into nested subqueries. 

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file