Can I use any of Vertica's analytic functions to improve this query?

Ryan_Fitch_1 · October 2014

I'm trying to do a little bake off between our existing Exadata installation and Vertica. I was wondering if there are Vertica specific analytic functions that would improve this query?

select s.REGION, s.FUND_CODE, sum(s.transaction_amount) from TRANSACTIONS s where s.TRANSACTION_TYPE = 'P'
and YEAR (s.TRANSACTION_DATE) = YEAR (current_date)
group by s.REGION, s.FUND_CODE
order by s.REGION, sum(s.transaction_amount) desc
limit 10;

It's runs fine in Oracle, but in Vertica this query takes close to a minute with the YEAR function. If I remove the YEAR function it only takes 8 seconds.

I looked at windowing but I'm not sure I have the correct use case. I'm simply looking to run an aggregation query. I have passed this through the database designer.

Here is the explain plan

Access Path: +-SELECT LIMIT 10 [Cost: 409K, Rows: 10] (PATH ID: 0)
| Output Only: 10 tuples
| Execute on: Query Initiator
| +---> SORT [TOPK] [Cost: 409K, Rows: 2] (PATH ID: 1)
| | Order: s.REPID ASC, sum(s.transaction_amount) DESC
| | Output Only: 10 tuples
| | Execute on: All Nodes
| | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 409K, Rows: 2] (PATH ID: 2)
| | | Aggregates: sum(s.transaction_amount)
| | | Group By: s.REPID
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for s [Cost: 369K, Rows: 1M] (PATH ID: 3)
| | | | Projection: transaction_schema.transactions_DBD_1_seg_trans_proj_b0
| | | | Materialize: s.REPID, s.transaction_amount
| | | | Filter: (s.TRANSACTION_TYPE = 'P')
| | | | Filter: (s.REPID = '000049')
| | | | Execute on: All Nodes

[Deleted User] · October 2014

1. Can you post the plan without the year function when it take 8 seconds
2. what is s.REPID = '000049/Group By: s.REPID , i dont see this col in the query. is the plan accurate?

David_Rydzewski · October 2014

I don't think analytics help here.

In your plan, I don't see anything related to your YEAR filter. Is the plan you posted correct? I suspect that since you are using the YEAR function around transaction_date, it causes Vertica to materialize that column before evaluating the filter. If you changed your query to something like

transaction_date between timestamp_trunc(current_date, 'YEAR') and timestamp_trunc(current_date, 'YEAR')+364+23/24

it might help.

Also, another option is if you actually had a YEAR column in your table, it would be low cardinality and a filter on that column could be very fast, especially if it was early in the sort order of a projection.

marcothesane · October 2014

Just try:

and s.TRANSACTION_DATE >= trunc(current_date,'year')

and s.TRANSACTION_DATE < add_months(trunc(current_date,'year'),12)

Comparing the unmodified column's content with a value derived once will be much faster, and will not force Vertica to materialize the TRANSACTION_DATE column ...

Good luck ....

Ryan_Fitch_1 · October 2014

Thanks for the tips. I gave these all a shot and it turned out that the biggest improvement was refactoring it into nested subqueries.

We're Moving!

Create My New Community Account Now

Can I use any of Vertica's analytic functions to improve this query?

Comments

Leave a Comment