Can I use any of Vertica's analytic functions to improve this query?
I'm trying to do a little bake off between our existing Exadata installation and Vertica. I was wondering if there are Vertica specific analytic functions that would improve this query?
select s.REGION, s.FUND_CODE, sum(s.transaction_amount) from TRANSACTIONS s where s.TRANSACTION_TYPE = 'P'
and YEAR (s.TRANSACTION_DATE) = YEAR (current_date)
group by s.REGION, s.FUND_CODE
order by s.REGION, sum(s.transaction_amount) desc
limit 10;
It's runs fine in Oracle, but in Vertica this query takes close to a minute with the YEAR function. If I remove the YEAR function it only takes 8 seconds.
I looked at windowing but I'm not sure I have the correct use case. I'm simply looking to run an aggregation query. I have passed this through the database designer.
Here is the explain plan
Access Path: +-SELECT LIMIT 10 [Cost: 409K, Rows: 10] (PATH ID: 0)
| Output Only: 10 tuples
| Execute on: Query Initiator
| +---> SORT [TOPK] [Cost: 409K, Rows: 2] (PATH ID: 1)
| | Order: s.REPID ASC, sum(s.transaction_amount) DESC
| | Output Only: 10 tuples
| | Execute on: All Nodes
| | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 409K, Rows: 2] (PATH ID: 2)
| | | Aggregates: sum(s.transaction_amount)
| | | Group By: s.REPID
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for s [Cost: 369K, Rows: 1M] (PATH ID: 3)
| | | | Projection: transaction_schema.transactions_DBD_1_seg_trans_proj_b0
| | | | Materialize: s.REPID, s.transaction_amount
| | | | Filter: (s.TRANSACTION_TYPE = 'P')
| | | | Filter: (s.REPID = '000049')
| | | | Execute on: All Nodes
select s.REGION, s.FUND_CODE, sum(s.transaction_amount) from TRANSACTIONS s where s.TRANSACTION_TYPE = 'P'
and YEAR (s.TRANSACTION_DATE) = YEAR (current_date)
group by s.REGION, s.FUND_CODE
order by s.REGION, sum(s.transaction_amount) desc
limit 10;
It's runs fine in Oracle, but in Vertica this query takes close to a minute with the YEAR function. If I remove the YEAR function it only takes 8 seconds.
I looked at windowing but I'm not sure I have the correct use case. I'm simply looking to run an aggregation query. I have passed this through the database designer.
Here is the explain plan
Access Path: +-SELECT LIMIT 10 [Cost: 409K, Rows: 10] (PATH ID: 0)
| Output Only: 10 tuples
| Execute on: Query Initiator
| +---> SORT [TOPK] [Cost: 409K, Rows: 2] (PATH ID: 1)
| | Order: s.REPID ASC, sum(s.transaction_amount) DESC
| | Output Only: 10 tuples
| | Execute on: All Nodes
| | +---> GROUPBY PIPELINED (GLOBAL RESEGMENT GROUPS) [Cost: 409K, Rows: 2] (PATH ID: 2)
| | | Aggregates: sum(s.transaction_amount)
| | | Group By: s.REPID
| | | Execute on: All Nodes
| | | +---> STORAGE ACCESS for s [Cost: 369K, Rows: 1M] (PATH ID: 3)
| | | | Projection: transaction_schema.transactions_DBD_1_seg_trans_proj_b0
| | | | Materialize: s.REPID, s.transaction_amount
| | | | Filter: (s.TRANSACTION_TYPE = 'P')
| | | | Filter: (s.REPID = '000049')
| | | | Execute on: All Nodes
0
Comments
2. what is s.REPID = '000049/Group By: s.REPID , i dont see this col in the query. is the plan accurate?
In your plan, I don't see anything related to your YEAR filter. Is the plan you posted correct? I suspect that since you are using the YEAR function around transaction_date, it causes Vertica to materialize that column before evaluating the filter. If you changed your query to something like
transaction_date between timestamp_trunc(current_date, 'YEAR') and timestamp_trunc(current_date, 'YEAR')+364+23/24
it might help.
Also, another option is if you actually had a YEAR column in your table, it would be low cardinality and a filter on that column could be very fast, especially if it was early in the sort order of a projection.
Just try:
Comparing the unmodified column's content with a value derived once will be much faster, and will not force Vertica to materialize the TRANSACTION_DATE column ...
Good luck ....