Optimizing performance of a query with row_number

I am trying to optimize the performance of a query.
Here is an example of 2 queries that demonstrate the issue we have.
Both queries return the same results and have the same explain plan but the execution time is very different.
 
Query 1 - execution time 0.3 sec
with test as
(
SELECT
case when ROW_NUMBER() OVER (PARTITION BY PURCHASE_ID  ORDER BY ACTION_TIME ASC, GOAL_MATCH_IND DESC NULLS LAST)=1 then value end as REVENUE
FROM B_DEFAULT.RPT_FA_CONTRIBUTION F
where account_id = '123'
and purchase_id is not null
)
select sum(REVENUE)
from test
 
 
explain:
 
Access Path:
+-GROUPBY NOTHING [Cost: 89K, Rows: 1] (PATH ID: 1)
|  Aggregates: sum(test.REVENUE)
|  Execute on: All Nodes
|  Execute on: All Nodes
| +---> ANALYTICAL [Cost: 89K, Rows: 2M] (PATH ID: 3)
| |      Analytic Group
| |       Functions: row_number()
| |       Group Global & Local Resegment: F.PURCHASE_ID
| |       Group Sort: F.PURCHASE_ID ASC, F.ACTION_TIME ASC NULLS LAST, F.GOAL_MATCH_IND DESC NULLS LAST
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for F [Cost: 7K, Rows: 2M] (PATH ID: 4)
| | |      Projection: B_DEFAULT.RPT_FA_CONTRIBUTION_CROSS_SESSION_b0
| | |      Materialize: F.PURCHASE_ID, F.VALUE, F.ACTION_TIME, F.GOAL_MATCH_IND
| | |      Filter: (F.ACCOUNT_ID = '123')
| | |      Filter: (F.PURCHASE_ID IS NOT NULL)
| | |      Execute on: All Nodes
 
 
Query 2 - execution time 3.3 sec
(removed the purchase_id is not null filter)
with test as
(
SELECT
case when ROW_NUMBER() OVER (PARTITION BY PURCHASE_ID  ORDER BY ACTION_TIME ASC, GOAL_MATCH_IND DESC NULLS LAST)=1 then value end as REVENUE
FROM B_DEFAULT.RPT_FA_CONTRIBUTION F
where account_id = '123'
)
select sum(REVENUE)
from test
 
explain - 
Access Path:
+-GROUPBY NOTHING [Cost: 89K, Rows: 1] (PATH ID: 1)
|  Aggregates: sum(test.REVENUE)
|  Execute on: All Nodes
|  Execute on: All Nodes
| +---> ANALYTICAL [Cost: 89K, Rows: 2M] (PATH ID: 3)
| |      Analytic Group
| |       Functions: row_number()
| |       Group Global & Local Resegment: F.PURCHASE_ID
| |       Group Sort: F.PURCHASE_ID ASC, F.ACTION_TIME ASC NULLS LAST, F.GOAL_MATCH_IND DESC NULLS LAST
| |      Execute on: All Nodes
| | +---> STORAGE ACCESS for F [Cost: 7K, Rows: 2M] (PATH ID: 4)
| | |      Projection: B_DEFAULT.RPT_FA_CONTRIBUTION_CROSS_SESSION_b0
| | |      Materialize: F.VALUE, F.ACTION_TIME, F.GOAL_MATCH_IND, F.PURCHASE_ID
| | |      Filter: (F.ACCOUNT_ID = '123')
| | |      Execute on: All Nodes
 
* The reason that both queries return the same result is that the value column is null where purchase_id is also null.
 
Though it does not look like it from the explain plans , I believe that query 1 is faster because the analytical row_number function is running on a smaller number of rows.
I am trying to understand how to re-write query 2 to improve it's performance but without adding the filter on the where clause.
 
I tried adding this filter to the case, like:
case when purchase_id is not null and  ROW_NUMBER()....
But this does not help.
 
We would love to hear your ideas image

Comments

  • I checked with a technical expert and his reply was:

     

    The first query will process fewer number of rows because of the additional filter, unless the purchase_id column does not contain NULL.

     

    I hope that helps.

     

    Sarah

     

  • Hi Sarah,

     

    That's true, but I am looking for a way to filter out the nulls without adding the condition ( ... is not null) to the where clause.

    The reason for this is that I use the inner query as part of a view and some calculated columns in the view do require going over the null values. 

     

    Thank you for your help.

  • I'm not clear on whether you want to ignore the NULL values or not.

     

    Can you file a support case with Vertica? I think this requires more details.

     

    Sorry that I'm not more help,

     

    Sarah

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file