Optimizing performance of a query with row_number

dafna · May 2015

I am trying to optimize the performance of a query.

Here is an example of 2 queries that demonstrate the issue we have.

Both queries return the same results and have the same explain plan but the execution time is very different.

Query 1 - execution time 0.3 sec

with test as

(

SELECT

case when ROW_NUMBER() OVER (PARTITION BY PURCHASE_ID ORDER BY ACTION_TIME ASC, GOAL_MATCH_IND DESC NULLS LAST)=1 then value end as REVENUE

FROM B_DEFAULT.RPT_FA_CONTRIBUTION F

where account_id = '123'

and purchase_id is not null

)

select sum(REVENUE)

from test

explain:

Access Path:

+-GROUPBY NOTHING [Cost: 89K, Rows: 1] (PATH ID: 1)

| Aggregates: sum(test.REVENUE)

| Execute on: All Nodes

| +---> ANALYTICAL [Cost: 89K, Rows: 2M] (PATH ID: 3)

| | Analytic Group

| | Functions: row_number()

| | Group Global & Local Resegment: F.PURCHASE_ID

| | Group Sort: F.PURCHASE_ID ASC, F.ACTION_TIME ASC NULLS LAST, F.GOAL_MATCH_IND DESC NULLS LAST

| | Execute on: All Nodes

| | +---> STORAGE ACCESS for F [Cost: 7K, Rows: 2M] (PATH ID: 4)

| | | Projection: B_DEFAULT.RPT_FA_CONTRIBUTION_CROSS_SESSION_b0

| | | Materialize: F.PURCHASE_ID, F.VALUE, F.ACTION_TIME, F.GOAL_MATCH_IND

| | | Filter: (F.ACCOUNT_ID = '123')

| | | Filter: (F.PURCHASE_ID IS NOT NULL)

| | | Execute on: All Nodes

Query 2 - execution time 3.3 sec
(removed the purchase_id is not null filter)

with test as

(

SELECT

case when ROW_NUMBER() OVER (PARTITION BY PURCHASE_ID ORDER BY ACTION_TIME ASC, GOAL_MATCH_IND DESC NULLS LAST)=1 then value end as REVENUE

FROM B_DEFAULT.RPT_FA_CONTRIBUTION F

where account_id = '123'

)

select sum(REVENUE)

from test

explain -

Access Path:

+-GROUPBY NOTHING [Cost: 89K, Rows: 1] (PATH ID: 1)

| Aggregates: sum(test.REVENUE)

| Execute on: All Nodes

| +---> ANALYTICAL [Cost: 89K, Rows: 2M] (PATH ID: 3)

| | Analytic Group

| | Functions: row_number()

| | Group Global & Local Resegment: F.PURCHASE_ID

| | Group Sort: F.PURCHASE_ID ASC, F.ACTION_TIME ASC NULLS LAST, F.GOAL_MATCH_IND DESC NULLS LAST

| | Execute on: All Nodes

| | +---> STORAGE ACCESS for F [Cost: 7K, Rows: 2M] (PATH ID: 4)

| | | Projection: B_DEFAULT.RPT_FA_CONTRIBUTION_CROSS_SESSION_b0

| | | Materialize: F.VALUE, F.ACTION_TIME, F.GOAL_MATCH_IND, F.PURCHASE_ID

| | | Filter: (F.ACCOUNT_ID = '123')

| | | Execute on: All Nodes

* The reason that both queries return the same result is that the value column is null where purchase_id is also null.

Though it does not look like it from the explain plans , I believe that query 1 is faster because the analytical row_number function is running on a smaller number of rows.

I am trying to understand how to re-write query 2 to improve it's performance but without adding the filter on the where clause.

I tried adding this filter to the case, like:

case when purchase_id is not null and ROW_NUMBER()....

But this does not help.

We would love to hear your ideas

[Deleted User] · May 2015

I checked with a technical expert and his reply was:

The first query will process fewer number of rows because of the additional filter, unless the purchase_id column does not contain NULL.

I hope that helps.

Sarah

dafna · May 2015

Hi Sarah,

That's true, but I am looking for a way to filter out the nulls without adding the condition ( ... is not null) to the where clause.

The reason for this is that I use the inner query as part of a view and some calculated columns in the view do require going over the null values.

Thank you for your help.

[Deleted User] · May 2015

I'm not clear on whether you want to ignore the NULL values or not.

Can you file a support case with Vertica? I think this requires more details.

Sorry that I'm not more help,

Sarah

We're Moving!

Create My New Community Account Now

Optimizing performance of a query with row_number

Comments

Leave a Comment