Best projection design

Navin_C · June 2014

Hello,

I have a where clause on a view like this.

where period_key between 20111207 and 20130101 and vendor_key=300 and retailer_key=6 and item_key in (2891934) ORDER BY STORE_KEY, PERIOD_KEY

The view is a result of join between a fact and dimension table on store_key.The fact projection that is currently in use by this query is

ORDER BY STORE_KEY, PERIOD_KEY
SEGMENTED BY hash(STORE_KEY,PERIOD_KEY) ALL NODES

Can anybody suggest, what should be the best design for the query predicate stated above, The only thing that changes in the query everytime it is executes is the item number.
It executes many time ( more then 5000 in span of 3 hrs ).

I am seeing different execution time for each instance of this query which I want to minimize.

Also, Can we create manual projections with derived columns. Currently I am getting error on this.

Sajan_Sangraula · June 2014

Hi Navin,
I have had a similar requirement for providing consistent query performance, and have found guidelines provided here very useful. Adding the predicate columns (period_key, vendor_key, retail_key, item_key) in the projection "order by" clause before the existing columns should help performance.

You've also used the columns used in the where and order by clause for segmenting data across nodes. This can skew your workload to a single or a small number of nodes, as data relevant to the query is more likely to be localized in that node(s). If you have a unique key or a combination of columns that collectively result in a high cardinality, then the column(s) will work better for segmentation.

Hope that helps.

Thanks
Sajan

We're Moving!

Create My New Community Account Now

Best projection design

Comments

Leave a Comment