Best projection design

Hello,

I have a where clause on a view like this.
where period_key between 20111207 and 20130101 and vendor_key=300 and retailer_key=6 and item_key in (2891934) ORDER BY STORE_KEY, PERIOD_KEY
The view is a result of join between a fact and dimension table  on store_key.The fact projection that is currently in use by this query is
ORDER BY STORE_KEY, PERIOD_KEY
SEGMENTED BY hash(STORE_KEY,PERIOD_KEY) ALL NODES
Can anybody suggest, what should be the best design for the query predicate stated above, The only thing that changes in the query everytime it is executes is the item number.
It executes many time ( more then 5000 in span of 3 hrs ).

I am seeing different execution time for each instance of this query which I want to minimize.

Also, Can we create manual projections with derived columns. Currently I am getting error on this.




Comments

  • Hi Navin,
    I have had a similar requirement for providing consistent query performance, and have found guidelines provided here very useful. Adding the predicate columns (period_key, vendor_key, retail_key, item_key) in the projection "order by" clause before the existing columns should help performance. 

    You've also used the columns used in the where and order by clause for segmenting data across nodes. This can skew your workload to a single or a small number of nodes, as data relevant to the query is more likely to be localized in that node(s). If you have a unique key or a combination of columns that collectively result in a high cardinality, then the column(s) will work better for segmentation.

    Hope that helps.

    Thanks
    Sajan

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file