Best projection design
Hello,
I have a where clause on a view like this.
It executes many time ( more then 5000 in span of 3 hrs ).
I am seeing different execution time for each instance of this query which I want to minimize.
Also, Can we create manual projections with derived columns. Currently I am getting error on this.
I have a where clause on a view like this.
where period_key between 20111207 and 20130101 and vendor_key=300 and retailer_key=6 and item_key in (2891934) ORDER BY STORE_KEY, PERIOD_KEYThe view is a result of join between a fact and dimension table on store_key.The fact projection that is currently in use by this query is
ORDER BY STORE_KEY, PERIOD_KEYCan anybody suggest, what should be the best design for the query predicate stated above, The only thing that changes in the query everytime it is executes is the item number.
SEGMENTED BY hash(STORE_KEY,PERIOD_KEY) ALL NODES
It executes many time ( more then 5000 in span of 3 hrs ).
I am seeing different execution time for each instance of this query which I want to minimize.
Also, Can we create manual projections with derived columns. Currently I am getting error on this.
0
Comments
I have had a similar requirement for providing consistent query performance, and have found guidelines provided here very useful. Adding the predicate columns (period_key, vendor_key, retail_key, item_key) in the projection "order by" clause before the existing columns should help performance.
You've also used the columns used in the where and order by clause for segmenting data across nodes. This can skew your workload to a single or a small number of nodes, as data relevant to the query is more likely to be localized in that node(s). If you have a unique key or a combination of columns that collectively result in a high cardinality, then the column(s) will work better for segmentation.
Hope that helps.
Thanks
Sajan