I have been working with Vertica for a few months, building a DWH for online retail. I am new to Vertica, but I already have 7 years of experience with DWH.
Right now I am working on optimizing my star schemas and the data marts on top of them. Fact tables hold hundreds of millions of rows, and one fact table has 1 billion+ rows. There are 50+ dimensions.
There are several targets:
1) An SSAS ROLAP cube that queries Vertica. There is already a working prototype.
2) Reports (queries with filters, group-bys and joins, sometimes window functions)
3) Data exports for data-mining purposes
I am looking for advice on how to improve what I have created.
First of all, the data changes retroactively. Orders from the last few days undergo frequent changes, orders from the last month undergo some changes, and orders from earlier periods change very rarely, but such changes do occur even as far back as 2009. The data is partitioned by month key, but the most frequent filters are by day key. Month key and day key are INTs, i.e. 201512 and 20151209 respectively.
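For example, since the day key already encodes the month (20151209 belongs to 201512), I could partition on a month value derived from the day key instead of on a separate month key. This is only a sketch: `fact_orders` and the column names are placeholders, and I assume the Vertica version at hand accepts an expression in the PARTITION BY clause:

```sql
-- Sketch: derive the partition key from day_key, so a filter on
-- day_key alone can prune partitions.
-- Integer division: 20151209 // 100 = 201512, i.e. the month.
CREATE TABLE fact_orders (
    day_key   INT NOT NULL,
    sku_key   INT NOT NULL,
    amount    FLOAT
)
PARTITION BY (day_key // 100);
```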
I am using projections, but where the data changes I cannot use pre-aggregate or pre-join projections. Can someone tell me whether there is a workaround to utilize the power of projections here?
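The only partial workaround I know of is a plain (non-aggregate) projection sorted on the usual grouping columns: unlike live aggregate or pre-join projections, it tolerates updates and deletes, and the sort order lets the optimizer run the GROUP BY pipelined instead of hashed. A sketch with placeholder names:

```sql
-- Sketch: regular projection sorted by the common grouping keys.
-- Because day_key, sku_key is a prefix of the sort order, a query
-- like GROUP BY day_key, sku_key can use a pipelined GROUP BY.
CREATE PROJECTION fact_orders_p_agg
AS SELECT day_key, sku_key, amount
FROM fact_orders
ORDER BY day_key, sku_key
SEGMENTED BY HASH(day_key, sku_key) ALL NODES;
```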
Is there a way to tell the query engine that day key 20151209 is strictly related to month key 201512? I see that the query tries to scan all partitions. There are other relations between dimensions as well; for example, a supplier is in most cases related to a certain brand, and a certain SKU has a certain size.
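Right now the only workaround I see is adding a redundant month-key predicate by hand so that partition pruning kicks in (sketch, placeholder names):

```sql
-- Without the month_key predicate the plan scans all partitions;
-- the derivable month value restricts the scan to one partition.
SELECT sku_key, SUM(amount)
FROM fact_orders
WHERE day_key = 20151209
  AND month_key = 201512   -- redundant, but enables partition pruning
GROUP BY sku_key;
```

Is there a declarative way to express this relation instead of rewriting every query?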
I have already played with Database Designer, but the queries are numerous and new types of queries are anticipated. What are the general principles for designing facts and dimensions?
We have keys (mostly INT), measure fields (mostly floating point, sometimes INT), and dimensions (dictionaries) with INT keys and string attributes. Are there advisable general principles for designing dimension and fact a) projections, b) encodings, c) access ranks, d) constraints?
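To make the question concrete, here is the kind of DDL I mean; this is a sketch with placeholder names, and I am not sure these encoding choices are right:

```sql
-- Sketch of a replicated dimension projection. RLE suits sorted,
-- low-cardinality columns (brand_key here); UNSEGMENTED ALL NODES
-- replicates the small dimension so joins with the fact stay local.
CREATE PROJECTION dim_supplier_p (
    supplier_key  ENCODING DELTAVAL,
    brand_key     ENCODING RLE,
    supplier_name ENCODING AUTO
)
AS SELECT supplier_key, brand_key, supplier_name
FROM dim_supplier
ORDER BY brand_key, supplier_key
UNSEGMENTED ALL NODES;
```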
Is there any other general or specific advice?