Slow Performance on Flattened Facts Table
Hi, hope someone had a similar issue. I have facts table with about 400 mln records and 20 columns. Client requested to provide information from facts table and additional attributes from 4 Dim tables.
DimA - 21 attributes
DimB - 5 atributes
DimC - 4 attributes
DimD - 2 attributes.
My first instinct was to create a view with all the joins to Dim tables to bring relevant data into one structure. I noticed that as I was adding joins from facts table to dimensions, the performance of the view was gradually degrading (I started adding the lowest DimD to the highest DimA).
The performance on the final view, with all 4 Dim added, was unusable for the client to work.
My second approach was to build a flat table with all facts and all relevant attributes added as columns via SET USING function. A/C to Vertica documentation, this is recommended setup on very large table; however, REFRESH_COLUMNS takes hours (>3 hrs now) to populate! This is again unacceptable.
I plan to change the column setting on attributes from DimB-C to use DEFAULT instead of SET USING, but DimA is very volatile (meaning attributes in that table get updated daily and are very likely to change). Client requests to have these attributes up_to date on entire Facts data.
Any recommendations are appreciated. How to make either views or flattened tabel work faster and more efficient.