Using Slowly Changing Dimension
Since client source data is not provided with correct end time period for SCD, I am trying to use start date as time interval and convert the scd table to view price listed for each date on separate line. Please check below scd table (note that product_id and start_date are PK):
The output of the view would be following:
Appreciate any feedback.
Tagged:
0
Answers
Looks like a use-case for Time Series Analytics?
.
Also, if you're looking to fix end_date. See if using the lead analytical function would help?
This depends on whether you want your product table to be an SCD table or not.
If you have something like a fact table near it, I'd model the table like so:
CREATE TABLE d_product (
prdkey INT NOT NULL PRIMARY KEY DEFAULT HASH(prdid,start_date)
, prdid INT NOT NULL
, start_date DATE NOT NULL
, end_date DATE NOT NULL
, price NUMERIC(18,2)
);
And I'd fill it like so (sorry for the lack of monospace font, but I could simply not get it to display....):
If ever you can, remove redundancies.
And use the one-photograph-per-day approach using the TIMESERIES clause, as suggested by @LenoyJ - of which I'm deeply fond, too, at reporting time only;
Don't create views that produce much more rows than will ever be used in a report, and, by all means, don't produce tables that risk ending up containing millions of redundant rows ...
Great suggestions! Thank you very much. I have already created a view with lead statement to correct end date and joined product table to date table based on start / end date interval. Just curious, why would you state not to use views?
In the example I provided, i gave the final pricing table; however, client provides us partial updates. Sometimes, these are not updates at all, as you can see, the start date might change, but price remains the same. In other cases, price changes; while start/end dates remain the same. I load all new data to a temp table (src) first and then use merge to product dim (tgt) to identify updates and inserts. I limit updates only to records witch changed end date or price. Inserts are defined by a new start date. After reloading all pricing files from client, i noticed that end date is incorrectly managed on client end and I need to fix it on my end. The best option I saw was to load the data as is (using merge) and then create a view on the product dim to correct end dates. Would you recommend a different approach?
I don't mean to not use views in general. But a view that generates one row per day, even if a "product" changes, say, once a year, would be 364 rows too many.
I would use:
SELECT * FROM temp_table
UNION ALL
SELECT * FROM target_product_dim
WHERE start_date >= (SELECT MIN(start_date) FROM temp_table);
And then use the output of this select for the query I wrote above, removing all rows with equal prices, and replace the rows from the target table with the same time range with the output of my query above. Storing data on-change-only, and with no gaps in the timeline.