SCD type 1 with merge table data

mpedroso · September 2018

Hi

I have to create an ETL for my diodes using SCD type 1. Soon, I thought of carrying out the following steps:

load the target table with the fields of the dimension + an id field, the id field being one of type identity. In this way, entering the data would create a substitute key for the record.
Load the source table with the updated data (no substitute key) and perform a merge command to update the target table records so that the new records are inserted and a substitute key is automatically generated by the ID field of type identity. In case the record does not exist the specified fields would be updated.

Table target:
CREATE TABLE target (id IDENTITY, a INT, b INT, c INT);

Table Source:
CREATE TABLE source (a INT, b INT, c INT);

Command merge:
MERGE INTO target t USING source s ON t.a = s.a
WHEN MATCHED THEN UPDATE SET a = s.a, b = s.b, c = s.c
WHEN NOT MATCHED THEN INSERT (a, b, c) VALUES (s.a, s.b, s.c);

Any solution to the problem?

marcothesane · September 2018

You would have to keep the "identity" column in the staging table, too. And define it as INT NOT NULL in the SCD table, and as INT NOT NULL DEFAULT .nextval in your staging table.

If your dimension table is not too big, I would also experiment with the hash value of the natural identifier (combined with the valid-from date if you also have Type 2 columns) as the default, as the hash collision risk is extremely low.

Check out my two blogs on Slowly Changing Dimensions in Vertica, if you find time:

https://www.linkedin.com/pulse/veterans-bit-trickier-maintain-history-your-data-slowly-marco-gessner/

and:

https://www.linkedin.com/pulse/said-veterans-tricky-maintain-even-vertica-batches-doubly-gessner/

Happy playing ...
Marco

marcothesane · September 2018

I meant <sequence>.nextval

We're Moving!

Create My New Community Account Now

SCD type 1 with merge table data

Comments

Leave a Comment