Dimensions with sequences/auto-increment
My dimension table surrogate keys are integers. When loading the dimension table, I check the table maximum and add 1 to it in the next row. By any chance, is that a problem? Because, the lookup and data loading with Pentaho is like 30ish rows per second.
Suggestions?
0
Comments
Let Vertica do the job. Especially, if you insert row by row, it's like trying to feed an elephant with a teaspoon. You need shovels or frontloaders for that job. That means, many, many, many rows in one SQL call.
To be sure you get new rows inserted, define your table like so:
CREATE TABLE dim_cust (
cust_key IDENTITY NOT NULL
, cust_no INT NOT NULL
, cust_name VARCHAR(32)
, cust_dob DATE
, PRIMARY KEY (cust_key)
) UNSEGMENTED ALL NODES ; -- if it's a smaller table
An IDENTITY key must not be referenced in an INSERT or UPDATE statement, so you would only reference
cust_no
,cust_name
andcust_dob
in your insert job by Pentaho.cust_key
gets populated by Vertica.And make sure you load thousands of rows in one SQL call.
Good luck
Marco
Marco is right, let Vertica do the work
Fyi, in addition to IDENTITY columns, you can also use a sequence to generate a unique key value for a table column as a default...