Dimensions with sequences/auto-increment
My dimension table surrogate keys are integers. When loading the dimension table, I check the table maximum and add 1 to it in the next row. By any chance, is that a problem? Because, the lookup and data loading with Pentaho is like 30ish rows per second.
Suggestions?
0
Comments
Let Vertica do the job. Especially, if you insert row by row, it's like trying to feed an elephant with a teaspoon. You need shovels or frontloaders for that job. That means, many, many, many rows in one SQL call.
To be sure you get new rows inserted, define your table like so:
CREATE TABLE dim_cust (cust_key IDENTITY NOT NULL, cust_no INT NOT NULL, cust_name VARCHAR(32), cust_dob DATE, PRIMARY KEY (cust_key)) UNSEGMENTED ALL NODES ; -- if it's a smaller tableAn IDENTITY key must not be referenced in an INSERT or UPDATE statement, so you would only reference
cust_no,cust_nameandcust_dobin your insert job by Pentaho.cust_keygets populated by Vertica.And make sure you load thousands of rows in one SQL call.
Good luck
Marco
Marco is right, let Vertica do the work
Fyi, in addition to IDENTITY columns, you can also use a sequence to generate a unique key value for a table column as a default...
dbadmin=> CREATE SEQUENCE dim_cust_seq; CREATE SEQUENCE dbadmin=> CREATE TABLE dim_cust ( cust_key INT DEFAULT dim_cust_seq.NEXTVAL, cust_no INT NOT NULL, cust_name VARCHAR(32), cust_dob DATE, PRIMARY KEY (cust_key)) UNSEGMENTED ALL NODES ; CREATE TABLE dbadmin=> INSERT INTO dim_cust (cust_no, cust_name, cust_dob) VALUES (1, 'VERTICA', sysdate); OUTPUT -------- 1 (1 row) dbadmin=> SELECT * FROM dim_cust; cust_key | cust_no | cust_name | cust_dob ----------+---------+-----------+------------ 1 | 1 | VERTICA | 2017-06-21 (1 row)