Data loading and constraints enforcement
I'm using Vertica Community Edition for an ad server, and I generate reports based on events. The events are stored on my web server cluster as CSV files; every minute I upload the files to Amazon S3 and send a message through a queue to another server, which downloads all new CSV files from S3 and imports the data into Vertica using vsql and the COPY command.

The problem is that we receive duplicate events, so we get errors when querying the data. I identify the duplicate data using the ANALYZE_CONSTRAINTS function and then delete each row. Is this good practice? How do you load your data and enforce constraints?

PS: I only have 5 tables in the database. Thanks!
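For reference, here is roughly what the import and cleanup steps look like (table name, column names, and file paths are placeholders, not my real schema):

-- Load one new CSV file from the download directory:
COPY events (event_time, ad_id, user_id, event_type)
FROM LOCAL '/data/incoming/events_201301011200.csv'
DELIMITER ','
EXCEPTIONS '/data/incoming/events.exceptions';

-- Find primary-key / unique-constraint violations after the load:
SELECT ANALYZE_CONSTRAINTS('events');

-- Then, for each duplicate key reported, I delete the extra row, e.g.:
DELETE FROM events WHERE event_id = 12345;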