Data loading and constraint enforcement

I'm using Vertica Community Edition for an ad server, and I generate reports based on event data. I store those events on my web server cluster as CSV files; every minute I upload the files to Amazon S3 and send a message through a queue to another server, which downloads any new CSV files from S3 and imports the data into Vertica using vsql and the COPY command. The problem is that we get duplicate events, so we hit errors when querying the data. I identify the duplicates with ANALYZE_CONSTRAINTS and then delete each offending row. Is this good practice? How do you load your data and enforce constraints? PS: I only have 5 tables in the database. Thanks!
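For context, the per-minute import on the consumer server boils down to something like the following. The events table, its columns, and the file paths here are made up for illustration; only the COPY / ANALYZE_CONSTRAINTS / delete flow comes from the post above.

    -- Bulk load the newly downloaded CSV file (run through vsql):
    COPY events FROM LOCAL '/data/incoming/events.csv'
         DELIMITER ','
         EXCEPTIONS '/tmp/events.exceptions'
         REJECTED DATA '/tmp/events.rejected';

    -- Vertica does not enforce PRIMARY KEY / UNIQUE constraints on load,
    -- so duplicates have to be found explicitly:
    SELECT ANALYZE_CONSTRAINTS('public.events');

    -- One way to drop the duplicate copies: rebuild from DISTINCT rows.
    CREATE LOCAL TEMP TABLE events_dedup ON COMMIT PRESERVE ROWS AS
        SELECT DISTINCT * FROM events;
    TRUNCATE TABLE events;   -- note: TRUNCATE commits in Vertica
    INSERT INTO events SELECT * FROM events_dedup;
    COMMIT;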

Comments

  • Welcome to the community, Mircea! Other community members may be able to help you with your project, and we will have support take a look.
  • Hello Mircea: the documentation suggests checking for constraint violations during the COPY by using the NO COMMIT option, so you can roll back the load and correct any violations before they are committed (see the sketch just below these comments). Are you checking for constraint violations as part of loading, or only after queries fail? Please see the Administrator's Guide for details: Bulk Loading Data > Choosing a Load Method and About Constraints > Analyzing Constraints.
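To make that concrete, here is a minimal sketch of the documented pattern, reusing the hypothetical events table from above: load with NO COMMIT, analyze, and only commit a clean batch.

    -- Load without committing, so a bad batch can still be thrown away
    -- (NO COMMIT must be the last option in the COPY statement):
    COPY events FROM LOCAL '/data/incoming/events.csv'
         DELIMITER ',' NO COMMIT;

    -- Check the still-uncommitted rows for constraint violations:
    SELECT ANALYZE_CONSTRAINTS('public.events');

    -- If violations are reported, discard the whole batch;
    -- otherwise make it permanent:
    ROLLBACK;   -- on violations
    -- COMMIT;  -- on a clean result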
