The Vertica Forum recently got a makeover! Let us know what you think by filling out this short, anonymous survey.
Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Deleting duplicates from a table

I have large tables which have duplicate records. I know we can delete duplicates by creating a temporary table and inserting distinct record there and then dropping the original table and renaming the temp table to original table.
But I have a service which may be writing to the original table via copy command with auto commit when would be performing this drop and rename approach. What will happen in such case? My application which writes data to the table can retry, but when I do a insert /* +direct+*/ into temp select distinct,... the app could be writing to the main table. The insert query would have inserted some data but meanwhile that app might have written new data to the main table, If i drop the main table and rename temp to main table, will there be any data loss? What would be the best practice to remove duplicates from a table where other application could still be writing to the same table.


  • marcothesanemarcothesane - Select Field - Administrator

    You could try partitioning the table - by the element of time - and then

    INSERT INTO my_copy SELECT DISTINCT * FROM my_orig WHERE my_partition_key='<whatever_it_is>';
    . '<whatever_the_smallest_partition_key_is>'
    . '<whatever_the_greatest_partition_key_is>'
    , 'my_orig'

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file