deleting duplicated rows

Ravi_Vijay · July 2013

Hi All, Is there an efficient way to delete duplicates (whole row duplicated) in a big vertica table . All I can think of is a group by on all the columns, store the unique rows in a temp table, delete the duplicated chunk, repopulate from the temp table. Thanks, RAvi.

[Deleted User] · July 2013

Hi Ravi, My approach would be similar, except, recall that Vertica really doesn't like DELETEs. So I would just do: CREATE TABLE temp AS SELECT DISTINCT * FROM orig; DROP TABLE orig; ALTER TABLE temp RENAME TO orig; (Of course, before dropping orig, you should verify that temp contains what you want. You could also "ALTER TABLE orig RENAME TO old_orig;" or something, to get it out of the way but keep it around.) Adam

[Deleted User] · July 2013

If your data is partitioned usefully, you can also do this by partition, using DROP PARTITION and MOVE_PARTITIONS_TO_TABLE().

We're Moving!

Create My New Community Account Now

deleting duplicated rows

Comments

Leave a Comment