deleting duplicated rows

Hi All, Is there an efficient way to delete duplicates (whole row duplicated) in a big vertica table . All I can think of is a group by on all the columns, store the unique rows in a temp table, delete the duplicated chunk, repopulate from the temp table. Thanks, RAvi.

Comments

  • Hi Ravi, My approach would be similar, except, recall that Vertica really doesn't like DELETEs. So I would just do: CREATE TABLE temp AS SELECT DISTINCT * FROM orig; DROP TABLE orig; ALTER TABLE temp RENAME TO orig; (Of course, before dropping orig, you should verify that temp contains what you want. You could also "ALTER TABLE orig RENAME TO old_orig;" or something, to get it out of the way but keep it around.) Adam
  • If your data is partitioned usefully, you can also do this by partition, using DROP PARTITION and MOVE_PARTITIONS_TO_TABLE().

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file