Deleting duplicates from a table
I have large tables that contain duplicate records. I know duplicates can be removed by creating a temporary table, inserting the distinct records into it, dropping the original table, and renaming the temporary table to the original name.
But I have a service that may be writing to the original table via a COPY command with auto-commit while I am performing this drop-and-rename approach. What will happen in that case? My application that writes to the table can retry, but consider the timing: while I run an `INSERT /*+direct*/ INTO temp SELECT DISTINCT ...`, the app could still be writing to the main table. The insert captures the data present at that moment, but the app might write new rows to the main table afterwards. If I then drop the main table and rename the temp table to the main table's name, will there be any data loss? What is the best practice for removing duplicates from a table while another application may still be writing to it?
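The dedup-and-swap sequence described above could be sketched roughly as follows. This is a sketch, not a recommendation: it assumes Vertica (suggested by the `/*+direct*/` hint and COPY loading), and the table name `events` is hypothetical. The comments mark exactly where the race with a concurrent writer occurs.

```sql
-- Hypothetical table: events, which contains duplicate rows.
-- 1. Create an empty copy and load only the distinct rows into it,
--    writing directly to disk (ROS) via the direct hint.
CREATE TABLE events_dedup LIKE events INCLUDING PROJECTIONS;
INSERT /*+direct*/ INTO events_dedup SELECT DISTINCT * FROM events;

-- 2. Swap the tables. Any rows a concurrent writer committed to events
--    after the SELECT DISTINCT snapshot was taken, but before the DROP,
--    exist only in the old table and are lost when it is dropped --
--    this window is the data-loss risk the question describes.
DROP TABLE events;
ALTER TABLE events_dedup RENAME TO events;
```

Because the window between the snapshot and the swap is unbounded, this sequence alone cannot be safe against concurrent writers without some form of write blocking or a catch-up step.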
Answers
You could try partitioning the table - by the element of time - and then