Remove duplicates from my table

ismael · August 2017

Good morning .
I trying remove duplicates , but is very difficult.

I try this :

CREATE TABLE t1 AS SELECT DISTINCT *
FROM t2;

But no is good i want remove all duplicates field1 is same .

I need help in other databases is ok but in vertica no good.

i used this but in vertica not ok.

INSERT INTO t2
select
f1 ,
f2 ,
f3 ,
f4 ,
f5,
f6 ,
f7 ,
f8 ,
f9 ,
f10 ,
f11 ,
f12 ,
f13 ,
f14 ,
f15 ,
f16 ,
f17 ,
f18 ,
f19 ,
20 ,
Ranking
from
( select rank() over (partition by f2 order by
f1 ,
f3 ,
f4 ,
f5,
f6 ,
f7 ,
f8 ,
f9 ,
f10 ,
f11 ,
f12 ,
f13 ,
f14 ,
f15 ,
f16 ,
f17 ,
f18 ,
f19 ,
20
asc ) as Ranking,

from t2 ) a where a.Ranking =1
;

I wait help .
thank you.

marcothesane · August 2017

Hi Ismael -
I would try the analytic LIMIT clause. It's currently not mentioned along with the documentation of the LIMIT keyword. But it is used in the examples in the documentation on the definition of Top-K projections. And, as the SELECT clause of a projection definition is expected to be nothing else than a query, it's fair, in my eyes, to use the analytic LIMIT clause in a query.
The code below creates an in-line table with example data in a WITH clause.
If you don't need the output to be ordered, you can just as well use the second Common Table Expression, de_duped, as the main query. You can't put an ORDER clause after an analytic LIMIT clause.

And, if it works for 5 columns, it will work for your 20 columns, too.

WITH
input( f1 , f2 , f3 , f4 , f5 ) AS (
SELECT 1,1,'John','1957-01-08',42
UNION ALL SELECT 1,2,'John','1957-01-09',42
UNION ALL SELECT 2,1,'Jack','1958-01-09',4711
UNION ALL SELECT 2,1,'Jack','1958-01-09',4711
UNION ALL SELECT 3,2,'Ford','1959-01-09',815
UNION ALL SELECT 3,1,'Ford','1959-01-09',815
UNION ALL SELECT 4,1,'Anne','1960-01-09',101
UNION ALL SELECT 4,1,'Anne','1960-01-09',101
UNION ALL SELECT 5,1,'Mary','1961-01-09',17
UNION ALL SELECT 5,2,'Mary','1961-01-09',17
)
,
de_duped AS (
SELECT
*
FROM input
LIMIT 1 OVER(PARTITION BY f1 ORDER BY f2 , f3 , f4 , f5)
)
SELECT * FROM de_duped
ORDER BY f1
;

Happy playing -
Marco the Sane

Jim_Knicely · August 2017

The analytic LIMIT clause is pretty awesome! When eliminating dups I sometimes want to keep the dup committed last (or first). To do that you can include the epoch in the ORDER BY...

dbadmin=> select epoch, * from de_duped order by 1;
 epoch | f1 | f2 |  f3  |     f4     |  f5
-------+----+----+------+------------+------
   389 |  1 |  1 | John | 1957-01-08 |   42
   390 |  1 |  2 | John | 1957-01-09 |   42
   390 |  2 |  1 | Jack | 1958-01-09 | 4711
   391 |  2 |  1 | Jack | 1958-01-09 | 4711
   391 |  3 |  1 | Ford | 1959-01-09 |  815
   391 |  3 |  2 | Ford | 1959-01-09 |  815
   391 |  4 |  1 | Anne | 1960-01-09 |  101
   391 |  4 |  1 | Anne | 1960-01-09 |  101
   391 |  5 |  1 | Mary | 1961-01-09 |   17
   392 |  5 |  2 | Mary | 1961-01-09 |   17
(10 rows)

dbadmin=> select epoch, * from de_duped limit 1 over (partition by f1 order by epoch desc, f2 asc, f3 asc, f4 asc, f5 asc);
 epoch | f1 | f2 |  f3  |     f4     |  f5
-------+----+----+------+------------+------
   391 |  2 |  1 | Jack | 1958-01-09 | 4711
   392 |  5 |  2 | Mary | 1961-01-09 |   17
   391 |  3 |  1 | Ford | 1959-01-09 |  815
   390 |  1 |  2 | John | 1957-01-09 |   42
   391 |  4 |  1 | Anne | 1960-01-09 |  101
(5 rows)

sKwa · August 2017

Hi!

Your syntax is not valid in Vertica, you should use:
SELECT f1, f2, f3,... INTO TABLE newTable FROM ...

Vertica does NOT! support syntax: INSER INTO newTable SELECT...

This supported:

This NOT supported:

https://www.w3schools.com/sql/sql_select_into.asp

Sunil568 · August 2021

You can use below script to remove duplicates

-- Create a intermediate table to port the data with row number
-- Here porting data from Table1 to Table2 along with row_num column
select * into Table2 from (select *, ROW_NUMBER() OVER(PARTITION BY A,B order by C)as row_num from Table1 ) A;

-- Delete data from Table1 using earlier created Table2 in above step
DELETE FROM Table1 WHERE EXISTS (SELECT NULL FROM Table2
where Table2.A=Table1.A
and Table2.B=Table1.B
and row_num > 1);
-- Drop table create in first step i.e Table2
Drop Table Table2;

We're Moving!

Create My New Community Account Now

Remove duplicates from my table

Comments

Leave a Comment