Please take this survey to help us learn more about how you use third party tools. Your input is greatly appreciated!

Remove duplicates from my table

Good morning .
I trying remove duplicates , but is very difficult.

I try this :

CREATE TABLE t1 AS SELECT DISTINCT *
FROM t2;

But no is good i want remove all duplicates field1 is same .

I need help in other databases is ok but in vertica no good.

i used this but in vertica not ok.

INSERT INTO t2
select
f1 ,
f2 ,
f3 ,
f4 ,
f5,
f6 ,
f7 ,
f8 ,
f9 ,
f10 ,
f11 ,
f12 ,
f13 ,
f14 ,
f15 ,
f16 ,
f17 ,
f18 ,
f19 ,
20 ,
Ranking
from
( select rank() over (partition by f2 order by
f1 ,
f3 ,
f4 ,
f5,
f6 ,
f7 ,
f8 ,
f9 ,
f10 ,
f11 ,
f12 ,
f13 ,
f14 ,
f15 ,
f16 ,
f17 ,
f18 ,
f19 ,
20
asc ) as Ranking,

  • from t2 ) a where a.Ranking =1
    ;

I wait help .
thank you.

Comments

  • Hi Ismael -
    I would try the analytic LIMIT clause. It's currently not mentioned along with the documentation of the LIMIT keyword. But it is used in the examples in the documentation on the definition of Top-K projections. And, as the SELECT clause of a projection definition is expected to be nothing else than a query, it's fair, in my eyes, to use the analytic LIMIT clause in a query.
    The code below creates an in-line table with example data in a WITH clause.
    If you don't need the output to be ordered, you can just as well use the second Common Table Expression, de_duped, as the main query. You can't put an ORDER clause after an analytic LIMIT clause.

    And, if it works for 5 columns, it will work for your 20 columns, too.

    WITH
    input( f1 , f2 , f3 , f4 , f5 ) AS (
    SELECT 1,1,'John','1957-01-08',42
    UNION ALL SELECT 1,2,'John','1957-01-09',42
    UNION ALL SELECT 2,1,'Jack','1958-01-09',4711
    UNION ALL SELECT 2,1,'Jack','1958-01-09',4711
    UNION ALL SELECT 3,2,'Ford','1959-01-09',815
    UNION ALL SELECT 3,1,'Ford','1959-01-09',815
    UNION ALL SELECT 4,1,'Anne','1960-01-09',101
    UNION ALL SELECT 4,1,'Anne','1960-01-09',101
    UNION ALL SELECT 5,1,'Mary','1961-01-09',17
    UNION ALL SELECT 5,2,'Mary','1961-01-09',17
    )
    ,
    de_duped AS (
    SELECT
    *
    FROM input
    LIMIT 1 OVER(PARTITION BY f1 ORDER BY f2 , f3 , f4 , f5)
    )
    SELECT * FROM de_duped
    ORDER BY f1
    ;

    Happy playing -
    Marco the Sane

  • Jim_KnicelyJim_Knicely Administrator

    The analytic LIMIT clause is pretty awesome! When eliminating dups I sometimes want to keep the dup committed last (or first). To do that you can include the epoch in the ORDER BY...

    dbadmin=> select epoch, * from de_duped order by 1;
     epoch | f1 | f2 |  f3  |     f4     |  f5
    -------+----+----+------+------------+------
       389 |  1 |  1 | John | 1957-01-08 |   42
       390 |  1 |  2 | John | 1957-01-09 |   42
       390 |  2 |  1 | Jack | 1958-01-09 | 4711
       391 |  2 |  1 | Jack | 1958-01-09 | 4711
       391 |  3 |  1 | Ford | 1959-01-09 |  815
       391 |  3 |  2 | Ford | 1959-01-09 |  815
       391 |  4 |  1 | Anne | 1960-01-09 |  101
       391 |  4 |  1 | Anne | 1960-01-09 |  101
       391 |  5 |  1 | Mary | 1961-01-09 |   17
       392 |  5 |  2 | Mary | 1961-01-09 |   17
    (10 rows)
    
    dbadmin=> select epoch, * from de_duped limit 1 over (partition by f1 order by epoch desc, f2 asc, f3 asc, f4 asc, f5 asc);
     epoch | f1 | f2 |  f3  |     f4     |  f5
    -------+----+----+------+------------+------
       391 |  2 |  1 | Jack | 1958-01-09 | 4711
       392 |  5 |  2 | Mary | 1961-01-09 |   17
       391 |  3 |  1 | Ford | 1959-01-09 |  815
       390 |  1 |  2 | John | 1957-01-09 |   42
       391 |  4 |  1 | Anne | 1960-01-09 |  101
    (5 rows)
    
  • edited August 2017
  • You can use below script to remove duplicates

    -- Create a intermediate table to port the data with row number
    -- Here porting data from Table1 to Table2 along with row_num column
    select * into Table2 from (select *, ROW_NUMBER() OVER(PARTITION BY A,B order by C)as row_num from Table1 ) A;

    -- Delete data from Table1 using earlier created Table2 in above step
    DELETE FROM Table1 WHERE EXISTS (SELECT NULL FROM Table2
    where Table2.A=Table1.A
    and Table2.B=Table1.B
    and row_num > 1);
    -- Drop table create in first step i.e Table2
    Drop Table Table2;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file

Can't find what you're looking for? Search the Vertica Documentation, Knowledge Base, or Blog for more information.