How can I import csv with duplicate data because I was set primary key while table creation?

naresh_way2naresh_way2 Registered User

I have set primary key in vertica table and I tried to copy data from csv into my table but it was not copying because the primary key field has duplicate ids how can skip that row and insert all other data into my table.I was doing like this..
testing=> create table vertical(id int primary key enabled,name varchar);
CREATE TABLE
testing=> copy vertical from '/home/naresh/Desktop/result.csv' parser fcsvparser();
ERROR 6745: Duplicate key values: 'id=3' -- violates constraint 'public.vertical.C_PRIMARY'
teststing=>

My csv is like this
1 abc
2 abcd
3 cba
4 adbc
5 bcd
6 rgukt
7 adbcc
3 erthgf

How can i insert this csv without duplicates and Is there any possible to store that duplicated data in another file?

Answers

  • Jim_KnicelyJim_Knicely Employee, Registered User, VerticaExpert

    One option is to disabled the PK. COPY the data, DELETE the duplicates, and then re-enable the PK.

    Example:

    dbadmin=> \! cat /home/dbadmin/result.csv
    1|abc
    2|abcd
    3|cba
    4|adbc
    5|bcd
    6|rgukt
    7|adbcc
    3|erthgf
    
    dbadmin=> copy vertical from '/home/dbadmin/result.csv';
    ERROR 6745:  Duplicate key values: 'id=3' -- violates constraint 'public.vertical.vertical_pk'
    
    dbadmin=> alter table vertical alter constraint vertical_pk disabled;
    ALTER TABLE
    
    dbadmin=> copy vertical from '/home/dbadmin/result.csv';
     Rows Loaded
    -------------
               8
    (1 row)
    
    dbadmin=> select * from vertical;
     id | stuff
    ----+--------
      1 | abc
      2 | abcd
      3 | cba
      3 | erthgf
      4 | adbc
      5 | bcd
      6 | rgukt
      7 | adbcc
    (8 rows)
    
    dbadmin=> delete from vertical where id || stuff not in (select id || stuff from vertical limit 1 over (partition by id order by id));
     OUTPUT
    --------
          1
    (1 row)
    
    dbadmin=> select * from vertical;
     id | stuff
    ----+-------
      1 | abc
      2 | abcd
      3 | cba
      4 | adbc
      5 | bcd
      6 | rgukt
      7 | adbcc
    (7 rows)
    
    dbadmin=> alter table vertical alter constraint vertical_pk enabled;
    ALTER TABLE
    
  • Jim_KnicelyJim_Knicely Employee, Registered User, VerticaExpert

    Or, you could create a temporary table, load the data there, then copy only the distinct records to your real table...

    Example:

    dbadmin=> create table vertical_temp like vertical;
    CREATE TABLE
    
    dbadmin=> copy vertical_temp from '/home/dbadmin/result.csv';
     Rows Loaded
    -------------
               8
    (1 row)
    
    dbadmin=> select * from vertical_temp;
     id | stuff
    ----+--------
      1 | abc
      2 | abcd
      3 | cba
      3 | erthgf
      4 | adbc
      5 | bcd
      6 | rgukt
      7 | adbcc
    (8 rows)
    
    dbadmin=> insert into vertical (select * from vertical_temp limit 1 over (partition by id order by id));
     OUTPUT
    --------
          7
    (1 row)
    
    dbadmin=> select * from vertical;
     id | stuff
    ----+-------
      1 | abc
      2 | abcd
      3 | cba
      4 | adbc
      5 | bcd
      6 | rgukt
      7 | adbcc
    (7 rows)
    
    dbadmin=> drop table vertical_temp;
    DROP TABLE
    
  • naresh_way2naresh_way2 Registered User

    Hi ,
    Yesterday I got this error I'm unable to solve please resolve this issue thanks in advance,

    events.js:163
    throw er; // Unhandled 'error' event
    ^
    Error: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:public.test2. X held by [user user_testing (select count(*) from test2;)]. Your current transaction isolation level is READ COMMITTED