How can I import a CSV with duplicate data when I set a primary key during table creation?
I set a primary key on a Vertica table and tried to copy data from a CSV into it, but the copy fails because the primary key field has duplicate ids. How can I skip those rows and insert all the other data into my table? This is what I was doing:
testing=> create table vertical(id int primary key enabled,name varchar);
CREATE TABLE
testing=> copy vertical from '/home/naresh/Desktop/result.csv' parser fcsvparser();
ERROR 6745: Duplicate key values: 'id=3' -- violates constraint 'public.vertical.C_PRIMARY'
testing=>
My CSV looks like this:
1 abc
2 abcd
3 cba
4 adbc
5 bcd
6 rgukt
7 adbcc
3 erthgf
How can I load this CSV without the duplicates, and is it possible to store the duplicated data in another file?
Answers
One option is to disable the PK, COPY the data, DELETE the duplicates, and then re-enable the PK.
Example:
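Something along these lines should work, as a sketch using the table from the question (the ALTER CONSTRAINT syntax assumes a recent Vertica version, and the DELETE relies on the duplicate rows differing in the name column):

-- Disable the PK so the load succeeds
ALTER TABLE vertical ALTER CONSTRAINT C_PRIMARY DISABLED;

-- Load everything, duplicates included
COPY vertical FROM '/home/naresh/Desktop/result.csv' PARSER fcsvparser();

-- Remove the extra copies, keeping the row with the smallest name per id
DELETE FROM vertical
 WHERE EXISTS (SELECT 1
                 FROM vertical v2
                WHERE v2.id = vertical.id
                  AND v2.name < vertical.name);

-- Re-enable the PK (this will fail if any duplicates remain)
ALTER TABLE vertical ALTER CONSTRAINT C_PRIMARY ENABLED;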
Or, you could create a temporary table, load the data there, then copy only the distinct records to your real table...
Example:
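A sketch of that approach (vertical_stage and vertical_dups are made-up names; ROW_NUMBER decides which copy of each id to keep):

-- Staging table with no PK, so the duplicates load without errors
CREATE LOCAL TEMP TABLE vertical_stage (id INT, name VARCHAR)
    ON COMMIT PRESERVE ROWS;

COPY vertical_stage FROM '/home/naresh/Desktop/result.csv' PARSER fcsvparser();

-- Move one row per id into the real table
INSERT INTO vertical
SELECT id, name
  FROM (SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS rn
          FROM vertical_stage) t
 WHERE rn = 1;

-- Park the leftover duplicates in their own table
-- (export that table if you really need them in a separate file)
CREATE TABLE vertical_dups AS
SELECT id, name
  FROM (SELECT id, name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS rn
          FROM vertical_stage) t
 WHERE rn > 1;

COMMIT;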
Hi,
Yesterday I got this error and I'm unable to solve it. Please help me resolve this issue. Thanks in advance:
events.js:163
throw er; // Unhandled 'error' event
^
Error: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:public.test2. X held by [user user_testing (select count(*) from test2;)]. Your current transaction isolation level is READ COMMITTED
See:
https://forum.vertica.com/discussion/239078/read-commited-error#latest
Hi Jim_Knicely,
I have users with 600 data points each, and 5 crore (50 million) users. How can I store that data in Vertica? What is the best way to store it so that I can get counts quickly? Can you suggest something?
Sorry Jim, my requirement is like this: I have to store arrays in a Vertica table (not in a flex table). If it can't store them, what is the best way to store that type of data? My array length lies between 200 and 1,000, and the cardinality lies between 10,000 and 100,000. There is one such array per user, and the number of users is 10 crore (100 million). Please suggest a solution for this; updates should also be fast. Does this database support all these requirements?
You could create a wide table with a column for each data point. Then, to get super-fast counts, you can create a Live Aggregate Projection.
Example (Your table would have all 600 data points):
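A minimal sketch with just a few of the columns (user_points and the dp_* names are placeholders):

-- Wide table: one column per data point
CREATE TABLE user_points (
    user_id INT NOT NULL,
    dp_001  INT,
    dp_002  INT,
    dp_003  INT
    -- ... and so on, up to dp_600
);

-- Live aggregate projection that maintains the counts as data loads,
-- so a count query reads the pre-aggregated result instead of
-- scanning all 5 crore rows
CREATE PROJECTION user_points_counts AS
SELECT dp_001, COUNT(*) AS cnt
  FROM user_points
 GROUP BY dp_001;

One thing to verify in the docs first: anchor tables of live aggregate projections have restrictions on UPDATE and DELETE, which matters for the update questions below.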
See:
https://my.vertica.com/docs/9.0.x/HTML/index.htm#Authoring/AnalyzingData/AggregatedData/LiveAggregateProjections.htm
Hi Jim. Thank you for your reply. I have a question regarding the performance of updates in real time. My current performance on a 4-core server (4 GB RAM, single node) with 5 crore records (23 columns: 21 int and 2 varchar) is 20 record updates per second.
How can I scale it up to 10K updates per second?
Please suggest the configuration or architecture that would get me there. I followed this URL: https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/OptimizingDELETEsAndUPDATEsForPerformance.htm but it is not giving me the result I want. Can you also tell me what scale I can expect from the configuration I described above?
Hi Jim, thank you for your reply. I have a question: does Vertica support UPDATEs in real time? If so, how many records can it update per second at most? What architecture do we need to follow to make real-time updates faster? My case is like this:
I have 5 crore records (23 data points: 21 integer columns and 2 varchar columns), and I made id the primary key.
My current performance is 30 record updates per second.
I want to scale it up to 10K records per second. Can you suggest what configuration changes could give me that performance?
I followed this URL: https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/AnalyzingData/Optimizations/OptimizingDELETEsAndUPDATEsForPerformance.htm but it did not work for me. I am using Vertica for analytics, and I also want to serve the data.
My requirement for serving data is this: once a record has been served by one query, it should not be returned by any other query, which means I need to change the record's status (and for that I need an update). This is taking too much time; how can I meet this challenge in Vertica, with or without updates? In other words, the data should not repeat (it should rotate).
The updates also affect the counts.