I am running a query which would require a where clause matching 2 different table columns but I get the error 3149 Duplicate primary/unique key and i don't understand why or how to get around this
This typically is caused by a duplicate record. Vertica constraints do not check for duplicates on load for performance reasons, only on subsequent queries.
To check for duplicate records run the following commands. If they report 0 rows then they are ok, otherwise they will report the row that is a duplicate.
This resulted in finding the table with the duplicate record. => SELECT ANALYZE_CONSTRAINTS ('clife.Users'); Schema Name | Table Name | Column Names | Constraint Name | ConstraintType | Column Values -------------+------------+--------------+-----------------+-----------------+---------------------------------------------- clife | Users | udid | C_PRIMARY | PRIMARY | ('fca53a876319351330836d328e74128346b9f74c') (1 row)
You can delete duplicate record using below:
Using epoch as a unique identifier: =================================== Each record in Vertica has a hidden epoch column that represents the epoch at the time the data was loaded/inserted. If the two records were loaded at different times then the epoch can be used as a unique identifier to delete one and retain the other. This typically allows removal without constraint errors forcing temporary removal of the constraint.
# shows duplicate rows with different epochs Testing1=> select epoch,* from a_test where a='2'; epoch | a | b | c -------+---+---+-------- 1441 | 2 | 3 | xxxxx 5 | 2 | 3 | xxxxx (2 rows) # shows delete using the epoch as the unique identifier Testing1=> delete from a_test where a='2' and b='3' and c='xxxxx' and epoch='1441'; OUTPUT -------- 1 (1 row) # resulting in single record Testing1=> select epoch,* from a_test where a='2'; epoch | a | b | c -------+---+---+-------- 5 | 2 | 3 | xxxxx (1 row)
Hi Abhishek ...argggghhh ok the query you suggest shows one table with multiple entries SELECT ANALYZE_CONSTRAINTS('live_bp_uk_owner.DYN_PRODUCT_ATTRIBUTES');
You can try to remove these duplicate entries from this table, as suggested in previous post using EPOCH column. If duplicity is there, you will see same rows with different epoch.
Once done with removing those duplicate rows & keeping only one row. Try your query of SELECT again.
Thanks Abhishek, I appear to have a slightly different situation in that it appears to me that I have duplicate rows but with the same epoch, I am not sure that should be even possible but that is how it appears .. see here : SELECT ANALYZE_CONSTRAINTS('live_bp_uk_owner.DYN_PRODUCT_ATTRIBUTES');
Comments
This typically is caused by a duplicate record. Vertica constraints do not check for duplicates on load for performance reasons, only on subsequent queries.
To check for duplicate records run the following commands. If they report 0 rows then they are ok, otherwise they will report the row that is a duplicate.
Eg:
==
SELECT ANALYZE_CONSTRAINTS ('clife.Sessions');
SELECT ANALYZE_CONSTRAINTS ('clife.Users');
SELECT ANALYZE_CONSTRAINTS ('clife.UserStatus');
This resulted in finding the table with the duplicate record.
=> SELECT ANALYZE_CONSTRAINTS ('clife.Users');
Schema Name | Table Name | Column Names | Constraint Name | ConstraintType | Column Values
-------------+------------+--------------+-----------------+-----------------+----------------------------------------------
clife | Users | udid | C_PRIMARY | PRIMARY | ('fca53a876319351330836d328e74128346b9f74c')
(1 row)
You can delete duplicate record using below:
Using epoch as a unique identifier:
===================================
Each record in Vertica has a hidden epoch column that represents the epoch at the time the data was loaded/inserted. If the two records were loaded at different times then the epoch can be used as a unique identifier to delete one and retain the other. This typically allows removal without constraint errors forcing temporary removal of the constraint.
# shows duplicate rows with different epochs
Testing1=> select epoch,* from a_test where a='2';
epoch | a | b | c
-------+---+---+--------
1441 | 2 | 3 | xxxxx
5 | 2 | 3 | xxxxx
(2 rows)
# shows delete using the epoch as the unique identifier
Testing1=> delete from a_test where a='2' and b='3' and c='xxxxx' and epoch='1441';
OUTPUT
--------
1
(1 row)
# resulting in single record
Testing1=> select epoch,* from a_test where a='2';
epoch | a | b | c
-------+---+---+--------
5 | 2 | 3 | xxxxx
(1 row)
Regards'
Abhishek
SELECT ANALYZE_CONSTRAINTS('live_bp_uk_owner.DYN_PRODUCT_ATTRIBUTES');
-----------------+------------------------+---------------------+--------------------------------+-----------------+---------------
live_bp_uk_owner | DYN_PRODUCT_ATTRIBUTES | OUTPUT_PROD_SYS_KEY | PK_MASH_DYN_PRODUCT_ATTRIBUTES | PRIMARY | ('5516013')
live_bp_uk_owner | DYN_PRODUCT_ATTRIBUTES | OUTPUT_PROD_SYS_KEY | PK_MASH_DYN_PRODUCT_ATTRIBUTES | PRIMARY | ('5860511')
etc ..
Whereas the other table returns no rows!
SELECT ANALYZE_CONSTRAINTS('live_bp_uk_owner.RPT_FULL_SALE_SUMMARY');
(0) rows ,,,
so where do I go from here ?
If prefered I can raise a case..
You can try to remove these duplicate entries from this table, as suggested in previous post using EPOCH column. If duplicity is there, you will see same rows with different epoch.
Once done with removing those duplicate rows & keeping only one row. Try your query of SELECT again.
Regards'
Abhishek
SELECT ANALYZE_CONSTRAINTS('live_bp_uk_owner.DYN_PRODUCT_ATTRIBUTES');
live_bp_uk_owner | DYN_PRODUCT_ATTRIBUTES | OUTPUT_PROD_SYS_KEY | PK_MASH_DYN_PRODUCT_ATTRIBUTES | PRIMARY | ('2422337')
live_bp_uk_owner | DYN_PRODUCT_ATTRIBUTES | OUTPUT_PROD_SYS_KEY | PK_MASH_DYN_PRODUCT_ATTRIBUTES | PRIMARY | ('4729617')
live_bp_uk_owner | DYN_PRODUCT_ATTRIBUTES | OUTPUT_PROD_SYS_KEY | PK_MASH_DYN_PRODUCT_ATTRIBUTES | PRIMARY | ('26788')
lets use just one of the 'dupliacte' key entries
dbadmin=> select epoch from live_bp_uk_owner.DYN_PRODUCT_ATTRIBUTES where OUTPUT_PROD_SYS_KEY='6447541';
epoch
---------
1043503
1043503
(2 rows)
Odd ?
If such is the case, you can opt for keeping copy of this record row with you in some separate table & then delete these rows from the table.
It will make all rows for this key to be deleted.
Once done. Re insert the single copy of that row again.
Regards'
Abhishek