Why to define Foreign key constraint as it can not be enforced
sujai
Vertica Customer
Consider the following scenario
create table City (
Name varchar (100) NOT NULL PRIMARY KEY ENABLED,
District varchar (100),
State varchar (100)
);
INSERT INTO City( Name, District, State)
VALUES('Mysore', 'Mysore' , 'Karnataka');
SELECT * from City;
create table Person (
Name varchar (100)NOT NULL PRIMARY KEY ENABLED,
Age Integer,
City varchar (100) NOT NULL CONSTRAINT fk_Person_City REFERENCES City(Name)
);
INSERT INTO Person( Name, Age, City)
VALUES('Sujai', 34 , 'Antartica');
Logically above insert should fail as Antartica is not defined in City table.
But it is not failing
SELECT * FROM Person
Tagged:
0
Answers
@sujai , that's possible. Did you check this clause?
NULL Values in Foreign Key
A foreign key that whose columns omit NOT NULL can contain NULL values, even if the primary key contains no NULL values. Thus, you can insert rows into the table even if their foreign key is not yet known.
Ref: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintTypes/ForeignKeyConstraints.htm?zoom_highlight=foreign key
I can't see this in other relational databases though.
Vertica does not support enforcement of foreign keys and referential integrity. Too much overhead!
To validate foreign key constraints, use ANALYZE_CONSTRAINTS.
Example:
Doc Page:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintEnforcement/ConstraintEnforcementRestrictions.htm
So why define Foreign Key Constraints?
Becuase when a Hash Join Operator is part of the the query plan, the optimizer benefits from explicitly defined primary and foreign keys as it will use the table with the primary key as the inner table and the table with the foreign key as the outer table. This saves the optimizer the step of determining which table is smaller!
Thanks jim.
Though with ANALYZE_CONSTRAINTS we can validate the constraints in reality already 'bad\data without any reference is committed to database. So the main purpose referential integrity of data is lost.
A foreign key joins a table to another table by referencing its primary key. A foreign key constraint specifies that the key can only contain values that are in the referenced primary key, and thus ensures the referential integrity of data that is joined on the two keys.
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintTypes/ForeignKeyConstraints.htm#:~:text=A foreign key joins a,joined on the two keys.
The best pratice is to run ANALYZE_CONSTRAINTS prior to commiting the data
That way you can fix any violations before any other session sees the violations.
See:
https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/DetectingConstraintViolationswithAnalyzeC.htm
Adding to Jim's posts: Independently of Vertica, in all Data Warehousing implementations outside of Vertica I was a part of in the last 3 decades, I made sure that referential integrity constraints - and all other constraints and triggers - were replaced by data quality checks within the ETL flow before loading into the data warehouse.
Just using that approach, I could increase the load speed by factors of between 4 and 10. And the ETL batch window for the data warehouse is always too small.
But still, if the DBMS allowed it, I kept the constraints as DISABLED in the catalog. According to the ancient Romans: Quod non est in actis, non est in mundo. (What is not on file, is not in the world). To me, "What is not in the database catalog, does not exist". It's the first source of information - for machines (ETL tools and BI tools exchanging metadata, data lineage tracking software, etc. etc) and for the human eye. That's why I keep this information, albeit disabled for performance reasons.
Hope this sheds some light on the matter ...
thanks jim,
Could you please elaborate how to run ANALYZE_CONSTRAINTS prior to commiting the data?
If you use COPY , go:
If you fill your data using DML, then, connecting with
vsql
, anyINSERT ... SELECT ...
would have to be explicitly COMMIT-ted. So, in that scenario, run ANALYZE_CONSTRAINTS() before committing.If ANALYZE_CONSTRAINTS() reports issues, just roll back , and fix the source ...