Options

Why to define Foreign key constraint as it can not be enforced

sujaisujai Vertica Customer

Consider the following scenario
create table City (
Name varchar (100) NOT NULL PRIMARY KEY ENABLED,
District varchar (100),
State varchar (100)
);
INSERT INTO City( Name, District, State)
VALUES('Mysore', 'Mysore' , 'Karnataka');
SELECT * from City;
create table Person (
Name varchar (100)NOT NULL PRIMARY KEY ENABLED,
Age Integer,
City varchar (100) NOT NULL CONSTRAINT fk_Person_City REFERENCES City(Name)
);

INSERT INTO Person( Name, Age, City)
VALUES('Sujai', 34 , 'Antartica');
Logically above insert should fail as Antartica is not defined in City table.
But it is not failing
SELECT * FROM Person

Answers

  • Options
    SankarmnSankarmn Community Edition User ✭✭

    @sujai , that's possible. Did you check this clause?
    NULL Values in Foreign Key
    A foreign key that whose columns omit NOT NULL can contain NULL values, even if the primary key contains no NULL values. Thus, you can insert rows into the table even if their foreign key is not yet known.
    Ref: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintTypes/ForeignKeyConstraints.htm?zoom_highlight=foreign key
    I can't see this in other relational databases though.

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited August 2020

    Vertica does not support enforcement of foreign keys and referential integrity. Too much overhead!

    To validate foreign key constraints, use ANALYZE_CONSTRAINTS.

    Example:

    dbadmin=> SELECT analyze_constraints('person');
     Schema Name | Table Name | Column Names | Constraint Name | Constraint Type | Column Values
    -------------+------------+--------------+-----------------+-----------------+---------------
     public      | Person     | City         | fk_Person_City  | FOREIGN         | ('Antartica')
    (1 row)
    
    

    Doc Page:

    https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintEnforcement/ConstraintEnforcementRestrictions.htm

    So why define Foreign Key Constraints?

    Becuase when a Hash Join Operator is part of the the query plan, the optimizer benefits from explicitly defined primary and foreign keys as it will use the table with the primary key as the inner table and the table with the foreign key as the outer table. This saves the optimizer the step of determining which table is smaller!

  • Options
    sujaisujai Vertica Customer

    Thanks jim.
    Though with ANALYZE_CONSTRAINTS we can validate the constraints in reality already 'bad\data without any reference is committed to database. So the main purpose referential integrity of data is lost.

    A foreign key joins a table to another table by referencing its primary key. A foreign key constraint specifies that the key can only contain values that are in the referenced primary key, and thus ensures the referential integrity of data that is joined on the two keys.
    https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/ConstraintTypes/ForeignKeyConstraints.htm#:~:text=A foreign key joins a,joined on the two keys.

  • Options
    Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited August 2020

    The best pratice is to run ANALYZE_CONSTRAINTS prior to commiting the data :smile:

    That way you can fix any violations before any other session sees the violations.

    See:
    https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/AdministratorsGuide/Constraints/DetectingConstraintViolationswithAnalyzeC.htm

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    Adding to Jim's posts: Independently of Vertica, in all Data Warehousing implementations outside of Vertica I was a part of in the last 3 decades, I made sure that referential integrity constraints - and all other constraints and triggers - were replaced by data quality checks within the ETL flow before loading into the data warehouse.

    Just using that approach, I could increase the load speed by factors of between 4 and 10. And the ETL batch window for the data warehouse is always too small.

    But still, if the DBMS allowed it, I kept the constraints as DISABLED in the catalog. According to the ancient Romans: Quod non est in actis, non est in mundo. (What is not on file, is not in the world). To me, "What is not in the database catalog, does not exist". It's the first source of information - for machines (ETL tools and BI tools exchanging metadata, data lineage tracking software, etc. etc) and for the human eye. That's why I keep this information, albeit disabled for performance reasons.

    Hope this sheds some light on the matter ...

  • Options
    sujaisujai Vertica Customer

    thanks jim,
    Could you please elaborate how to run ANALYZE_CONSTRAINTS prior to commiting the data?

  • Options
    marcothesanemarcothesane - Select Field - Administrator

    If you use COPY , go:

    COPY person FROM '/data/infiles/person.csv' ON ANY NODE DELIMITER ',' NO COMMIT;
    SELECT ANALYZE_CONSTRAINTS('person');
    

    If you fill your data using DML, then, connecting with vsql , any INSERT ... SELECT ... would have to be explicitly COMMIT-ted. So, in that scenario, run ANALYZE_CONSTRAINTS() before committing.
    If ANALYZE_CONSTRAINTS() reports issues, just roll back , and fix the source ...

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file