Database crash during object replication

Database crash during object replication

  1. Create simple table
    create table product_type ( product_type_id int , product_type_name varchar(255) );

  2. Create sequence + flattened table with nextval(sequence) as default value
    create sequence seq_product; create table product ( product_id int default nextval('seq_product') , product_name varchar(255) default '' , product_type_id int default -1 , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) );

  3. Create copy of flattened table
    create table product_copy like product including projections;

  4. This copy loses all default expressions
    select export_objects('','product_copy');

CREATE TABLE public.product_copy
(
product_id int,
product_name varchar(255),
product_type_id int,
product_type_name varchar(255)
);

  1. Setup replication config only for product_copy table and run it
    objects = product_copy

Error: SQL command "select object_snapshot('recovery_snapshot_TJB8M57213K3BQ92C77GZG8AWROS6U6O','product_copy','','',false);" failed: ERROR 7832: Table public.product_type is referenced by public.product_copy in a SET USING or DEFAULT expression, but is not included in the snapshot
HINT: Back up the tables together, or alter table removing SET USING/DEFAULT on attributes

As described before product_copy lost "flattened" default, but not completely. This is the first BUG.

  1. In next step append all necessary objects and run replication
    objects = product_copy, product_type, product

  2. Run replication AGAIN and check the cluster. All nodes will crash

2018-11-08 09:35:09.528 DistCall Dispatch:7f50267fd700-a0000000000af7 @v_recovery_node0002: VX001/5445: VIAssert(0) failed
DETAIL: /scratch_a/release/svrtar2886/vbuild/vertica/Catalog/Table.cpp: 910
HINT: Please report this error to Vertica; try restating your query
LOCATION: ??, /scratch_a/release/svrtar2886/vbuild/vertica/Basics/VAssert.cpp:22
2018-11-08 09:35:09.537 DistCall Dispatch:7f50267fd700-a0000000000af7 @v_recovery_node0002: VX001/8238: Unable to apply catalog diff. Error: VIAssert(0) failed
LOCATION: dispatchIncomingCallMessage, /scratch_a/release/svrtar2886/vbuild/vertica/Dist/DistCalls.cpp:16917
2018-11-08 09:35:09.758 DistCall Dispatch:7f50267fd700-a0000000000af7 [Main] Wrote backtrace to ErrorReport.txt

Comments

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited November 2018

    How did you get STEP 2 to run?

    dbadmin=> create table product (
    dbadmin(>   product_id int default nextval('seq_product'),
    dbadmin(>   product_name varchar(255) default '',
    dbadmin(>   product_type_id int default -1,
    dbadmin(>   product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id)) ;
    ERROR 7344:  default expressions may not refer to other columns with default expressions
    HINT:  Column product_type_id in the expression has a default definition
    

    Should be:

    dbadmin=> create table product (
    dbadmin(>   product_id int default nextval('seq_product'),
    dbadmin(>   product_name varchar(255) default '',
    dbadmin(>   product_type_id int,
    dbadmin(>   product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id)) ;
    CREATE TABLE
    

    Per the doc, a CREATE TABLE LIKE statement "replicates the source table definition and any storage policy associated with it. It does not copy table data or expressions on columns."

    See:
    https://www.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Tables/CreateTableLike.htm

    Per the crash, that's an issue! What version of Vertica are you running?

    Fyi ... Vertica 9.0.1-6 has this fix:

    9.0.1-6 (03/28/2018)- Running replication on a schema that contains sequences resulted in an error causing the replication to fail. In addition, it caused a node crash on the secondary cluster. This issue has been fixed.

    See:
    https://www.vertica.com/docs/ReleaseNotes/9.0.x/Vertica_9.0.x_Release_Notes.htm

  • How did you get STEP 2 to run?

    It's my mistake - code was edited before publishing and not tested.
    Original structure was simplier:
    create table product ( product_id int default nextval('seq_product') , product_name varchar(255) , product_type_id int , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) );
    Everything else is original and tested.

    It does not copy table data or expressions on columns.

    "create table like" loses all default expressions ONLY if table has identity/sequence expressions.

    Per the crash, that's an issue! What version of Vertica are you running?

    Version is 9.1.1-0

    Please fix it. Replicated database is completely destroyed after such a failure and must be recreated from scratch.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    The doc says that CREATE TABLE...LIKE copies all table constraints, with one the exception being "Any column that obtains its values from a sequence, including IDENTITY and AUTO_INCREMENT columns. Vertica copies the column values into the new table, but removes the original constraint."

    That explains why the DEFAULT with the sequence is not copied, but not why unrelated DEFAULT expressions are not copied. I opened a JIRA to see if this is expected behavior. I will update this tread when I hear back.

    Per the crash, can you please open a Vertica Support Case for that? I believe that you need more attention than can be provided via the forum.

  • vasmez81vasmez81
    edited November 2018

    I opened a JIRA to see if this is expected behavior

    Thanks!

    Per the crash, can you please open a Vertica Support Case for that?

    We are still using CE license. So I can't.

    My crash can be easily reproduced. I also added ErrorReport to the first post.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited November 2018

    Hi,

    I can not reproduce the cluster failure in your example.

    Here is my quick test:

    My target cluster DB does not have the three tables:

    [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');"
     table_name
    ------------
    (0 rows)
    

    On my source cluster, I will create the three tables and replicate to the target:

    dbadmin=> SELECT version();
                  version
    ------------------------------------
     Vertica Analytic Database v9.1.1-4
    (1 row)
    
    dbadmin=> create table product_type ( product_type_id int , product_type_name varchar(255) );
    CREATE TABLE
    
    dbadmin=> CREATE SEQUENCE seq_product;
    CREATE SEQUENCE
    
    dbadmin=> create table product ( product_id int default nextval('seq_product') , product_name varchar(255) , product_type_id int , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) );
    CREATE TABLE
    
    dbadmin=> create table product_copy like product including projections;
    CREATE TABLE
    
    dbadmin=> \q
    
    [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini
    Enter vertica password for destination database:
    Participating nodes: v_test_db_node0001.
    Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db.
    Snapshotting the source database.
    Snapshot complete.
    Copying catalog snapshot from source to destination.
    Preparing destination database for replicating objects.
    Prep complete, start syncing files.
    Approximate bytes to copy: 0 of 0 total.
    [==================================================] 100%
    Finalizing object replication.
    Complete object replication, removing snapshot from the source database.
    Object replication complete!
    

    The target cluster (verticademos.com) is still up, and the tables were created:

    [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');"
      table_name
    --------------
     product_type
     product
     product_copy
    (3 rows)
    
  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    @vasmez81 - Are your source and target DB running the same version of Vertica?

  • I can not reproduce the cluster failure in your example.

    Please repeat replication. Crash occurs in the second replication run.
    First run is always successful. After the second run, it reports "Object replication is complete!", but the DB is falling.

    Are your source and target DB running the same version of Vertica?

    Both DBs has the same version 9.1.1-0

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    I tried replicating 2 more times. The target DB stayed up.

    [dbadmin@vertica8 ~]$ vsql -c "SELECT * FROM product_type;" # SOURCE CLUSTER
     product_type_id | product_type_name
    -----------------+-------------------
    (0 rows)
    
    [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER
     product_type_id | product_type_name
    -----------------+-------------------
    (0 rows)
    
    [dbadmin@vertica8 ~]$ vsql -c "INSERT INTO product_type SELECT 1, 'TYPE1'; COMMIT;" # SOURCE CLUSTER
    COMMIT
    
    [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini
    Enter vertica password for destination database:
    Participating nodes: v_test_db_node0001.
    Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db.
    Snapshotting the source database.
    Snapshot complete.
    Copying catalog snapshot from source to destination.
    Preparing destination database for replicating objects.
    Prep complete, start syncing files.
    Approximate bytes to copy: 81 of 81 total.
    [==================================================] 100%
    Finalizing object replication.
    Complete object replication, removing snapshot from the source database.
    Object replication complete!
    
    [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER
     product_type_id | product_type_name
    -----------------+-------------------
                   1 | TYPE1
    (1 row)
    
    [dbadmin@vertica8 ~]$ vsql -c "INSERT INTO product_type SELECT 2, 'TYPE2'; COMMIT;" # SOURCE CLUSTER
    COMMIT
    
    [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini
    Enter vertica password for destination database:
    Participating nodes: v_test_db_node0001.
    Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db.
    Snapshotting the source database.
    Snapshot complete.
    Copying catalog snapshot from source to destination.
    Preparing destination database for replicating objects.
    Prep complete, start syncing files.
    Approximate bytes to copy: 81 of 162 total.
    [==================================================] 100%
    Finalizing object replication.
    Complete object replication, removing snapshot from the source database.
    Object replication complete!
    
    [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER
     product_type_id | product_type_name
    -----------------+-------------------
                   1 | TYPE1
                   2 | TYPE2
    (2 rows)
    
  • Hello again!
    Jim, could You check on cluster with more than one node?
    I checked replication on a single node cluster. You are right, the crash is not reproduced in 9.1.1-0.
    But in a 3-node cluster the problem is clear.

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    I was able to reproduce the target failure on a 2 node cluster. I will open a JIRA immediately and will keep this thread updated with its resolution progress. Thanks for letting us know about this!

    Fyi:

    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # SOURCE DB
      table_name
    --------------
     product_type
     product
     product_copy
    (3 rows)
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # TARGET DB
     table_name
    ------------
    (0 rows)
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vbr -t replicate -c replicate_3_nodes.ini
    Enter vertica password for destination database:
    Participating nodes: v_mydb_node0001, v_mydb_node0002, v_mydb_node0003.
    Starting replication of objects ['product', 'product_copy', 'product_type'] from mydb.
    Snapshotting the source database.
    Snapshot complete.
    Copying catalog snapshot from source to destination.
    Preparing destination database for replicating objects.
    Prep complete, start syncing files.
    Approximate bytes to copy: 0 of 0 total.
    [==================================================] 100%
    Finalizing object replication.
    Complete object replication, removing snapshot from the source database.
    Object replication complete!
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # TARGET DB
      table_name
    --------------
     product_type
     product
     product_copy
    (3 rows)
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -d mydb -c "INSERT INTO product_type SELECT 1, 'TYPE1'; COMMIT;" # SOURCE DB
    COMMIT
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT * FROM product_type;" # TARGET DB
     product_type_id | product_type_name
    -----------------+-------------------
    (0 rows)
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vbr -t replicate -c replicate_3_nodes.ini
    Enter vertica password for destination database:
    Participating nodes: v_mydb_node0001, v_mydb_node0002, v_mydb_node0003.
    Starting replication of objects ['product', 'product_copy', 'product_type'] from mydb.
    Snapshotting the source database.
    Snapshot complete.
    Copying catalog snapshot from source to destination.
    Preparing destination database for replicating objects.
    Prep complete, start syncing files.
    Approximate bytes to copy: 162 of 162 total.
    [==================================================] 100%
    Finalizing object replication.
    Complete object replication, removing snapshot from the source database.
    Object replication complete!
    
    [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT * FROM product_type;" # TARGET DB
    vsql: could not connect to server: Connection refused
            Is the server running on host "172.16.61.176" and accepting
            TCP/IP connections on port 5433?
    
  • Thank you for support!
    Is there any estimate for fixes?
    1. Dropping default for columns after copy_table / create table like
    2. Replication of such tables

  • Jim_KnicelyJim_Knicely - Select Field - Administrator

    Hi,

    Engineering has identified and fixed the issue. Waiting to hear when the code fix will be available to the community. I will update this thread as soon as I know!

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file