Database crash during object replication
Database crash during object replication
Create simple table
create table product_type ( product_type_id int , product_type_name varchar(255) );Create sequence + flattened table with nextval(sequence) as default value
create sequence seq_product; create table product ( product_id int default nextval('seq_product') , product_name varchar(255) default '' , product_type_id int default -1 , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) );Create copy of flattened table
create table product_copy like product including projections;This copy loses all default expressions
select export_objects('','product_copy');
CREATE TABLE public.product_copy
(
product_id int,
product_name varchar(255),
product_type_id int,
product_type_name varchar(255)
);
- Setup replication config only for product_copy table and run it
objects = product_copy
Error: SQL command "select object_snapshot('recovery_snapshot_TJB8M57213K3BQ92C77GZG8AWROS6U6O','product_copy','','',false);" failed: ERROR 7832: Table public.product_type is referenced by public.product_copy in a SET USING or DEFAULT expression, but is not included in the snapshot
HINT: Back up the tables together, or alter table removing SET USING/DEFAULT on attributes
As described before product_copy lost "flattened" default, but not completely. This is the first BUG.
In next step append all necessary objects and run replication
objects = product_copy, product_type, productRun replication AGAIN and check the cluster. All nodes will crash
2018-11-08 09:35:09.528 DistCall Dispatch:7f50267fd700-a0000000000af7 @v_recovery_node0002: VX001/5445: VIAssert(0) failed
DETAIL: /scratch_a/release/svrtar2886/vbuild/vertica/Catalog/Table.cpp: 910
HINT: Please report this error to Vertica; try restating your query
LOCATION: ??, /scratch_a/release/svrtar2886/vbuild/vertica/Basics/VAssert.cpp:22
2018-11-08 09:35:09.537 DistCall Dispatch:7f50267fd700-a0000000000af7 @v_recovery_node0002: VX001/8238: Unable to apply catalog diff. Error: VIAssert(0) failed
LOCATION: dispatchIncomingCallMessage, /scratch_a/release/svrtar2886/vbuild/vertica/Dist/DistCalls.cpp:16917
2018-11-08 09:35:09.758 DistCall Dispatch:7f50267fd700-a0000000000af7 [Main] Wrote backtrace to ErrorReport.txt
Comments
How did you get STEP 2 to run?
dbadmin=> create table product ( dbadmin(> product_id int default nextval('seq_product'), dbadmin(> product_name varchar(255) default '', dbadmin(> product_type_id int default -1, dbadmin(> product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id)) ; ERROR 7344: default expressions may not refer to other columns with default expressions HINT: Column product_type_id in the expression has a default definitionShould be:
dbadmin=> create table product ( dbadmin(> product_id int default nextval('seq_product'), dbadmin(> product_name varchar(255) default '', dbadmin(> product_type_id int, dbadmin(> product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id)) ; CREATE TABLEPer the doc, a CREATE TABLE LIKE statement "replicates the source table definition and any storage policy associated with it. It does not copy table data or expressions on columns."
See:
https://www.vertica.com/docs/9.1.x/HTML/index.htm#Authoring/AdministratorsGuide/Tables/CreateTableLike.htm
Per the crash, that's an issue! What version of Vertica are you running?
Fyi ... Vertica 9.0.1-6 has this fix:
9.0.1-6 (03/28/2018)- Running replication on a schema that contains sequences resulted in an error causing the replication to fail. In addition, it caused a node crash on the secondary cluster. This issue has been fixed.
See:
https://www.vertica.com/docs/ReleaseNotes/9.0.x/Vertica_9.0.x_Release_Notes.htm
It's my mistake - code was edited before publishing and not tested.
Original structure was simplier:
create table product ( product_id int default nextval('seq_product') , product_name varchar(255) , product_type_id int , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) );Everything else is original and tested.
"create table like" loses all default expressions ONLY if table has identity/sequence expressions.
Version is 9.1.1-0
Please fix it. Replicated database is completely destroyed after such a failure and must be recreated from scratch.
Hi,
The doc says that CREATE TABLE...LIKE copies all table constraints, with one the exception being "Any column that obtains its values from a sequence, including IDENTITY and AUTO_INCREMENT columns. Vertica copies the column values into the new table, but removes the original constraint."
That explains why the DEFAULT with the sequence is not copied, but not why unrelated DEFAULT expressions are not copied. I opened a JIRA to see if this is expected behavior. I will update this tread when I hear back.
Per the crash, can you please open a Vertica Support Case for that? I believe that you need more attention than can be provided via the forum.
Thanks!
We are still using CE license. So I can't.
My crash can be easily reproduced. I also added ErrorReport to the first post.
Hi,
I can not reproduce the cluster failure in your example.
Here is my quick test:
My target cluster DB does not have the three tables:
[dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" table_name ------------ (0 rows)On my source cluster, I will create the three tables and replicate to the target:
dbadmin=> SELECT version(); version ------------------------------------ Vertica Analytic Database v9.1.1-4 (1 row) dbadmin=> create table product_type ( product_type_id int , product_type_name varchar(255) ); CREATE TABLE dbadmin=> CREATE SEQUENCE seq_product; CREATE SEQUENCE dbadmin=> create table product ( product_id int default nextval('seq_product') , product_name varchar(255) , product_type_id int , product_type_name varchar(255) default (select product_type_name from product_type where product_type.product_type_id = product.product_type_id) ); CREATE TABLE dbadmin=> create table product_copy like product including projections; CREATE TABLE dbadmin=> \q [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini Enter vertica password for destination database: Participating nodes: v_test_db_node0001. Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db. Snapshotting the source database. Snapshot complete. Copying catalog snapshot from source to destination. Preparing destination database for replicating objects. Prep complete, start syncing files. Approximate bytes to copy: 0 of 0 total. [==================================================] 100% Finalizing object replication. Complete object replication, removing snapshot from the source database. Object replication complete!The target cluster (verticademos.com) is still up, and the tables were created:
[dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" table_name -------------- product_type product product_copy (3 rows)@vasmez81 - Are your source and target DB running the same version of Vertica?
Please repeat replication. Crash occurs in the second replication run.
First run is always successful. After the second run, it reports "Object replication is complete!", but the DB is falling.
Both DBs has the same version 9.1.1-0
Hi,
I tried replicating 2 more times. The target DB stayed up.
[dbadmin@vertica8 ~]$ vsql -c "SELECT * FROM product_type;" # SOURCE CLUSTER product_type_id | product_type_name -----------------+------------------- (0 rows) [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER product_type_id | product_type_name -----------------+------------------- (0 rows) [dbadmin@vertica8 ~]$ vsql -c "INSERT INTO product_type SELECT 1, 'TYPE1'; COMMIT;" # SOURCE CLUSTER COMMIT [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini Enter vertica password for destination database: Participating nodes: v_test_db_node0001. Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db. Snapshotting the source database. Snapshot complete. Copying catalog snapshot from source to destination. Preparing destination database for replicating objects. Prep complete, start syncing files. Approximate bytes to copy: 81 of 81 total. [==================================================] 100% Finalizing object replication. Complete object replication, removing snapshot from the source database. Object replication complete! [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER product_type_id | product_type_name -----------------+------------------- 1 | TYPE1 (1 row) [dbadmin@vertica8 ~]$ vsql -c "INSERT INTO product_type SELECT 2, 'TYPE2'; COMMIT;" # SOURCE CLUSTER COMMIT [dbadmin@vertica8 ~]$ vbr -t replicate -c replicate.ini Enter vertica password for destination database: Participating nodes: v_test_db_node0001. Starting replication of objects ['product', 'product_copy', 'product_type'] from test_db. Snapshotting the source database. Snapshot complete. Copying catalog snapshot from source to destination. Preparing destination database for replicating objects. Prep complete, start syncing files. Approximate bytes to copy: 81 of 162 total. [==================================================] 100% Finalizing object replication. Complete object replication, removing snapshot from the source database. Object replication complete! [dbadmin@vertica8 ~]$ vsql -h verticademos.com -c "SELECT * FROM product_type;" # TARGET CLUSTER product_type_id | product_type_name -----------------+------------------- 1 | TYPE1 2 | TYPE2 (2 rows)Hello again!
Jim, could You check on cluster with more than one node?
I checked replication on a single node cluster. You are right, the crash is not reproduced in 9.1.1-0.
But in a 3-node cluster the problem is clear.
I was able to reproduce the target failure on a 2 node cluster. I will open a JIRA immediately and will keep this thread updated with its resolution progress. Thanks for letting us know about this!
Fyi:
[dbadmin@SE-Sandbox-43-node1 ~]$ vsql -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # SOURCE DB table_name -------------- product_type product product_copy (3 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # TARGET DB table_name ------------ (0 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ vbr -t replicate -c replicate_3_nodes.ini Enter vertica password for destination database: Participating nodes: v_mydb_node0001, v_mydb_node0002, v_mydb_node0003. Starting replication of objects ['product', 'product_copy', 'product_type'] from mydb. Snapshotting the source database. Snapshot complete. Copying catalog snapshot from source to destination. Preparing destination database for replicating objects. Prep complete, start syncing files. Approximate bytes to copy: 0 of 0 total. [==================================================] 100% Finalizing object replication. Complete object replication, removing snapshot from the source database. Object replication complete! [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT table_name FROM tables WHERE table_name IN ('product_type', 'product', 'product_copy');" # TARGET DB table_name -------------- product_type product product_copy (3 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -d mydb -c "INSERT INTO product_type SELECT 1, 'TYPE1'; COMMIT;" # SOURCE DB COMMIT [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT * FROM product_type;" # TARGET DB product_type_id | product_type_name -----------------+------------------- (0 rows) [dbadmin@SE-Sandbox-43-node1 ~]$ vbr -t replicate -c replicate_3_nodes.ini Enter vertica password for destination database: Participating nodes: v_mydb_node0001, v_mydb_node0002, v_mydb_node0003. Starting replication of objects ['product', 'product_copy', 'product_type'] from mydb. Snapshotting the source database. Snapshot complete. Copying catalog snapshot from source to destination. Preparing destination database for replicating objects. Prep complete, start syncing files. Approximate bytes to copy: 162 of 162 total. [==================================================] 100% Finalizing object replication. Complete object replication, removing snapshot from the source database. Object replication complete! [dbadmin@SE-Sandbox-43-node1 ~]$ vsql -h 172.16.61.176 -d mydb -c "SELECT * FROM product_type;" # TARGET DB vsql: could not connect to server: Connection refused Is the server running on host "172.16.61.176" and accepting TCP/IP connections on port 5433?Thank you for support!
Is there any estimate for fixes?
1. Dropping default for columns after copy_table / create table like
2. Replication of such tables
Hi,
Engineering has identified and fixed the issue. Waiting to hear when the code fix will be available to the community. I will update this thread as soon as I know!