Merge Queries failing during node down
We tested performance of our cluster by making one of the node down but merge jobs got failed..
The thing is same jobs got successful initially after an hour of node down.
But same jobs got failed after 5 hours since node went down.
We cannot determine the exact reason. Can someone help us.
Here are vertica logs for the same-
2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b1 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c648
2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b0 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c600
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] plan_mergeout: after processing plan
2021-11-20 00:16:28.136 DistCall Dispatch:7f25d8cc3700 [Txn] Commit Complete: Txn: 190000007c282bd at epoch 0x2299402 and new global catalog version 59476569
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) TM mergeout: Total row count = 97, Total delete row count = 3
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) End of executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Dropping source ROSs:
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134781
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: Updating checkpoint epoch
2021-11-20 00:16:28.151 Init Session:7f212ffff700-a00000064ba36d [TM] makeProjectionCPECurrent: Updating CPE for proj 'order_refund_temp20180813170200_b0' to current
2021-11-20 00:16:28.152 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: CPE of proj oms.order_refund_temp20180813170200_b0 set to 36279298
2021-11-20 00:16:28.154 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 done
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [TM] ThreadId = 0, Projection chosen = oms.order_refund_temp20180813170200_b0, REsType = 3, RECnt = 1, stratumNo = 0, partnKey =
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 startEpoch = 36277926 endEpoch = 36277926
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list (from TMService):
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 1 ROSes should be merged
2021-11-20 00:16:28.180 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout plan
Answers
@SK21 Can you open a new support case and provide vertica.log file and the failed MERGE statement with the timestamp when it failed?
Hello @Hibiki ,
Unfortunately our support license has expire and we are in a process of renewal.
Please find logs and timestamp here and let us know if you have any suggestions.
2021-11-20 00:16:28.127
2021-11-20 00:16:28.127 DistCall Dispatch:7f25d8cc3700-190000007c282bd [Txn] Starting Commit: Txn: 190000007c282bd 'MERGE INTO shiptrack_dwh.tracking_package_history AS t USING shiptrack_dwh.tmp_tracking_package_history
2021-11-20 00:16:28.127 DistCall Dispatch:7f25d8cc3700-190000007c282bd [Txn] Starting Commit: Txn: 190000007c282bd 'MERGE INTO shiptrack_dwh.tracking_package_history AS t USING shiptrack_dwh.tmp_tracking_package_history AS s ON t.id=s.id AND t.tracking_package_id=s.tracking_package_id WHEN MATCHED THEN UPDATE SET id=s.id,tracking_package_id=s.tracking_package_id,last_status=s.last_status,new_status=s.new_status,current_courier_status=s.current_courier_status,courier_remarks=s.courier_remarks,remarks=s.remarks,status_date=s.status_date,current_location=s.current_location,extra_details=s.extra_details,created=s.created,updated=s.updated,updated_by=s.updated_by,created_by=s.created_by,current_location_description=s.current_location_description,nsl_code=s.nsl_code,delivery_person_name=s.delivery_person_name,delivery_person_mobile=s.delivery_person_mobile,current_location_pincode=s.current_location_pincode,scan_type=s.scan_type,hub_type=s.hub_type,origin_hub_location_code=s.origin_hub_location_code,origin_hub_location_pincode=s.origin_hub_location_pincode,destination_hub_location_code=s.destination_hub_location_code,destination_hub_location_pincode=s.destination_hub_location_pincode,delay_time=s.delay_time,route_id=s.route_id,curr_latitude=s.curr_latitude,curr_longitude=s.curr_longitude,aud_update_ts=s.aud_update_ts,aud_update_key=s.aud_update_key WHEN NOT MATCHED THEN INSERT ( id, tracking_package_id, last_status, new_status, current_courier_status, courier_remarks, remarks, status_date, current_location, extra_details, created, updated, updated_by, created_by, current_location_description, nsl_code, delivery_person_name, delivery_person_mobile, current_location_pincode, scan_type, hub_type, origin_hub_location_code, origin_hub_location_pincode, destination_hub_location_code, destination_hub_location_pincode, delay_time, route_id, curr_latitude, curr_longitude,aud_create_ts,aud_update_ts,aud_create_key,aud_update_key) VALUES ( s.id, s.tracking_package_id, s.last_status, s.new_status, s.current_courier_status, s.courier_remarks, s.remarks, s.status_date, s.current_location, s.extra_details, s.created, s.updated, s.updated_by, s.created_by, s.current_location_description, s.nsl_code, s.delivery_person_name, s.delivery_person_mobile, s.current_location_pincode, s.scan_type, s.hub_type, s.origin_hub_location_code, s.origin_hub_location_pincode, s.destination_hub_location_code, s.destination_hub_location_pincode, s.delay_time, s.route_id, s.curr_latitude, s.curr_longitude,s.aud_create_ts,s.aud_update_ts,s.aud_create_key,s.aud_update_key);COMMIT;SELECT PURGE_TABLE('shiptrack_dwh.tracking_package_history');' 59476568
2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b1 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c648
2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b0 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c600
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] plan_mergeout: after processing plan
2021-11-20 00:16:28.136 DistCall Dispatch:7f25d8cc3700 [Txn] Commit Complete: Txn: 190000007c282bd at epoch 0x2299402 and new global catalog version 59476569
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) TM mergeout: Total row count = 97, Total delete row count = 3
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) End of executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Dropping source ROSs:
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134781
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: Updating checkpoint epoch
2021-11-20 00:16:28.151 Init Session:7f212ffff700-a00000064ba36d [TM] makeProjectionCPECurrent: Updating CPE for proj 'order_refund_temp20180813170200_b0' to current
2021-11-20 00:16:28.152 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: CPE of proj oms.order_refund_temp20180813170200_b0 set to 36279298
2021-11-20 00:16:28.154 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 done
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [TM] ThreadId = 0, Projection chosen = oms.order_refund_temp20180813170200_b0, REsType = 3, RECnt = 1, stratumNo = 0, partnKey =
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 startEpoch = 36277926 endEpoch = 36277926
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list (from TMService):
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 1 ROSes should be merged
2021-11-20 00:16:28.180 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout plan
@SK21 I understand your current situation. I will check it internally.
I saw the log messages you posted but I could not see any error message. How did you know the MERGE statement failed?
@Hibiki our internal logs for reporting displayed it although we could not also find exact results in management console and vertica .logs.
Internal Logs
22:30:01 : Execution of script /home/biops/dwh/scripts/dwh_vertica_data_merge.sh failed because of VSQL or OS error
22:30:01 : Script dwh_vertica_data_load.sh failed because of SQL or OS error
@SK21 I understand it. So I guess your internal system got an unexpected return value from dwh_vertica_data_merge.sh and then generated this error message. Please check why dwh_vertica_data_merge.sh returned that value.
@Hibiki we got the reason from Vertica Management Console i.e" Execution canceled by operator".
@SK21 So you executed the statements from MC and saw the error, right?
According to the log messages you provided before, you executed the following statements from the script.
Did you execute only the MERGE statement from MC?
If you saw that error, I believe you can see that error with the associated messages in vertica.log. To know the details, can you share vertica.log file with me?