Merge Queries failing during node down

SK21SK21 Vertica Customer

We tested performance of our cluster by making one of the node down but merge jobs got failed..

The thing is same jobs got successful initially after an hour of node down.

But same jobs got failed after 5 hours since node went down.

We cannot determine the exact reason. Can someone help us.

Here are vertica logs for the same-

2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b1 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c648
2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b0 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c600
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] plan_mergeout: after processing plan
2021-11-20 00:16:28.136 DistCall Dispatch:7f25d8cc3700 [Txn] Commit Complete: Txn: 190000007c282bd at epoch 0x2299402 and new global catalog version 59476569
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) TM mergeout: Total row count = 97, Total delete row count = 3
2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) End of executing the mergeout old replay delete plan
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Dropping source ROSs:
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134781
2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: Updating checkpoint epoch
2021-11-20 00:16:28.151 Init Session:7f212ffff700-a00000064ba36d [TM] makeProjectionCPECurrent: Updating CPE for proj 'order_refund_temp20180813170200_b0' to current
2021-11-20 00:16:28.152 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: CPE of proj oms.order_refund_temp20180813170200_b0 set to 36279298
2021-11-20 00:16:28.154 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 done
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [TM] ThreadId = 0, Projection chosen = oms.order_refund_temp20180813170200_b0, REsType = 3, RECnt = 1, stratumNo = 0, partnKey =
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 startEpoch = 36277926 endEpoch = 36277926
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list (from TMService):
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 1 ROSes should be merged
2021-11-20 00:16:28.180 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout plan

Answers

  • HibikiHibiki Vertica Employee Employee

    @SK21 Can you open a new support case and provide vertica.log file and the failed MERGE statement with the timestamp when it failed?

  • SK21SK21 Vertica Customer

    Hello @Hibiki ,
    Unfortunately our support license has expire and we are in a process of renewal.
    Please find logs and timestamp here and let us know if you have any suggestions.
    2021-11-20 00:16:28.127
    2021-11-20 00:16:28.127 DistCall Dispatch:7f25d8cc3700-190000007c282bd [Txn] Starting Commit: Txn: 190000007c282bd 'MERGE INTO shiptrack_dwh.tracking_package_history AS t USING shiptrack_dwh.tmp_tracking_package_history

  • SK21SK21 Vertica Customer

    2021-11-20 00:16:28.127 DistCall Dispatch:7f25d8cc3700-190000007c282bd [Txn] Starting Commit: Txn: 190000007c282bd 'MERGE INTO shiptrack_dwh.tracking_package_history AS t USING shiptrack_dwh.tmp_tracking_package_history AS s ON t.id=s.id AND t.tracking_package_id=s.tracking_package_id WHEN MATCHED THEN UPDATE SET id=s.id,tracking_package_id=s.tracking_package_id,last_status=s.last_status,new_status=s.new_status,current_courier_status=s.current_courier_status,courier_remarks=s.courier_remarks,remarks=s.remarks,status_date=s.status_date,current_location=s.current_location,extra_details=s.extra_details,created=s.created,updated=s.updated,updated_by=s.updated_by,created_by=s.created_by,current_location_description=s.current_location_description,nsl_code=s.nsl_code,delivery_person_name=s.delivery_person_name,delivery_person_mobile=s.delivery_person_mobile,current_location_pincode=s.current_location_pincode,scan_type=s.scan_type,hub_type=s.hub_type,origin_hub_location_code=s.origin_hub_location_code,origin_hub_location_pincode=s.origin_hub_location_pincode,destination_hub_location_code=s.destination_hub_location_code,destination_hub_location_pincode=s.destination_hub_location_pincode,delay_time=s.delay_time,route_id=s.route_id,curr_latitude=s.curr_latitude,curr_longitude=s.curr_longitude,aud_update_ts=s.aud_update_ts,aud_update_key=s.aud_update_key WHEN NOT MATCHED THEN INSERT ( id, tracking_package_id, last_status, new_status, current_courier_status, courier_remarks, remarks, status_date, current_location, extra_details, created, updated, updated_by, created_by, current_location_description, nsl_code, delivery_person_name, delivery_person_mobile, current_location_pincode, scan_type, hub_type, origin_hub_location_code, origin_hub_location_pincode, destination_hub_location_code, destination_hub_location_pincode, delay_time, route_id, curr_latitude, curr_longitude,aud_create_ts,aud_update_ts,aud_create_key,aud_update_key) VALUES ( s.id, s.tracking_package_id, s.last_status, s.new_status, s.current_courier_status, s.courier_remarks, s.remarks, s.status_date, s.current_location, s.extra_details, s.created, s.updated, s.updated_by, s.created_by, s.current_location_description, s.nsl_code, s.delivery_person_name, s.delivery_person_mobile, s.current_location_pincode, s.scan_type, s.hub_type, s.origin_hub_location_code, s.origin_hub_location_pincode, s.destination_hub_location_code, s.destination_hub_location_pincode, s.delay_time, s.route_id, s.curr_latitude, s.curr_longitude,s.aud_create_ts,s.aud_update_ts,s.aud_create_key,s.aud_update_key);COMMIT;SELECT PURGE_TABLE('shiptrack_dwh.tracking_package_history');' 59476568
    2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b1 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c648
    2021-11-20 00:16:28.128 DistCall Dispatch:7f25d8cc3700-190000007c282bd [TM] updateCPE: Updating projection tracking_package_history_temp20160330231749_v2_b0 CPE to 0x2299401 from 0xffffffffffffffff for wos 0x210000032b4c600
    2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] plan_mergeout: after processing plan
    2021-11-20 00:16:28.136 DistCall Dispatch:7f25d8cc3700 [Txn] Commit Complete: Txn: 190000007c282bd at epoch 0x2299402 and new global catalog version 59476569
    2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) TM mergeout: Total row count = 97, Total delete row count = 3
    2021-11-20 00:16:28.136 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout old replay delete plan
    2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) End of executing the mergeout old replay delete plan
    2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Dropping source ROSs:
    2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134781
    2021-11-20 00:16:28.150 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: Updating checkpoint epoch
    2021-11-20 00:16:28.151 Init Session:7f212ffff700-a00000064ba36d [TM] makeProjectionCPECurrent: Updating CPE for proj 'order_refund_temp20180813170200_b0' to current
    2021-11-20 00:16:28.152 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout: CPE of proj oms.order_refund_temp20180813170200_b0 set to 36279298
    2021-11-20 00:16:28.154 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 done
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [TM] ThreadId = 0, Projection chosen = oms.order_refund_temp20180813170200_b0, REsType = 3, RECnt = 1, stratumNo = 0, partnKey =
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout projection oms.order_refund_temp20180813170200_b0 startEpoch = 36277926 endEpoch = 36277926
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list (from TMService):
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Mergeout ROS list
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 45036012777134731
    2021-11-20 00:16:28.179 Init Session:7f212ffff700-a00000064ba36d [EE] 1 ROSes should be merged
    2021-11-20 00:16:28.180 Init Session:7f212ffff700-a00000064ba36d [EE] (a00000064ba36d) Executing the mergeout plan

  • HibikiHibiki Vertica Employee Employee

    @SK21 I understand your current situation. I will check it internally.

    I saw the log messages you posted but I could not see any error message. How did you know the MERGE statement failed?

  • SK21SK21 Vertica Customer

    @Hibiki our internal logs for reporting displayed it although we could not also find exact results in management console and vertica .logs.

    Internal Logs
    22:30:01 : Execution of script /home/biops/dwh/scripts/dwh_vertica_data_merge.sh failed because of VSQL or OS error
    22:30:01 : Script dwh_vertica_data_load.sh failed because of SQL or OS error

  • HibikiHibiki Vertica Employee Employee

    @SK21 I understand it. So I guess your internal system got an unexpected return value from dwh_vertica_data_merge.sh and then generated this error message. Please check why dwh_vertica_data_merge.sh returned that value.

  • SK21SK21 Vertica Customer

    @Hibiki we got the reason from Vertica Management Console i.e" Execution canceled by operator".

  • HibikiHibiki Vertica Employee Employee

    @SK21 So you executed the statements from MC and saw the error, right?

    According to the log messages you provided before, you executed the following statements from the script.

    • MERGE statement
    • COMMIT
    • PURGE_TABLE function

    Did you execute only the MERGE statement from MC?

    If you saw that error, I believe you can see that error with the associated messages in vertica.log. To know the details, can you share vertica.log file with me?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file