Vertica crashes on AWS when partitioning a table

I need help debugging an issue with Vertica on AWS. We have set up a 3-node cluster on AWS, using t2.large instances for all nodes. While partitioning a table, the Vertica database shuts down abruptly, and nothing significant is recorded in the /var/log/messages file.
Entries from /var/log/messages around the time of the shutdown:

Node-1
Nov 9 17:21:04 ip-10.0.0.1 su: (to dbadmin) ec2-user on pts/1
Nov 9 17:21:45 ip-10.0.0.1 dhclient[3217]: XMT: Solicit on eth0, interval 108330ms.
Nov 9 17:23:34 ip-10.0.0.1 dhclient[3217]: XMT: Solicit on eth0, interval 120900ms.
Nov 9 17:25:35 ip-10.0.0.1 dhclient[3217]: XMT: Solicit on eth0, interval 113820ms.
Nov 9 17:27:29 ip-10.0.0.1 dhclient[3217]: XMT: Solicit on eth0, interval 117800ms.
Nov 9 17:29:27 ip-10.0.0.1 dhclient[3217]: XMT: Solicit on eth0, interval 119340ms.
Nov 9 17:30:01 ip-10.0.0.1 systemd: Created slice User Slice of root.
Nov 9 17:30:01 ip- 10.0.0.1 systemd: Created slice User Slice of root.

Node-2
Nov 9 17:20:01 ip-10.0.0.2 systemd: Created slice User Slice of root.
Nov 9 17:20:01 ip-10.0.0.2 systemd: Starting User Slice of root.
Nov 9 17:20:01 ip-10.0.0.2 systemd: Started Session 51 of user root.
Nov 9 17:20:01 ip-10.0.0.2 systemd: Starting Session 51 of user root.
Nov 9 17:20:01 ip-10.0.0.2 systemd: Removed slice User Slice of root.
Nov 9 17:20:01 ip-10.0.0.2 systemd: Stopping User Slice of root.
Nov 9 17:20:20 ip-10.0.0.2 dhclient[3189]: XMT: Solicit on eth0, interval 110860ms.
Nov 9 17:22:03 ip-10.0.0.2 systemd-logind: Removed session 40.

Node-3
Nov 9 17:20:01 ip-10.0.0.3 systemd: Stopping User Slice of root.
Nov 9 17:21:14 ip-10.0.0.3 systemd-logind: Removed session 40.
Nov 9 17:21:14 ip-10.0.0.3 systemd: Removed slice User Slice of dbadmin.
Nov 9 17:21:14 ip-10.0.0.3 systemd: Stopping User Slice of dbadmin.
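
To back up the observation that the OS logs show nothing relevant, each node's /var/log/messages can be scanned for the standard Linux OOM-killer wording, which would appear if the kernel had killed the vertica process for lack of memory. This is a minimal sketch, not a Vertica tool: the match strings are the usual kernel phrasing and are an assumption about what to look for, and find_oom_lines / scan_file are hypothetical helper names.

```python
import re

# Usual Linux kernel OOM-killer phrases (written by the kernel, not by Vertica).
# These patterns are an assumption about what an OOM kill would look like.
OOM_PATTERN = re.compile(r"Out of memory|Killed process|oom-kill", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the log lines that look like kernel OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_file(path="/var/log/messages"):
    """Scan one log file; reading /var/log/messages usually requires root."""
    with open(path, errors="replace") as fh:
        return find_oom_lines(fh)
```

Run `print(scan_file())` as root on each node. An empty list is consistent with the excerpts above, where no kernel kill is recorded around 17:21.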

                     Instance Type   Disk Space   Swap Size   ulimit -n

Node-1 (initiator)   t2.large        130 GB       10 GB       65536
Node-2               t2.large        30 GB        10 GB       65536
Node-3               t2.large        30 GB        10 GB       65536

The crash is the same with 1 GB and with 10 GB of data, and it happens when partitioning any table.
Records from the DC_ERRORS data for that session, queried through the error_messages table:
SELECT event_timestamp, node_name, user_name, session_id, error_level, error_code, message, hint
FROM error_messages
WHERE session_id = 'v_tpcds_db2_node0001-31783:0x40b';
event_timestamp | node_name | user_name | session_id | error_level | error_code | message | hint
09-11-2020 17:21 | v_tpcds_db2_node0001 | dbadmin | v_tpcds_db2_node0001-31783:0x40b | NOTICE | 0 | The new partitioning scheme will produce partitions in 72 physical storage containers per projection |
09-11-2020 17:21 | v_tpcds_db2_node0001 | dbadmin | v_tpcds_db2_node0001-31783:0x40b | WARNING | 64 | Queries using table "web_returns" may not perform optimally since the data may not be repartitioned in accordance with the new partition expression | Use "ALTER TABLE tpc1gb.web_returns REORGANIZE;" to repartition the data.
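
The NOTICE about 72 storage containers is issued right after Vertica runs an internal statement that counts distinct DATE(DATE_TRUNC('MONTH', wr_d_date)) values (visible in the vertica.log excerpt in the answer below), so the figure appears to track the number of month groups produced by the GROUP BY clause. The sketch below mirrors that count in Python under that assumption; the six-year date range is purely hypothetical sample data, not measured from web_returns.

```python
from datetime import date, timedelta


def month_bucket(d):
    """Python equivalent of DATE(DATE_TRUNC('MONTH', d)): first day of d's month."""
    return d.replace(day=1)


def count_month_groups(dates):
    """Mirror COUNT(DISTINCT DATE(DATE_TRUNC('MONTH', col))); None (NULL) is ignored."""
    return len({month_bucket(d) for d in dates if d is not None})


def daily_range(start, end):
    """Every calendar day from start to end inclusive (hypothetical sample data)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    # Hypothetical: return dates covering six full calendar years, one row per day.
    sample = daily_range(date(1998, 1, 1), date(2003, 12, 31))
    print(count_month_groups(sample))  # 72 (6 years x 12 months)
```

If that reading is right, the container count grows with the number of distinct months in the data rather than with its size, which would fit the crash reproducing with both 1 GB and 10 GB.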
I will post the vertica.log messages from Node-1 during the partitioning in a comment below.

Answers

  • dakshayaniravilla (Community Edition User)

    --Node-1(Initiator)
    2020-11-09 17:21:13.791 Init Session:0x7fc7d99ff700 [Session] [Query] TX:0(v_tpcds_db2_node0001-31783:0x40b) ALTER TABLE tpc1gb.web_returns
    PARTITION BY ((wr_d_date)::date)
    GROUP BY DATE(DATE_TRUNC('MONTH',wr_d_date));
    2020-11-09 17:21:13.791 Init Session:0x7fc7d99ff700-a00000000005fa [Txn] Begin Txn: a00000000005fa 'ALTER TABLE tpc1gb.web_returns
    PARTITION BY ((wr_d_date)::date)
    GROUP BY DATE(DATE_TRUNC('MONTH',wr_d_date));'
    2020-11-09 17:21:13.795 Init Session:0x7fc7d99ff700-a00000000005fa [Txn] Starting Commit: Txn: a00000000005fa 'ALTER TABLE tpc1gb.web_returns
    PARTITION BY ((wr_d_date)::date)
    GROUP BY DATE(DATE_TRUNC('MONTH',wr_d_date));' 876
    2020-11-09 17:21:13.796 Init Session:0x7fc7d99ff700 [Txn] Commit Complete: Txn: a00000000005fa at epoch 0x4f and global catalog version 876
    2020-11-09 17:21:13.798 Init Session:0x7fc7d99ff700-a00000000005fb [Txn] Begin Txn: a00000000005fb 'ALTER TABLE tpc1gb.web_returns
    PARTITION BY ((wr_d_date)::date)
    GROUP BY DATE(DATE_TRUNC('MONTH',wr_d_date));'
    2020-11-09 17:21:13.804 InternalStmt:0x7fc6e712f700-a00000000005fc [Txn] Begin Txn: a00000000005fc 'parseOptimizerDirectives'
    2020-11-09 17:21:13.804 InternalStmt:0x7fc6e712f700-a00000000005fc [Txn] Rollback Txn: a00000000005fc 'parseOptimizerDirectives'
    2020-11-09 17:21:13.805 InternalStmt:0x7fc6e712f700 [Session] InternalStatement subsession v_tpcds_db2_node0001-31783:0x40d inherited parent session v_tpcds_db2_node0001-31783:0x40b
    2020-11-09 17:21:13.805 InternalStmt:0x7fc6e712f700-a00000000005fd [Txn] Begin Txn: a00000000005fd 'SELECT COUNT (CASE WHEN wr_d_date IS NULL THEN 1 ELSE NULL END) AS wr_d_date, COUNT (DISTINCT date(date_trunc('MONTH', web_returns.wr_d_date))) FROM tpc1gb.web_returns;'
    2020-11-09 17:21:13.868 InternalStmt:0x7fc6e712f700-a00000000005fd [Txn] Starting Commit: Txn: a00000000005fd 'SELECT COUNT (CASE WHEN wr_d_date IS NULL THEN 1 ELSE NULL END) AS wr_d_date, COUNT (DISTINCT date(date_trunc('MONTH', web_returns.wr_d_date))) FROM tpc1gb.web_returns;' 876
    2020-11-09 17:21:13.868 InternalStmt:0x7fc6e712f700 [Txn] Commit Complete: Txn: a00000000005fd at epoch 0x4f and global catalog version 876
    2020-11-09 17:21:13.869 Init Session:0x7fc7d99ff700-a00000000005fb @v_tpcds_db2_node0001: 00000/8364: The new partitioning scheme will produce partitions in 72 physical storage containers per projection
    2020-11-09 17:21:13.876 Init Session:0x7fc7d99ff700-a00000000005fb @v_tpcds_db2_node0001: 01000/4493: Queries using table "web_returns" may not perform optimally since the data may not be repartitioned in accordance with the new partition expression
    HINT: Use "ALTER TABLE tpc1gb.web_returns REORGANIZE;" to repartition the data
    2020-11-09 17:21:13.877 Init Session:0x7fc7d99ff700-a00000000005fb [Txn] Starting Commit: Txn: a00000000005fb 'ALTER TABLE tpc1gb.web_returns
    PARTITION BY ((wr_d_date)::date)
    GROUP BY DATE(DATE_TRUNC('MONTH',wr_d_date));' 876
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(00)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(01)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(02)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(03)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(04)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(05)' enabled
    2020-11-09 17:21:13.878 Init Session:0x7fc7d99ff700-a00000000005fb [Util] Task 'TM Mergeout(06)' enabled
    2020-11-09 17:21:13.879 Init Session:0x7fc7d99ff700 [Txn] Commit Complete: Txn: a00000000005fb at epoch 0x4f and new global catalog version 877
    2020-11-09 17:21:13.879 TM Mergeout(02):0x7fc7caec9700-a0000000000600 [Txn] Begin Txn: a0000000000600 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.879 TM Mergeout(02):0x7fc7caec9700-a0000000000600 [TM] TMService : dequeued a [MERGEOUT] request for the projection 45035996273712834
    2020-11-09 17:21:13.879 TM Mergeout(04):0x7fc58cf37700-a00000000005ff [Txn] Begin Txn: a00000000005ff 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.879 TM Mergeout(04):0x7fc58cf37700-a00000000005ff [TM] TMService : dequeued a [MERGEOUT] request for the projection 45035996273712776
    2020-11-09 17:21:13.880 TM Mergeout(03):0x7fc7cbecb700-a0000000000601 [Txn] Begin Txn: a0000000000601 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.880 TM Mergeout(03):0x7fc7cbecb700-a0000000000601 [Txn] Rollback Txn: a0000000000601 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.880 TM Mergeout(04):0x7fc58cf37700-a00000000005ff [Main] Handling signal: 11
    2020-11-09 17:21:13.880 TM Mergeout(02):0x7fc7caec9700-a0000000000600 [TM] Has more than one job ? No, has eligible threads No, this threadId: 2, minimum stratum # of skipped jobs 65535
    2020-11-09 17:21:13.880 TM Mergeout(02):0x7fc7caec9700-a0000000000600 [Txn] Rollback Txn: a0000000000600 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.881 TM Mergeout(01):0x7fc6e712f700-a0000000000602 [Txn] Begin Txn: a0000000000602 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.881 TM Mergeout(03):0x7fc7cbecb700 [Util] Task 'TM Mergeout(03)' enabled
    2020-11-09 17:21:13.881 TM Mergeout(02):0x7fc7caec9700-a0000000000603 [Txn] Begin Txn: a0000000000603 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.881 TM Mergeout(01):0x7fc6e712f700-a0000000000602 [Txn] Rollback Txn: a0000000000602 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.881 TM Mergeout(01):0x7fc6e712f700 [Util] Task 'TM Mergeout(01)' enabled
    2020-11-09 17:21:13.882 TM Mergeout(02):0x7fc7caec9700-a0000000000603 [Txn] Rollback Txn: a0000000000603 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.882 TM Mergeout(02):0x7fc7caec9700 [Util] Task 'TM Mergeout(02)' enabled
    2020-11-09 17:21:13.882 TM Mergeout(00):0x7fc7cb6ca700-a0000000000604 [Txn] Begin Txn: a0000000000604 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.882 TM Mergeout(00):0x7fc7cb6ca700-a0000000000604 [Txn] Rollback Txn: a0000000000604 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.882 TM Mergeout(00):0x7fc7cb6ca700 [Util] Task 'TM Mergeout(00)' enabled
    2020-11-09 17:21:13.883 TM Mergeout(05):0x7fc7cc6cc700-a0000000000605 [Txn] Begin Txn: a0000000000605 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.883 TM Mergeout(05):0x7fc7cc6cc700-a0000000000605 [Txn] Rollback Txn: a0000000000605 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.883 TM Mergeout(05):0x7fc7cc6cc700 [Util] Task 'TM Mergeout(05)' enabled
    2020-11-09 17:21:13.884 TM Mergeout(06):0x7fc6efd34700-a0000000000606 [Txn] Begin Txn: a0000000000606 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.884 TM Mergeout(06):0x7fc6efd34700-a0000000000606 [Txn] Rollback Txn: a0000000000606 'Mergeout: Tuple Mover'
    2020-11-09 17:21:13.884 TM Mergeout(06):0x7fc6efd34700 [Util] Task 'TM Mergeout(06)' enabled
    2020-11-09 17:21:14.003 MetadataPoolMonitor:0x7fc6e712f700 [ResourceManager] Update metadata resource pool memory with delta: Memory(KB): 1
    2020-11-09 17:21:14.003 MetadataPoolMonitor:0x7fc6e712f700 @v_tpcds_db2_node0001: 00000/7794: Updated metadata pool: Memory(KB): 39801
    2020-11-09 17:21:14.080 TM Mergeout(04):0x7fc58cf37700-a00000000005ff [Main] Received fatal signal SIGSEGV.
    2020-11-09 17:21:14.080 TM Mergeout(04):0x7fc58cf37700-a00000000005ff [Main] Info: si_code: 128, si_pid: 0, si_uid: 0, si_addr: (nil)
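
To isolate the crashing thread in a long vertica.log, the lines can be parsed and grouped by thread. This sketch assumes the line layout seen in the excerpt above (`<timestamp> <thread name>:<thread id>[-<transaction id>] <message>`); the regex and the helper names parse and crash_context are hypothetical, not part of Vertica.

```python
import re
from collections import namedtuple

# Assumed vertica.log layout, inferred from the excerpt above:
# "<date> <time> <thread name>:<0x thread id>[-<txn id>] <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "  # timestamp
    r"(.+?):(0x[0-9a-f]+)(?:-([0-9a-f]+))? "          # thread name, thread id, optional txn id
    r"(.*)$"                                           # message text
)

Entry = namedtuple("Entry", "ts thread thread_id txn message")


def parse(lines):
    """Parse log lines; wrapped continuation lines (e.g. multi-line SQL) are skipped."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(Entry(*m.groups()))
    return entries


def crash_context(entries):
    """For each thread that logged a fatal signal, return all entries from that thread.

    Threads are keyed by (name, id) because pooled OS thread ids are reused
    by differently named Vertica threads (e.g. 0x7fc6e712f700 above).
    """
    crashed = {(e.thread, e.thread_id) for e in entries if "fatal signal" in e.message}
    return {key: [e for e in entries if (e.thread, e.thread_id) == key] for key in crashed}
```

Applied to the excerpt above, this isolates TM Mergeout(04) (0x7fc58cf37700, transaction a00000000005ff): its last logged action before "Handling signal: 11" and the SIGSEGV was dequeuing a mergeout request for projection 45035996273712776, immediately after the ALTER TABLE commit at 17:21:13.879.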
