Cluster crash when selecting from v_monitor.rebalance_table_status

When running a simple 'select *' query on v_monitor.rebalance_table_status, all 3 nodes in our Vertica cluster crashed. This has happened twice. The Vertica version is 6.1.0.

Log lines from the last crash:

2013-04-05 15:31:15.311 Init Session:0x7fa6dc4cdda0 [Txn] Begin Txn: a00000000dd195 'SELECT * FROM "v_monitor"."rebalance_table_status" limit 500 '
2013-04-05 15:31:25.698 Init Session:0x7fa6dc4cdda0 @v_bidw_node0002: 01000/3161: EE: Could not fork a command queue thread
2013-04-05 15:31:27.546 Init Session:0x7fa6dc4cdda0 [Dist] Node v_bidw_node0002 was in plan recipents list at plan start but is now DOWN
2013-04-05 15:31:27.547 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0002 in abandon plan
2013-04-05 15:31:27.547 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0003 in abandon plan
2013-04-05 15:31:27.548 Init Session:0x7fa6dc4cdda0 [EE] Query Retry retrying after following discovery:
2013-04-05 15:31:27.548 Init Session:0x7fa6dc4cdda0 [EE] Node failure during execution
2013-04-05 15:31:27.548 Init Session:0x7fa6dc4cdda0 [EE] Query Retry action: Node failure: Recipients list has a node that is not currently up
2013-04-05 15:31:27.575 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0002 in roll back transaction
2013-04-05 15:31:27.575 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0003 in roll back transaction
2013-04-05 15:31:27.680 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 42V15/3586: Insufficient projections to answer query
2013-04-05 15:31:27.681 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0002 in roll back transaction
2013-04-05 15:31:27.681 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 01000/4539: Received no response from v_bidw_node0003 in roll back transaction
2013-04-05 15:31:27.681 Init Session:0x7fa6dc4cdda0 @v_bidw_node0001: 57V03/4748: Shutdown in progress. No longer accepting connections

Comments

  • Rob, could you attach the ErrorReport.txt files from the nodes and search the vertica.log files for the word PANIC? Please then upload the vertica.log file that contains PANIC. This will help us determine why this is occurring.

    Thanks
    Amy
  • Rob, if you are still experiencing this issue and can't post the file, please send it to me directly and I will get someone to review it. My email is amiller@vertica.com
  • I see this in your log snippet:

    2013-04-05 15:31:25.698 Init Session:0x7fa6dc4cdda0 @v_bidw_node0002: 01000/3161: EE: Could not fork a command queue thread

    I suspect the thread limit is too low. As dbadmin, run 'ulimit -a' and look for 'max user processes'. It should be a large number.
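
If it is easier to check the limit from a script than by reading the 'ulimit -a' output, something like the following Python sketch could be run as dbadmin on each node. It reads the same 'max user processes' value (RLIMIT_NPROC) through the standard resource module. The threshold ASSUMED_MIN_NPROC and the helper names are made up for illustration; they are not a documented Vertica requirement, so pick a value that suits your own setup.

```python
import resource

# Hypothetical threshold, for illustration only. This is NOT a documented
# Vertica requirement; choose a value appropriate for your environment.
ASSUMED_MIN_NPROC = 4096


def describe_limit(value):
    """Render a limit the way 'ulimit' does: a number or 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limit_is_low(soft, minimum=ASSUMED_MIN_NPROC):
    """Return True if the soft 'max user processes' limit is below minimum."""
    if soft == resource.RLIM_INFINITY:
        return False
    return soft < minimum


def check_nproc():
    """Read the soft and hard RLIMIT_NPROC for the current user."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, hard, nproc_limit_is_low(soft)


if __name__ == "__main__":
    soft, hard, low = check_nproc()
    print("max user processes: soft=%s hard=%s"
          % (describe_limit(soft), describe_limit(hard)))
    if low:
        print("soft limit is below the assumed minimum of %d" % ASSUMED_MIN_NPROC)
```

The limit applies per user, so the script has to run under the same account that runs the database for the result to mean anything.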
