Server closed the connection unexpectedly

I'm getting the following error when running a SELECT statement:

dbadmin=> \i queries/Query1.sql
vsql:queries/Query1.sql:49: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
vsql:queries/Query1.sql:49: connection to server was lost

Although Management Console still reports that all the nodes are up and the database is live, AdminTools reports that the node on which vsql was running is down, so running this query appears to take the node down. Using 'Restart Vertica on Host' in AdminTools on that node fails to restart it. So far, I'm still working out how to get that node up again.

It's clearly very concerning when simple SQL SELECT statements bring down nodes. It's also concerning that the two tools, AdminTools and Management Console, disagree on whether the node is up. Management Console seems particularly confused: it shows Node 1 as up but Nodes 2, 3 and 4 as critical, whereas I'd expect them not to be critical if Node 1 were up.

Our Vertica version is 6.1.2; we have the Enterprise Edition (EE) licence and a 4-node cluster running Xeon E5-2670 processors.

I can't paste the query here, but line 49 is its last line. The query does some simple arithmetic, a few joins, and has a WHERE clause.

I don't expect a resolution, but I would like some suggestions on how to go about investigating the cause and resolving it myself.

Thanks,

Neil

Comments

  • Did you check "select * from nodes;" via vsql or another SQL tool?

    Also, did you try to restart the database via admintools?

    Stop the database: "Stop Database" in the main menu
    Start the database: "Start Database" in the main menu
  • Hi,

    I didn't check "select * from nodes"; I relied solely on:
    • not being able to connect with vsql on that node
    • AdminTools reporting that the node was down
    • Management Console reporting that the 3 other nodes had become (and remained) critical
    I did try restarting the database via AdminTools as you suggest, and I also tried stopping (and killing) the Vertica process on the failed node before starting it again.

    Can you recommend steps I should take to understand what caused this node to fail?



  • Do you mean that one node is still down, or were you able to bring it back up? If the node is up now and you want to understand why it failed, please check vertica.log on the failed node. You can find vertica.log in the node's catalog directory, for example "/data/demo/v_demo_node0002_catalog". The log entries written around the time the node failed should show related messages. Management Console's Messages tab may give you some clues, but it is sometimes not enough, so I recommend checking vertica.log.
  • We got the node back up and it rejoined the cluster. vertica.log showed that the partition to which we were directing the query output had run out of disk space.

    Thanks for your responses,

    Neil
  • You're welcome. Thanks for sharing your investigation. Kanako
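The investigation that resolved this thread (reading vertica.log on the failed node and finding that the output partition was full) can be sketched in Python. This is a minimal sketch under stated assumptions: the search patterns are guesses ("No space left on device" is the standard Linux message for ENOSPC, but the exact wording Vertica writes to vertica.log is not confirmed here), and the helper names `find_disk_full_lines`, `scan_log` and `free_bytes` are hypothetical. It only uses the Python standard library (`re`, `shutil.disk_usage`, `sys`).

```python
import re
import shutil
import sys

# Assumed patterns: "No space left on device" is the Linux text for ENOSPC;
# whether Vertica logs exactly this wording is an assumption, not confirmed.
DISK_FULL_PATTERNS = [
    re.compile(r"no space left on device", re.IGNORECASE),
    re.compile(r"\bdisk (is )?full\b", re.IGNORECASE),
]


def find_disk_full_lines(lines, patterns=DISK_FULL_PATTERNS):
    """Return (line_number, text) for every line matching any pattern."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            hits.append((lineno, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file; undecodable bytes are replaced rather than fatal."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return find_disk_full_lines(f)


def free_bytes(directory):
    """Free space, in bytes, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_disk.py <path/to/vertica.log> <output dir>
    log_path, out_dir = sys.argv[1], sys.argv[2]
    for lineno, text in scan_log(log_path):
        print(f"{log_path}:{lineno}: {text}")
    print(f"free space under {out_dir}: {free_bytes(out_dir) / 2**30:.1f} GiB")
```

For example, pointing it at "/data/demo/v_demo_node0002_catalog/vertica.log" and at the directory receiving the query output (a placeholder path in your setup) lists any matching log lines and reports the free space on that partition. Checking free space before redirecting a large result set may help spot this kind of problem earlier, though the sketch does not estimate how large a given query's output will be.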
