Server closed the connection unexpectedly

Neil_1 · January 2014

I'm getting the following when running a SELECT statement:

"dbadmin=> \i queries/Query1.sql
vsql:queries/Query1.sql:49: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
vsql:queries/Query1.sql:49: connection to server was lost"

Although Management Console still reports that all the nodes are up and the database is live, admintools reports that the node on which vsql was running is down. So running this query kills the node. Using the 'Restart Vertica on Host' option in admintools on that node fails to restart it. So far, I'm still working out how to get that node up again.

Clearly, it's very concerning when simple SQL SELECT statements bring down nodes. It's also concerning that the two tools, admintools and Management Console, give two different opinions on whether the node is up. Management Console seems particularly confused: it shows Node 1 as up but Nodes 2, 3 and 4 as critical, and I'd think that if Node 1 were up they wouldn't be.

We are running Vertica 6.1.2 with the EE licence on a 4-node cluster with Xeon E5-2670 processors.

I can't paste the query here, but line 49 is the last line of the query. The query does some simple math and a few joins, and has a WHERE clause.

I don't expect a resolution, but I would like some suggestions on how to go about investigating the cause and resolving it myself.

Thanks,

Neil

[Deleted User] · January 2014

Did you check "select * from nodes;" via vsql or another SQL tool?

Also, did you try to restart the database via admintools?

Stop the database> "Stop Database" in main menu
Start the database> "Start Database" in main menu

Neil_1 · January 2014

Hi,

I didn't check "select * from nodes"; I relied solely on:

not being able to connect to vsql on that node
admintools informing that the node was down
management console informing that the 3 other nodes had become (and remained) critical

I did try to restart the database via admintools as you suggested. I also tried stopping (and killing) the Vertica process and then starting it again on the failed node.

Can you recommend steps I should take to understand what caused this node to fail?

[Deleted User] · January 2014

Do you mean that the node is still down, or were you able to bring it back up? If the node is up now and you want to understand why it failed, please check vertica.log on the failed node. You can find vertica.log under the catalog directory, for example "/data/demo/v_demo_node0002_catalog". The entries logged around the time the node failed should show what happened. The Messages tab in Management Console might give you some pointers, but it is not always enough, so I would recommend checking vertica.log.
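
As a rough illustration of the log check described above, the following Python sketch filters a log file down to lines that look like failures. The marker strings are assumptions chosen for illustration, not an official or complete list of what vertica.log contains, and the helper names (find_suspect_lines, scan_log) are hypothetical. Pass the path to vertica.log on the failed node as the only argument.

```python
import sys
from typing import Iterable, List, Sequence

# Assumption: these substrings are plausible signs of trouble in a server log.
# They are illustrative only; adjust them to what your vertica.log actually shows.
SUSPECT_MARKERS = ("<PANIC>", "<ERROR>", "No space left on device")


def find_suspect_lines(lines: Iterable[str],
                       markers: Sequence[str] = SUSPECT_MARKERS) -> List[str]:
    """Return every line that contains at least one marker, without the newline."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in markers)]


def scan_log(path: str, markers: Sequence[str] = SUSPECT_MARKERS) -> List[str]:
    """Read the log at `path` and return its suspect lines in file order."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, "r", errors="replace") as handle:
        return find_suspect_lines(handle, markers)


if __name__ == "__main__":
    # Usage: python scan_log.py <path to vertica.log>
    for hit in scan_log(sys.argv[1]):
        print(hit)
```

Because the matching lines keep their original order, the entries closest to the time of the failure are easy to find by their timestamps.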

Neil_1 · February 2014

We got the node back up and joined. Looking at the vertica log showed us that the partition to which we were directing the output of the query had run out of disk space.

Thanks for your responses,

Neil
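
Since the cause here turned out to be a full partition, a quick free-space check on the output location before running a large query can catch the problem earlier. This is a minimal Python sketch using only the standard library; the 10% threshold and the function names are assumptions for illustration, not Vertica recommendations.

```python
import shutil
import sys


def free_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Defensive guard for unusual filesystems that report no capacity.
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """True when free space is below the threshold (assumed default: 10%)."""
    return free_fraction(path) < min_free_fraction


if __name__ == "__main__":
    # Usage: python check_space.py <directory the query output is written to>
    target = sys.argv[1]
    print(f"{target}: {free_fraction(target):.1%} free")
    if is_low_on_space(target):
        print("Warning: free space is below the threshold")
```

Run it against the directory on the partition that receives the query output, and against the data and catalog directories if they sit on different partitions.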

[Deleted User] · February 2014

You're welcome. Thanks for sharing your investigation. Kanako
