Error: 'Could not send data to client: No such file or directory'
Recently we have been running into queries that fail for no apparent reason with the error message 'Could not send data to client: No such file or directory'. The logs show the following:
$ cat vertica.log | grep 501c8bff
2019-01-30 13:23:54.849 Init Session:7f0a70ff9700-a00000501c8bff [Txn] <INFO> Begin Txn: a00000501c8bff 'SELECT article_1,
2019-01-30 13:24:04.916 Init Session:7f0a70ff9700-a00000501c8bff [EE] <INFO> Query runtime exceeds limit, canceling
2019-01-30 13:28:54.924 Init Session:7f0a70ff9700-a00000501c8bff [EE] <INFO> Query runtime exceeds limit, canceling
[a00000501c8bff,1] - Queries:1,Threads:29,File Handles:104,Memory(KB):1102659,
2019-01-30 13:58:54.025 EEThread:7f09a0cca700-a00000501c8bff <LOG> @v_node0001: VX001/2907: Could not send data to client: No such file or directory
2019-01-30 13:58:57.444 Init Session:7f0a70ff9700-a00000501c8bff [EE] <INFO> Query can't be replanned due to partial output from initial execution
2019-01-30 13:58:57.444 Init Session:7f0a70ff9700-a00000501c8bff <FATAL> @v_node001: 08006/2607: Client has disconnected
2019-01-30 13:58:57.445 Init Session:7f0a70ff9700-a00000501c8bff [Txn] <INFO> Rollback Txn: a00000501c8bff 'SELECT article_1,
The second and third log lines (13:24:04 and 13:28:54) correspond to the query cascading to the next resource pool. The query then failed at 13:58:54, almost exactly 30 minutes after the final cascade.
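For anyone who wants to check the timing, here is a minimal Python sketch that computes the gaps from the timestamps in the log above. The helper name elapsed_minutes and the variable names are just illustrative, not part of any Vertica tooling; the timestamps are copied from the log lines.

```python
from datetime import datetime

# Timestamp layout used in vertica.log, e.g. "2019-01-30 13:28:54.924"
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_minutes(start: str, end: str) -> float:
    """Return the time between two log timestamps, in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


# Timestamps copied from the log excerpt (hypothetical variable names)
begin_txn = "2019-01-30 13:23:54.849"      # Begin Txn
final_cascade = "2019-01-30 13:28:54.924"  # second "Query runtime exceeds limit"
send_failure = "2019-01-30 13:58:54.025"   # "Could not send data to client"

print("final cascade -> failure:", round(elapsed_minutes(final_cascade, send_failure), 2), "min")
print("begin txn     -> failure:", round(elapsed_minutes(begin_txn, send_failure), 2), "min")
```

The first gap comes out within a second of 30 minutes, while the gap from the start of the transaction is closer to 35 minutes, which is why the final cascade looks like the relevant starting point.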
We also see the same issue when executing a number of similar queries back to back (same select statement, different time windows). In that case the failure also occurs roughly 30 minutes after the start of the first query, even though the query that actually failed had been running for only a few minutes at that point.
These queries are all executed by connecting directly to a node in the cluster, so no load balancer timeout can be involved. We have not defined a 30-minute timeout anywhere in our settings, neither on the resource pools nor for individual users.
We're running Vertica v9.0.1-0 on a 4-node cluster. To execute these queries we use pyODBC and vertica_python from Python applications, and we see the same issue with both modules. I have not yet encountered the issue with vsql or JDBC.
Any help on what could cause this error would be much appreciated.