
Error copying large files from HDFS to Vertica

Hi,
I am trying to copy a large amount of data from HDFS to a Vertica cluster. I'm issuing the Vertica COPY command from Scala code, which runs as an Oozie job. The COPY command I am using is:
COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/part*';
The Vertica nodes are configured as Hadoop edge nodes, and I am running the query using vertica-jdbc-8.1.1-18.jar. But for large data sizes (> ), it throws the following error after roughly 5 minutes:

Please note: the COPY command still finishes even after the error below, but the time taken to load the data varies. Also, 2 GB of data loads without error, while 23 GB throws this error.

Exception in thread "main" java.sql.SQLRecoverableException: [Vertica]VJDBC IOException while communicating with server: java.net.SocketException: Broken pipe (Write failed).
at com.vertica.io.ProtocolStream.logAndConvertToNetworkException(Unknown Source)
at com.vertica.io.ProtocolStream.flush(Unknown Source)
at com.vertica.core.VConnection.handleError(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.executeSimpleProtocol(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeUpdate(Unknown Source)
at com.export.core.DataLoader$.execute(DataLoader.scala:44)
at com.export.core.CheckPointsManager$$anonfun$doExport$1$$anonfun$apply$2.apply(CheckPointsManager.scala:31)
at com.export.core.CheckPointsManager$$anonfun$doExport$1$$anonfun$apply$2.apply(CheckPointsManager.scala:30)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.LinkedHashMap.foreach(LinkedHashMap.scala:139)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at com.export.core.CheckPointsManager$$anonfun$doExport$1.apply(CheckPointsManager.scala:30)
at com.export.core.CheckPointsManager$$anonfun$doExport$1.apply(CheckPointsManager.scala:23)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.export.core.CheckPointsManager$.doExport(CheckPointsManager.scala:23)
at com.export.StartMain$.main(StartMain.scala:17)
at com.export.StartMain.main(StartMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: com.vertica.support.exceptions.RecoverableException: [Vertica]VJDBC IOException while communicating with server: java.net.SocketException: Broken pipe (Write failed).
... 26 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at com.vertica.io.VStream.flush(Unknown Source)
... 25 more
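For reference, the load is issued through the JDBC driver roughly as in the sketch below (a minimal sketch, not the exact production code; the JDBC URL, credentials, and table name are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HdfsCopyLoader {

    // Build the COPY statement for a target table and an HDFS glob.
    static String buildCopySql(String table, String hdfsGlob) {
        return "COPY " + table + " FROM '" + hdfsGlob + "'";
    }

    public static void main(String[] args) throws Exception {
        String sql = buildCopySql("myTable", "hdfs:///mydirectory/ofmanyfiles/part*");
        System.out.println(sql);

        // Only connect when a JDBC URL, user, and password are supplied,
        // e.g. jdbc:vertica://vertica-host:5433/mydb (host/db are placeholders).
        if (args.length == 3) {
            DriverManager.setLoginTimeout(60); // standard JDBC login timeout, in seconds
            try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2]);
                 Statement stmt = conn.createStatement()) {
                // executeUpdate returns the number of rows the COPY loaded
                stmt.executeUpdate(sql);
            }
        }
    }
}
```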

Answers

  • SruthiA (Employee)

    You may be hitting a Hadoop connection timeout for large amounts of data. Please try increasing the value of the 'HadoopFSConnectionTimeout' configuration parameter and see if that helps.

  • I couldn't find any property named "HadoopFSConnectionTimeout". Could you please tell me where to find it?

  • SruthiA (Employee)

    Please try running the following command:
    SELECT SET_CONFIG_PARAMETER('HadoopFSConnectionTimeout', 600);

  • edited October 15

    Hi,
    I tried running the COPY command after setting this parameter, but I am still getting the same error, Broken pipe (Write failed), after 5 minutes.

  • SruthiA (Employee)

    Please open a support case if you can. It looks like we need to review logs to debug further.

  • edited October 18

    I checked the log and found only the following FATAL entries:

    [Txn] Commit Complete: Txn: 1b0000000026be3 at epoch 0x5f95f0 and new global catalog version 7850619
    2019-10-17 21:31:39.320 Init Session:7fe319fge708 [EE] [Bulkload] DIRECT copy targeting TABLE_NAME from pipe committed at version 7650699.
    2019-10-17 21:31:39.321 Init Session:7fe519ffe700 @v_vertica247_node0018: 08006/2907: Could not send data to client: No such file or directory
    @v_vertica247_node0018: 08006/2607: Client has disconnected
    LOCATION: internal_handle_error, pqcomm.c:731
    Session _vertica247_node0018-53602:0x40ef73 ended; closing connection (connCnt 1)
    2019-10-17 21:30:40.300 Init Session:7fe319fge708 @v_vertica247_node0018: 00000/2705: Connection received: host=xx.xx.xxx.x port=60810 (connCnt 1)

    However, this data load also succeeded even after throwing the "Broken pipe" error.
    Can you please let me know if this helps?

    Thanks
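Since the COPY keeps committing server-side even after the client-side disconnect, one way to confirm a load's fate (a hedged sketch, not a support-confirmed fix) is to catch the recoverable exception, open a fresh connection, and check the v_monitor.load_streams system table, which reports per-table bulk-load progress:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LoadStreamCheck {

    // Query text for checking an in-flight or finished bulk load on a table.
    // v_monitor.load_streams is a Vertica system table; the table name bound
    // at runtime is whatever table the COPY targeted.
    static String loadStreamsSql() {
        return "SELECT is_executing, accepted_row_count, rejected_row_count "
             + "FROM v_monitor.load_streams WHERE table_name = ?";
    }

    // Returns true while a COPY into the given table is still executing on
    // the server, even if the client connection that issued it has died.
    static boolean copyStillRunning(Connection conn, String table) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(loadStreamsSql())) {
            ps.setString(1, table);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    if (rs.getBoolean("is_executing")) {
                        return true;
                    }
                }
                return false;
            }
        }
    }
}
```

The idea would be: after executeUpdate throws SQLRecoverableException, reconnect and poll copyStillRunning until it returns false, then compare accepted_row_count against the expected source row count before deciding whether to retry.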
