Error copying large files from HDFS to Vertica

Hi,
I am trying to copy a large amount of data from HDFS to a Vertica cluster. I'm using the Vertica COPY command from Scala code, which runs as an Oozie job. The COPY command I am using is:
COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/part*';
The Vertica nodes are configured as Hadoop edge nodes. I am running the query using vertica-jdbc-8.1.1-18.jar. But for large amounts of data (> ), it gives the following error (after ~5 minutes):

Please note: the COPY command finishes even after the error below, but the time taken to load the data is not fixed. Also, for 2 GB of data it doesn't give the error, but for 23 GB it throws it.

Exception in thread "main" java.sql.SQLRecoverableException: [Vertica]VJDBC IOException while communicating with server: java.net.SocketException: Broken pipe (Write failed).
at com.vertica.io.ProtocolStream.logAndConvertToNetworkException(Unknown Source)
at com.vertica.io.ProtocolStream.flush(Unknown Source)
at com.vertica.core.VConnection.handleError(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.executeSimpleProtocol(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeUpdate(Unknown Source)
at com.export.core.DataLoader$.execute(DataLoader.scala:44)
at com.export.core.CheckPointsManager$$anonfun$doExport$1$$anonfun$apply$2.apply(CheckPointsManager.scala:31)
at com.export.core.CheckPointsManager$$anonfun$doExport$1$$anonfun$apply$2.apply(CheckPointsManager.scala:30)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.LinkedHashMap.foreach(LinkedHashMap.scala:139)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at com.export.core.CheckPointsManager$$anonfun$doExport$1.apply(CheckPointsManager.scala:30)
at com.export.core.CheckPointsManager$$anonfun$doExport$1.apply(CheckPointsManager.scala:23)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at com.export.core.CheckPointsManager$.doExport(CheckPointsManager.scala:23)
at com.export.StartMain$.main(StartMain.scala:17)
at com.export.StartMain.main(StartMain.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: com.vertica.support.exceptions.RecoverableException: [Vertica]VJDBC IOException while communicating with server: java.net.SocketException: Broken pipe (Write failed).
... 26 more
Caused by: java.net.SocketException: Broken pipe (Write failed)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at com.vertica.io.VStream.flush(Unknown Source)
... 25 more
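
For reference, the COPY is executed over JDBC roughly like the sketch below. This is a simplified stand-in for the actual DataLoader code (which is not shown here); the connection URL, credentials, and surrounding wiring are placeholders:

import java.sql.DriverManager

object DataLoaderSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection URL and credentials -- not the real cluster details
    val url = "jdbc:vertica://vertica-host:5433/mydb"
    val conn = DriverManager.getConnection(url, "dbadmin", "password")
    try {
      val stmt = conn.createStatement()
      // The glob expands to the many part files (~23 GB in the failing case)
      val copySql = "COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/part*'"
      // executeUpdate() blocks until the server finishes the load; the
      // broken-pipe exception in the stack trace above surfaces from this call
      val count = stmt.executeUpdate(copySql)
      println(s"COPY finished, update count = $count")
    } finally {
      conn.close()
    }
  }
}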

Answers

  • SruthiA Administrator

    Maybe you are hitting a Hadoop connection timeout with that amount of data. Please try increasing the value of the 'HadoopFSConnectionTimeout' parameter and see if it helps.

  • anup1007

    I couldn't find any property called "HadoopFSConnectionTimeout". Could you please tell me where to find it?

  • SruthiA Administrator

    Please try running the command below:
    select set_config_parameter('HadoopFSConnectionTimeout', 600);
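
    For example, applied and read back over the same JDBC connection it might look like this (a rough sketch; the connection URL and credentials are placeholders, and it assumes the current value is visible in the CONFIGURATION_PARAMETERS system table):

    import java.sql.DriverManager

    object TimeoutConfigSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder connection details
        val conn = DriverManager.getConnection(
          "jdbc:vertica://vertica-host:5433/mydb", "dbadmin", "password")
        try {
          val stmt = conn.createStatement()
          // Raise the HDFS connection timeout to 600 seconds
          stmt.execute("SELECT SET_CONFIG_PARAMETER('HadoopFSConnectionTimeout', 600)")
          // Read the value back to confirm it took effect
          val rs = stmt.executeQuery(
            "SELECT current_value FROM configuration_parameters " +
            "WHERE parameter_name = 'HadoopFSConnectionTimeout'")
          while (rs.next()) println(s"HadoopFSConnectionTimeout = ${rs.getString(1)}")
        } finally {
          conn.close()
        }
      }
    }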

  • anup1007
    edited October 2019

    Hi,
    I tried running the COPY command after setting this parameter and am still getting the same error, "Broken pipe (Write failed)", after 5 minutes.

  • SruthiA Administrator

    Please open a support case if you can. It looks like we need to review logs to debug further.

  • anup1007
    edited October 2019

    I checked the log and found only the following FATAL:

    [Txn] Commit Complete: Txn: 1b0000000026be3 at epoch 0x5f95f0 and new global catalog version 7850619
    2019-10-17 21:31:39.320 Init Session:7fe319fge708 [EE] [Bulkload] DIRECT copy targeting TABLE_NAME from pipe committed at version 7650699.
    2019-10-17 21:31:39.321 Init Session:7fe519ffe700 @v_vertica247_node0018: 08006/2907: Could not send data to client: No such file or directory
    @v_vertica247_node0018: 08006/2607: Client has disconnected
    LOCATION: internal_handle_error, pqcomm.c:731
    Session _vertica247_node0018-53602:0x40ef73 ended; closing connection (connCnt 1)
    2019-10-17 21:30:40.300 Init Session:7fe319fge708 @v_vertica247_node0018: 00000/2705: Connection received: host=xx.xx.xxx.x port=60810 (connCnt 1)

    However, this data load also succeeded even after throwing the "Broken pipe" error.
    Can you please let me know if this helps?

    Thanks

  • ersaurabhex

    Hi Anup,
    I am also facing a similar issue while loading a 40 GB file using PDI (Pentaho) with Insert Direct. What I observed is that the load fails (Pentaho gives an IO file exception) exactly after 10 minutes. Even though I get this error, the data load still completes. I couldn't find any issue in Vertica.log at that time.
    I checked and found LockTimeout set to 600 seconds for the 'I' (Insert) lock type, which could be the reason for such an error, so I increased it to 25 minutes. I still keep receiving errors roughly 10 minutes into the load cycle (but not at exactly 10 minutes).
    Did you get any resolution for this? Please help out so that others can benefit.
    PS - I am using Vertica 9.0 Enterprise Mode.

    Regards,
    Saurabh

  • anup1007

    Hi Saurabh,

    I didn't find a resolution to this; I used a workaround instead. I poll the v_monitor.query_profiles system table to check whether the query has finished executing, and use a stream name to identify the particular executing query.
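
    A minimal sketch of that polling approach, assuming the COPY is tagged with a STREAM NAME clause and matched against the query text in QUERY_PROFILES (the stream name, connection URL, and credentials below are made-up placeholders, not the actual code):

    import java.sql.DriverManager

    object CopyPollingSketch {
      def main(args: Array[String]): Unit = {
        val streamName = "mytable_load_20191017"   // hypothetical stream name
        // Placeholder connection details; polling happens on its own connection
        val conn = DriverManager.getConnection(
          "jdbc:vertica://vertica-host:5433/mydb", "dbadmin", "password")
        try {
          // The COPY itself runs elsewhere (on another connection/thread), tagged so it can be found:
          //   COPY myTable FROM 'hdfs:///mydirectory/ofmanyfiles/part*' STREAM NAME 'mytable_load_20191017'

          // Poll v_monitor.query_profiles until no statement tagged with the stream name is still executing
          val poll = conn.prepareStatement(
            "SELECT count(*) FROM v_monitor.query_profiles " +
            "WHERE query ILIKE ? AND is_executing")
          poll.setString(1, s"%$streamName%")
          var running = true
          while (running) {
            val rs = poll.executeQuery()
            rs.next()
            running = rs.getInt(1) > 0
            if (running) Thread.sleep(30000)   // wait 30 s between checks
          }
          println(s"COPY tagged '$streamName' is no longer executing")
        } finally {
          conn.close()
        }
      }
    }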

    Thanks,
    Anup.

  • Jim_Knicely Administrator
    edited December 2019
  • ersaurabhex

    Sure Jim. Let me try that out first and if we still see the error, I'll get back to you.

    Regards,
    Saurabh
