Problem with the HDFS-Vertica connector

I installed the HDFS Vertica connector on all hosts and spun up an EMR job with webHDFS enabled on port 9101.

1) To test webHDFS, I ran the following:

    /root# curl -i -L "http://hadoop-namenode:9101/webhdfs/tmp/test.txt?op=OPEN&user.name=hadoop"

    HTTP/1.1 307 TEMPORARY_REDIRECT
    ....
    HTTP/1.1 200 OK
    .........
    A|1|2|3
    B|4|5|6

2) After verifying webHDFS, I tried to use the native COPY from Vertica to copy files from HDFS:

    -> COPY qa.testtable SOURCE Hdfs(url='http://hadoop-namenode:9101/webhdfs/tmp/test.txt', username='hadoop');

    ERROR 0: Error calling plan() in User Function HdfsFactory at [src/Hdfs.cpp:198], error code: 0, message: No files match [http://hadoop-namenode:9101/webhdfs/tmp/test.txt]

Does anyone have an idea why the COPY fails even though the files are present on HDFS? Please let me know if you see any mistake here.

Comments

  • Hi Amelia, thanks for the response. A few more details: I am using EMR, which has Hadoop version 0.20.205. Vertica is running on a Linux 2.6.32-279.5.1.el6.x86_64 (Red Hat 4.4.6-4) distribution.
  • Can you please post relevant messages from your vertica.log and UDxLogs? The Hdfs connector implements glob functions to list files. In some cases it does not do a good job propagating errors that occur in these functions (we are working on a fix). It does however write INFO messages to a udx log. Did you run curl from the vertica node (every Vertica node must be able to access webhdfs)?
  • I've seen the identical issue using actual HDFS and the actual Community Edition of Vertica on a single-node VM test installation. I can provide details if needed.
  • tail vertica.log

    2013-07-03 11:17:43.394 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/2705: Connection received: host=127.0.0.1 port=44325 (connCnt 1)
    2013-07-03 11:17:43.394 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/4540: Received SSL negotiation startup packet
    2013-07-03 11:17:43.394 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/4691: Sending SSL negotiation response 'N'
    2013-07-03 11:17:43.394 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/4686: sendAuthRequest: user=dbadmin database=VMart host=127.0.0.1 authType=0
    2013-07-03 11:17:43.395 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/2703: Connection authenticated: user=dbadmin database=VMart host=127.0.0.1
    2013-07-03 11:17:43.395 Init Session:0x7fd9f4028cb0 @v_vmart_node0004: 00000/2609: Client pid: 27548
    2013-07-03 11:17:43.396 Init Session:0x7fd9f4028cb0 [Session] [Query] TX:0(debian-27591:0x61ba) COPY testTable SOURCE Hdfs(url='http://localhost:50075/webhdfs/v1/tmp/test.txt',username='ulil');
    2013-07-03 11:17:43.400 Init Session:0x7fd9f4028cb0-a00000000025b7 [Txn] Begin Txn: a00000000025b7 'COPY testTable SOURCE Hdfs(url='http://localhost:50075/webhdfs/v1/tmp/test.txt',username='ulil');'
    2013-07-03 11:17:43.411 Init Session:0x7fd9f4028cb0-a00000000025b7 [UserMessage] [HDFS UDL INFO|src/Glob.cpp,287]: Unknown type
    2013-07-03 11:17:43.412 Init Session:0x7fd9f4028cb0-a00000000025b7 @v_vmart_node0004: VP001: Error calling plan() in User Function HdfsFactory at [src/Hdfs.cpp:198], error code: 0, message: No files match [http://localhost:50075/webhdfs/v1/tmp/test.txt] LOCATION: planUDSource, /scratch_a/release/vbuild/vertica/Catalog/LanguageSupport.cpp:410
    2013-07-03 11:17:43.423 Init Session:0x7fd9f4028cb0-a00000000025b7 @v_vmart_node0004: 00000/4719: Session debian-27591:0x61ba ended; closing connection (connCnt 1)
    2013-07-03 11:17:43.423 Init Session:0x7fd9f4028cb0-a00000000025b7 [Txn] Rollback Txn: a00000000025b7 'COPY testTable SOURCE Hdfs(url='http://localhost:50075/webhdfs/v1/tmp/test.txt',username='ulil');'
  • Thanks for posting. The log contents are helpful. The Hdfs source is somehow confused about the file status response received from webhdfs. Can you please issue the following query and post the results? Thanks. http://hadoop-namenode:9101/webhdfs/tmp/test.txt?op=GETFILESTATUS&user.name=hadoop
  • Hi Venkat, could you please tell me where you defined the webHDFS port as 9101? Thanks, Ravi
  • I am getting exactly the same problem. I checked the UDx logs and nothing was posted there. I tried different ports such as 50070, 50075, and 9010, and got a "can't connect to host" error for all of them except 50075, which returned "No files match". Any help is appreciated.
  • Can you post a snippet of your vertica.log as well as the contents returned by an http request issued with a URL from the log, preferably one containing "?op=GETFILESTATUS"? Thanks.
  • Hi Ravi,

    The default port for webHDFS on my distribution was 9101.

    The way to enable webHDFS is to add a bootstrap action to the EMR job, as below.

    -> --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-s,dfs.webhdfs.enabled=true"
  • I am using the command below:
    ruby elastic-mapreduce --create --name "Demo Instance" --alive --num-instances 1 --instance-type m1.small --ami-version latest --log-uri s3://<myawsbucket>/logs --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons --args "-h,dfs.webhdfs.enabled=true"

    but it is not working for me. Any help is appreciated.
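
Several replies above ask for the output of a `?op=GETFILESTATUS` request. The following is a minimal Python sketch of that check, not part of the connector itself. It assumes the response follows the WebHDFS v1 JSON layout (a top-level `FileStatus` object with a `type` field such as `FILE` or `DIRECTORY`); whether the webHDFS build on Hadoop 0.20.205 returns that layout, and whether this relates to the `Unknown type` message in the log, is not confirmed anywhere in this thread. The function names and the `prefix` parameter are hypothetical helpers introduced for illustration.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_webhdfs_url(host, port, path, op, user, prefix="/webhdfs"):
    """Build a webHDFS URL.

    `prefix` defaults to "/webhdfs" as in the original post; the URL in the
    posted vertica.log uses "/webhdfs/v1" instead.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}{prefix}{quote(path)}?{query}"


def file_status_type(body):
    """Return the "type" field of a GETFILESTATUS response, or None.

    None means the body was not JSON or did not have the assumed
    {"FileStatus": {"type": ...}} shape.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        return None
    status = doc.get("FileStatus") if isinstance(doc, dict) else None
    if not isinstance(status, dict):
        return None
    return status.get("type")


def fetch_status_type(url, timeout=10):
    """Issue the request (needs network access) and return the reported type."""
    with urlopen(url, timeout=timeout) as resp:
        return file_status_type(resp.read().decode("utf-8", errors="replace"))
```

Run from each Vertica node, `fetch_status_type(build_webhdfs_url("hadoop-namenode", 9101, "/tmp/test.txt", "GETFILESTATUS", "hadoop"))` would show whether the node can reach webHDFS and what type the server reports. A result of `None` would suggest the response is not in the assumed format, in which case posting the raw response body is the more useful next step.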
