Copy command fails with no files match: Failure in UDx RPC call InvokePlanUDL():
Hi,
I am trying to load data from a WebHDFS location into Vertica using the COPY command with the WebHDFS connector package.
Some of my copies are failing with the error:
2015-10-02 09:00:49.774 Init Session:0x7f19843f7a00-a000000015749e [EE] <INFO> Trying to set up a new UDx side process
2015-10-02 09:00:49.809 Init Session:0x7f19843f7a00-a000000015749e <ERROR> @v_bigfoot_node0001: VP001/3399: Failure in UDx RPC call InvokePlanUDL(): Error calling planUDL() in User Defined Object [Hdfs] at [src/Hdfs.cpp:710], error code: 0, message: No files match [http://10.47.2.157:50070/webhdfs/v1/projects/bigfoot/processed/core/zs_badger/scp_ekl/la_runsheet_tasklist_fact_1905/*]
LOCATION: performUDxCall, /scratch_a/release/30493/vbuild/vertica/EE/EEUtil/UDxFenceSupport.cpp:429
2015-10-02 09:00:49.839 Init Session:0x7f19843f7a00-a000000015749e <LOG> @v_bigfoot_node0001: 08006/5167: Unexpected EOF on client connection
2015-10-02 09:00:49.839 Init Session:0x7f19843f7a00-a000000015749e <LOG> @v_bigfoot_node0001: 00000/4719: Session prod-fdpanalytics-v-9197:0xb2ef ended; closing connection (connCnt 12)
2015-10-02 09:00:49.840 Init Session:0x7f19843f7a00-a000000015749e [Txn] <INFO> Rollback Txn: a000000015749e 'COPY b_test.b_scp_ekl__la_runsheet_tasklist_fact_tmp_38386 SOURCE Hdfs(url='http://10.47.2.157:50070/webhdfs/v1/projects/bigfoot/processed/core/zs_badger/scp_ekl/la_runsheet_tasklist_fact_1905/*',username='fk-bigfoot-azkaban', connection_timeout=9999999,low_speed_time=9999999 ,low_speed_limit=0 ) parser fjsonparser() DIRECT'
I checked the HDFS location using op=LISTSTATUS and op=GETCONTENTSUMMARY, and both show that data is present in those locations.
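For anyone who wants to repeat the check, here is a minimal Python sketch of the LISTSTATUS call. The helper names (liststatus_url, file_names, list_directory) are hypothetical, and the response layout (FileStatuses / FileStatus / pathSuffix / type) is assumed to be the one documented for the WebHDFS REST API. It is only a sketch of the check, not part of the Vertica connector.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def liststatus_url(namenode, path, user):
    """Build a WebHDFS LISTSTATUS URL for a directory (no wildcard in the path)."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return "http://%s/webhdfs/v1%s?%s" % (namenode, quote(path), query)


def file_names(liststatus_json):
    """Return the names of regular files found in a LISTSTATUS response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses if s["type"] == "FILE"]


def list_directory(namenode, path, user):
    """Fetch and parse the listing (needs network access to the NameNode)."""
    with urlopen(liststatus_url(namenode, path, user)) as resp:
        return file_names(resp.read().decode("utf-8"))
```

Calling list_directory("10.47.2.157:50070", path, "fk-bigfoot-azkaban") with the directory path from the COPY statement (without the trailing /*) should return the part file names if the directory is readable by that user.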
Comments
Adding more information that I found.
These copies fail only for HDFS locations that contain exactly one part file.
Also, they fail only when I use the wildcard '*' character in the HDFS location. If I give the complete part file name, it works.
Basically,
The following command fails:
COPY b_test.b_scp_ekl__la_tpt_vertica_fact_tmp_39026 SOURCE Hdfs(url='http://10.47.2.157:50070/webhdfs/v1/projects/bigfoot/processed/core/zs_badger/scp_ekl/la_tpt_vertica_fact_1912/*',username='fk-bigfoot-azkaban', connection_timeout=9999999,low_speed_time=9999999 ,low_speed_limit=0 ) parser fjsonparser() DIRECT;
The following command works:
COPY b_test.b_scp_ekl__la_tpt_vertica_fact_tmp_39026 SOURCE Hdfs(url='http://10.47.2.157:50070/webhdfs/v1/projects/bigfoot/processed/core/zs_badger/scp_ekl/la_tpt_vertica_fact_1912/000000_0',username='fk-bigfoot-azkaban', connection_timeout=9999999,low_speed_time=9999999 ,low_speed_limit=0 ) parser fjsonparser() DIRECT;
Also, to add: the command with the wildcard '*' fails only when the HDFS location has one part file. If it has multiple part files, it works.
Does anyone have an idea what is causing this? How can I debug it further?
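In the meantime, a possible stopgap based only on the behaviour described above: choose the url value according to how many part files the directory holds. The function name hdfs_source_url is hypothetical, and the list of file names is assumed to come from a separate directory listing (for example op=LISTSTATUS). This is a sketch of a workaround, not an explanation of the failure.

```python
def hdfs_source_url(base_url, names):
    """Pick the url= value for the Hdfs source from a directory's file names.

    With exactly one part file, return the complete file URL (the form that
    works); otherwise keep the wildcard (which works with multiple part files).
    """
    base = base_url.rstrip("/")
    if len(names) == 1:
        return base + "/" + names[0]
    return base + "/*"
```

For the failing directory above, hdfs_source_url(directory_url, ["000000_0"]) gives the same URL as the COPY command that works.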