Filter GZip() Error with HDFS connector
We are trying to load some files into our 3-node POC cluster and are getting the following exception for some of the files. It may be because the files are too big for our POC cluster, but I couldn't find anything useful in the logs to pin down the exact cause. Let us know if we are missing something.
Exception:
014-09-15 21:51:52.190 Init Session:0x2b54d41f9940-a000000012e4a2 <ERROR> @v_poc_node0003: VP001/3399: Failure in UDx RPC call InvokeProcessUDL(): Error calling processUDL() in User Defined Object [GZip] at [FilterFunctions/GZip.cpp:56], error code: 0, message: Error occurred during ZLIB decompression. ZLIB error code: -5, Message: (null)
LOCATION: makeUDxRemoteProcedureCallHandlingErrors, /scratch_a/release/vbuild/vertica/EE/EEUtil/UDxFenceSupport.cpp:387
Current syntax:
Note: Adding low_speed_limit also doesn’t seem to help.
COPY Y_DETAILS_141_A931FDF9(day, week, month, model, buildString, buildTrain, firmwareVersion, count) SOURCE Hdfs(url='http://nn01:50070/webhdfs/v1/user/awdt/ankita/CR_Out/MyDetails/20140903/tsv/*.tsv*',username='', low_speed_limit=524288) FILTER Gzip() DELIMITER E'\t' DIRECT
Workaround:
However, copying the file to one of the nodes and issuing the COPY command with the built-in GZIP option instead of FILTER Gzip() worked.
dbadmin=> COPY Y_DETAILS_141_A931FDF9(day, week, month, model, buildString, buildTrain, firmwareVersion, count) from '/verticaNode1/public/MyDetails_20140903.tsv.deflate' gzip DELIMITER E'\t' REJECTED DATA '/tmp/reject.txt' EXCEPTIONS '/tmp/exceptions.txt' DIRECT ;
Rows Loaded
-------------
41337321
(1 row)
dbadmin=> select get_num_rejected_rows();
get_num_rejected_rows
-----------------------
0
(1 row)
Other details:
File size: ~730MB (compressed)
Vertica Version: 7.0.1-3
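For reference, zlib.h defines return code -5 as Z_BUF_ERROR. To rule out a problem with the file itself, a local copy can be checked outside Vertica. The following is only a minimal sketch in Python using the standard zlib module; it is not part of Vertica or the HDFS connector, the function names detect_container and check_stream are made up for illustration, and it only inspects the first compressed member of a file.

```python
import zlib

# Return codes as defined in zlib.h; -5 is the code reported in the log above.
ZLIB_CODES = {
    0: "Z_OK", 1: "Z_STREAM_END", 2: "Z_NEED_DICT",
    -1: "Z_ERRNO", -2: "Z_STREAM_ERROR", -3: "Z_DATA_ERROR",
    -4: "Z_MEM_ERROR", -5: "Z_BUF_ERROR", -6: "Z_VERSION_ERROR",
}


def detect_container(header: bytes) -> str:
    """Guess the container format from the first two bytes of the file."""
    if header[:2] == b"\x1f\x8b":
        return "gzip"
    # A zlib header has compression method 8 and a 16-bit value divisible by 31.
    if (len(header) >= 2 and header[0] & 0x0F == 8
            and ((header[0] << 8) | header[1]) % 31 == 0):
        return "zlib"
    return "unknown"


def check_stream(path: str, chunk_size: int = 1 << 20):
    """Stream-decompress a file; return (container, bytes_out, complete).

    complete is False when the input ends before the compressed stream does,
    for example when the file is truncated. Corrupt data raises zlib.error.
    """
    with open(path, "rb") as f:
        header = f.read(2)
        f.seek(0)
        # wbits=47 (32 + 15) lets zlib auto-detect a gzip or zlib header.
        decomp = zlib.decompressobj(wbits=47)
        total = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(decomp.decompress(chunk))
    return detect_container(header), total, decomp.eof
```

Calling check_stream('/verticaNode1/public/MyDetails_20140903.tsv.deflate') on the node would show which container the file uses and whether the stream ends cleanly. A clean result would suggest looking at the transfer from HDFS rather than at the file contents.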
Any pointers are greatly appreciated.