Filter GZip() Error with HDFS connector

We are trying to load some files into our 3-node POC cluster and are getting the following exception for some of them. It may be because the files are too big for our POC cluster, but I couldn't find anything useful in the logs that points to the exact cause. Let us know if we are missing something.

 Exception:
014-09-15 21:51:52.190 Init Session:0x2b54d41f9940-a000000012e4a2 <ERROR> @v_poc_node0003: VP001/3399: Failure in UDx RPC call InvokeProcessUDL(): Error calling processUDL() in User Defined Object [GZip] at [FilterFunctions/GZip.cpp:56], error code: 0, message: Error occurred during ZLIB decompression.  ZLIB error code: -5, Message: (null)
               LOCATION:  makeUDxRemoteProcedureCallHandlingErrors, /scratch_a/release/vbuild/vertica/EE/EEUtil/UDxFenceSupport.cpp:387
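
For reference, zlib defines error code -5 as Z_BUF_ERROR, meaning the decompressor could make no further progress; one possible cause is input that ends early. Below is a minimal sketch, not Vertica code, for checking outside the database whether a downloaded file decompresses all the way to its end-of-stream marker. The function names are hypothetical, and it assumes the file has a gzip or zlib header and has been copied to local disk.

```python
import zlib


def read_chunks(path, size=1 << 20):
    """Yield a local file's contents in blocks of `size` bytes."""
    with open(path, "rb") as fh:
        while True:
            block = fh.read(size)
            if not block:
                break
            yield block


def check_compressed_stream(chunks):
    """Decompress an iterable of byte chunks.

    Returns (complete, output_bytes). `complete` is True only if the
    end-of-stream marker was reached. Corrupt data raises zlib.error.
    Only the first compressed stream (gzip member) is checked.
    """
    # MAX_WBITS | 32 lets zlib auto-detect a gzip or zlib header.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 32)
    total = 0
    for chunk in chunks:
        total += len(decomp.decompress(chunk))
    total += len(decomp.flush())
    return decomp.eof, total
```

Calling check_compressed_stream(read_chunks(path)) on the local copy and getting complete = False would suggest the file is truncated rather than too large.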
 
Current syntax (note: adding low_speed_limit doesn't seem to help either):

COPY Y_DETAILS_141_A931FDF9(day, week, month, model, buildString, buildTrain, firmwareVersion, count) SOURCE Hdfs(url='http://nn01:50070/webhdfs/v1/user/awdt/ankita/CR_Out/MyDetails/20140903/tsv/*.tsv*',username='', low_speed_limit=524288) FILTER Gzip()  DELIMITER E'\t'  DIRECT 


Workaround:
However, copying the file to one of the nodes and running COPY with the native GZIP option instead of FILTER Gzip() worked:
dbadmin=> COPY Y_DETAILS_141_A931FDF9(day, week, month, model, buildString, buildTrain, firmwareVersion, count) from '/verticaNode1/public/MyDetails_20140903.tsv.deflate' gzip DELIMITER E'\t'   REJECTED DATA '/tmp/reject.txt' EXCEPTIONS '/tmp/exceptions.txt' DIRECT ;
 Rows Loaded 
-------------
    41337321
(1 row)
 
dbadmin=> select get_num_rejected_rows();
 get_num_rejected_rows 
-----------------------
                     0
(1 row)





Other details: 
File size: ~730MB (compressed)
Vertica Version: 7.0.1-3


Any pointers are greatly appreciated.
