Using COPY command to load "split" tar.gz files

So I have a ~8 GB tar.gz file that I am trying to load into the database. I have tried several options, such as running multiple vsql sessions in parallel and using COPY with globs ON ANY NODE, but all of these solutions involve splitting the file.

I split my inputFile.tar.gz into a number of chunks, but when I ran the COPY command with the glob option I got this error: "ERROR: COPY: Error occurred during ZLIB decompression. ZLIB error code: -3, Message: incorrect header check".

Has anyone encountered this when running COPY on a "split" tar.gz file?
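A minimal Python sketch of why this error appears, assuming the .tar.gz was cut into byte ranges (for example with `split -b`): only the first piece begins with the gzip header bytes `1f 8b`, so decompressing any later piece as a file of its own fails at the header with zlib error -3, "incorrect header check", the same code and message as the COPY error above. The helper names and sample data are hypothetical, and this illustrates the gzip format rather than Vertica's internals.

```python
import gzip
import zlib


def byte_split(blob: bytes, n_chunks: int) -> list:
    """Hypothetical helper: cut raw bytes into n roughly equal pieces, like `split -b`."""
    size = -(-len(blob) // n_chunks)  # ceiling division
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def try_gunzip(chunk: bytes) -> str:
    """Decompress one piece on its own, as a loader opening each file separately would.

    Returns "ok" if zlib raised no error (a truncated first piece still counts
    as "ok" here), otherwise the zlib error message.
    """
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    try:
        d.decompress(chunk)
        return "ok"
    except zlib.error as exc:
        return str(exc)


if __name__ == "__main__":
    # Hypothetical tab-separated rows standing in for the real data.
    data = b"".join(b"%d\trow-%d\n" % (i, i) for i in range(10_000))
    for i, piece in enumerate(byte_split(gzip.compress(data), 4)):
        print(f"piece {i}: {try_gunzip(piece)}")
```

Pieces 1 to 3 report "Error -3 while decompressing data: incorrect header check". Also note that a .tar.gz decompresses to a tar archive rather than plain rows, which comes up later in the thread.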

Comments

  • Vertica 7 made several improvements to the COPY command, so you should no longer need to split the file. Have you tried it?
    https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/NewFeatures/7.0/7.0.0/COPYChanges.htm

    Eugenia
  • Thanks, that is helpful, but I should have mentioned that we are still on version 6.
  • Time to update :)
    Which 6.x version are you running?
    I remember a similar issue that was resolved in 6.1 SP2. The latest release is 6.1.3. This is the release note (you can find the release notes in the documentation for version 6.1.x):

    "VER-27008

    Data load / COPY

    Under certain conditions, loading multiple concatenated gzip files through a single FIFO caused an error. This issue has been resolved."

    This issue reported the same error that you are seeing. 
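Note that the release note concerns *concatenated* gzip files, which is not the same as byte-splitting one file. Concatenating complete .gz files (as `cat a.gz b.gz` would) produces a valid multi-member gzip stream, because the gzip format allows members to follow each other back to back. The Python sketch below, with hypothetical names and data, shows such a stream decompressing cleanly member by member; it illustrates the format only, not how Vertica reads a FIFO.

```python
import gzip
import zlib


def gunzip_members(stream: bytes) -> bytes:
    """Hypothetical helper: decompress one or more gzip members stored back to back."""
    out = []
    while stream:
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # expect a gzip header
        out.append(d.decompress(stream))
        if not d.eof:
            raise ValueError("truncated gzip member")
        stream = d.unused_data  # whatever follows the end of this member
    return b"".join(out)


if __name__ == "__main__":
    # Two hypothetical standalone .gz files, concatenated like `cat a.gz b.gz`.
    first = gzip.compress(b"1\tfoo\n")
    second = gzip.compress(b"2\tbar\n")
    print(gunzip_members(first + second))
    print(gzip.decompress(first + second))  # the stdlib also accepts multi-member input
```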


    Do you have this version of Vertica? If not, can you update to 6.1.3?

    Eugenia

     




  • I'm not sure whether this is SP2. Is it?

    vertica=> select version();
                  version
    ------------------------------------
     Vertica Analytic Database v6.1.2-0
    (1 row)

    But if it is not SP2, am I right in understanding from your statement that upgrading is probably the only solution?
  • This should be SP2, so it is not the same issue that I mentioned. If you are an enterprise customer, I recommend opening a support ticket.
    Eugenia
  • Thanks, the ticket is in progress.

    In the meantime, I have the following scenario and was wondering whether the COPY command can be used:

    I have a data.tar.gz file that contains the actual data file and a couple of lookup data files. Any suggestions on how to run the COPY command on each of the files inside the archive without having to decompress it?

    Actual file: data.tar.gz
    Contents of data.tar.gz:
      data.tsv
      lookup1.tsv
      lookup2.tsv

    Please advise

  • I think you're asking whether Vertica can read tar files. I'm sure that Vertica does not read tar files, compressed or otherwise.

    One course of action would be to extract only the file you're interested in (data.tsv) from the compressed tar file and then load that into Vertica. However, if for some reason you cannot extract a full copy of data.tsv, the command

    tar zxf data.tar.gz data.tsv -O

    should extract data.tsv from data.tar.gz and write it to stdout. You can then pipe that output into vsql and read it with COPY ... FROM STDIN.
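The same "extract one member to stdout" step can be sketched in Python with the standard `tarfile` module, in case a script is more convenient than GNU tar. Opening the archive with mode `"r|gz"` reads it as a forward-only stream, so data.tsv is decompressed on the fly and never written to disk. The function name and the script usage are hypothetical.

```python
import shutil
import sys
import tarfile


def stream_member(archive_path: str, member_name: str, out) -> None:
    """Hypothetical helper: write one member of a .tar.gz to `out`, like `tar zxf ... -O`."""
    # "r|gz" treats the archive as a forward-only stream: it is decompressed
    # on the fly and the member is never extracted to disk.
    with tarfile.open(archive_path, mode="r|gz") as tf:
        for member in tf:
            # Names must match exactly as stored ("./data.tsv" would not match "data.tsv").
            if member.isfile() and member.name == member_name:
                shutil.copyfileobj(tf.extractfile(member), out, 1 << 20)
                return
    raise KeyError(f"{member_name!r} not found in {archive_path}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python stream_member.py data.tar.gz data.tsv
    stream_member(sys.argv[1], sys.argv[2], sys.stdout.buffer)
```

A hypothetical invocation, assuming the script is saved as stream_member.py and the target table is named my_table (check the COPY options against your Vertica version): `python stream_member.py data.tar.gz data.tsv | vsql -c "COPY my_table FROM STDIN DELIMITER E'\t'"`.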

    You could of course pipe the output to gzip again to immediately recompress the data if disk space is an issue.
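Building on the recompression idea, here is a Python sketch (an assumption, not something tested against Vertica 6) that streams data.tsv out of data.tar.gz and writes it back out as several standalone .gz files, each holding only whole lines. Because every chunk is a complete gzip file, each one decompresses on its own, unlike byte-split pieces of the original archive. The function name, output prefix, and lines_per_chunk value are hypothetical.

```python
from __future__ import annotations

import gzip
import tarfile


def member_to_gzip_chunks(archive_path: str, member_name: str,
                          out_prefix: str, lines_per_chunk: int) -> list[str]:
    """Hypothetical helper: re-gzip one tar member into standalone, line-aligned chunks."""
    paths: list[str] = []
    # "r|gz" reads the archive as a forward-only stream, so nothing is
    # extracted to disk and the .tar.gz is decompressed only once.
    with tarfile.open(archive_path, mode="r|gz") as tf:
        for member in tf:
            if member.name != member_name:
                continue
            src = tf.extractfile(member)
            dst = None
            lines_in_chunk = 0
            for line in src:  # iterating a binary file object yields whole lines
                if dst is None or lines_in_chunk == lines_per_chunk:
                    if dst is not None:
                        dst.close()  # finish the previous chunk as a complete .gz file
                    path = f"{out_prefix}{len(paths):04d}.gz"
                    dst = gzip.open(path, "wb")
                    paths.append(path)
                    lines_in_chunk = 0
                dst.write(line)
                lines_in_chunk += 1
            if dst is not None:
                dst.close()
            return paths
    raise KeyError(f"{member_name!r} not found in {archive_path}")
```

Chunks produced this way could then be loaded with a glob, e.g. `COPY my_table FROM '/path/data_*.gz' GZIP DELIMITER E'\t'` (table name and path are hypothetical; verify the syntax for your version).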

