Using COPY command to load "split" tar.gz files
So I have a ~8 GB tar.gz file that I am trying to load into the DB. I tried several options, such as multiple parallel vsql sessions and COPY with globs ON ANY NODE, but all of these solutions involve splitting the file.
I split my inputFile.tar.gz into a number of chunks, but when I ran the COPY command with the glob option I hit this error: "ERROR: COPY: Error occurred during ZLIB decompression. ZLIB error code: -3, Message: incorrect header check".
Has anyone encountered this when copying a "split" tar.gz file?
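Roughly, the steps were along these lines; the chunk size, paths, and the table name my_table are placeholders, and the exact COPY options depend on your Vertica version:
# split the compressed archive into fixed-size byte chunks (placeholder 1 GB size)
split -b 1G inputFile.tar.gz inputFile.part.
# load all chunks with a glob from vsql (my_table and the path are placeholders)
vsql -c "COPY my_table FROM '/data/inputFile.part.*' ON ANY NODE GZIP;"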
Comments
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/NewFeatures/7.0/7.0.0/COPYChanges.htm
Eugenia
Which 6.x version are you running?
I remember a similar issue that was resolved in 6.1 SP2. The latest release is 6.1.3; this is the relevant release note (you can find the release notes in the documentation for version 6.1.x):
"VER-27008
Data load / COPY
Under certain conditions, loading multiple concatenated gzip files through a single FIFO caused an error. This issue has been resolved.
"
That issue reported the same error you are seeing.
Are you on that version of Vertica? If not, can you upgrade to 6.1.3?
Eugenia
vertica=> select version();
              version
------------------------------------
Vertica Analytic Database v6.1.2-0
(1 row)
But yes, if it is not SP2, I understand from your statement that updating the version is probably the only solution?
Eugenia
But I have the following scenario and was wondering whether the COPY command can be used here.
I have a data.tar.gz file containing the actual data file and a couple of lookup data files. Any suggestions on how to run COPY against each of the files inside the archive without having to decompress it first?
Actual file: data.tar.gz
Contents of data.tar.gz:
data.tsv
lookup1.tsv
lookup2.tsv
Please advise
I think one course of action would be to extract only the file you're interested in (data.tsv) from the compressed tar archive and then load that into Vertica. However, if you cannot extract a full copy of data.tsv for some reason, the command
tar -xzOf data.tar.gz data.tsv
should extract data.tsv from data.tar.gz and write it to stdout. You can then pipe that into Vertica (read from STDIN in the COPY command).
You could of course pipe the output to gzip again to immediately recompress the data if disk space is an issue.
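For example, something along these lines should work; the table name my_table and the tab delimiter are just placeholders based on the .tsv extension, so adjust them to your schema:
# stream data.tsv out of the archive and straight into COPY via STDIN
tar -xzOf data.tar.gz data.tsv | vsql -c "COPY my_table FROM STDIN DELIMITER E'\t';"
# or, if you want an intermediate file but are short on disk space,
# recompress on the fly and let COPY decompress the .gz itself
# (assumes the file ends up on the node where COPY runs)
tar -xzOf data.tar.gz data.tsv | gzip > data.tsv.gz
vsql -c "COPY my_table FROM '/data/data.tsv.gz' GZIP DELIMITER E'\t';"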