Using COPY command to load "split" tar.gz files
So I have a ~8 GB tar.gz file that I am trying to load into the DB. I tried several options, such as multiple parallel vsql sessions and COPY with globs ON ANY NODE, but all of these solutions involve splitting the file.
I split my inputFile.tar.gz into a number of chunks, but when I ran the COPY command with the glob option I hit this error: "ERROR: COPY: Error occurred during ZLIB decompression. ZLIB error code: -3, Message: incorrect header check".
Has anyone encountered this when copying a "split" tar.gz file?
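Roughly, the steps were along these lines; the chunk size, paths, and the table name my_table are placeholders, and the exact COPY options depend on your Vertica version:
# split the compressed archive into fixed-size byte chunks (placeholder 1 GB size)
split -b 1G inputFile.tar.gz inputFile.part.
# load all chunks with a glob from vsql (my_table and the path are placeholders)
vsql -c "COPY my_table FROM '/data/inputFile.part.*' ON ANY NODE GZIP;"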
Comments
https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/NewFeatures/7.0/7.0.0/COPYChanges.htm
Eugenia
Which 6.x version are you running?
I remember a similar issue that was resolved in 6.1 SP2. The latest release is 6.1.3; this is the relevant release note (you can find the release notes in the documentation for version 6.1.x):
"VER-27008
Data load / COPY
Under certain conditions, loading multiple concatenated gzip files through a single FIFO caused an error. This issue has been resolved.
"
That issue reported the same error you are seeing.
Are you on that version of Vertica? If not, can you upgrade to 6.1.3?
Eugenia
vertica=> select version();
              version
------------------------------------
Vertica Analytic Database v6.1.2-0
(1 row)
But yes, if it is not SP2, I understand from your statement that updating the version is probably the only solution?
Eugenia
But I have the following scenario and was wondering whether the COPY command can be used here.
I have a data.tar.gz file containing the actual data file and a couple of lookup data files. Any suggestions on how to run COPY against each of the files inside the archive without having to decompress it first?
Actual file: data.tar.gz
Contents of data.tar.gz:
data.tsv
lookup1.tsv
lookup2.tsv
Please advise
I think one course of action would be to extract only the file you're interested in (data.tsv) from the compressed tar archive and then load that into Vertica. However, if you cannot extract a full copy of data.tsv for some reason, the command
tar -xzOf data.tar.gz data.tsv
should extract data.tsv from data.tar.gz and write it to stdout. You can then pipe that into Vertica (read from STDIN in the COPY command).
You could of course pipe the output to gzip again to immediately recompress the data if disk space is an issue.
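For example, something along these lines should work; the table name my_table and the tab delimiter are just placeholders based on the .tsv extension, so adjust them to your schema:
# stream data.tsv out of the archive and straight into COPY via STDIN
tar -xzOf data.tar.gz data.tsv | vsql -c "COPY my_table FROM STDIN DELIMITER E'\t';"
# or, if you want an intermediate file but are short on disk space,
# recompress on the fly and let COPY decompress the .gz itself
# (assumes the file ends up on the node where COPY runs)
tar -xzOf data.tar.gz data.tsv | gzip > data.tsv.gz
vsql -c "COPY my_table FROM '/data/data.tsv.gz' GZIP DELIMITER E'\t';"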