COPY command with multi-character delimiters
Hi,
I have data in the following format:
1234567^~^1234567^~^1234567^~^2004-02-02
The file has multiple rows of data in the same format.
Can you tell me how to load it with the COPY command, as Vertica allows only a SINGLE-character delimiter?
I can't pre-parse the file, as it is huge and would take immense time to parse.
Any inputs would be highly appreciated.
Thanks
Comments
Create your data file.
Create the table.
Create the COPY command - you just need to use one single-character delimiter when loading, but you can transform the rest using FILLER columns and the TRIM built-in function.
Check the data.
I hope this is what you were looking for.
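To make the filler idea concrete - a rough sketch, in which the table, column, and file names are made up - with DELIMITER '^' the stray '~' pieces become their own fields, which FILLER columns can discard:

```shell
# With '^' as the single-character delimiter, the sample row splits into
# 7 fields; fields 2, 4 and 6 are just the leftover '~' characters.
printf '1234567^~^1234567^~^1234567^~^2004-02-02\n' | \
  awk -F'^' '{print NF; print $1, $3, $5, $7}'
# -> 7
# -> 1234567 1234567 1234567 2004-02-02

# A matching COPY would look something like this (hypothetical names,
# not run here):
#   COPY mytable (c1, f1 FILLER VARCHAR(1), c2, f2 FILLER VARCHAR(1),
#                 c3, f3 FILLER VARCHAR(1), load_date)
#     FROM LOCAL 'mydata.dat' DELIMITER '^' DIRECT;
```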
Thanks Adrian_Oprea_1!! Your solution worked perfectly for the above example. Using fillers was a great idea.
We are purposely using multi-character delimiters (^~^) because the columns may contain data that includes a tilde (~). However, there is no support for multi-character delimiters in Vertica.
Is there any way to ignore a tilde within the column data?
Sample:
123~4567^~^1234567^~^1234567^~^2004-02-04
Looking forward to your response.
I don't understand your question.
Could you rephrase it?
Hi Adrian,
What I mean to ask is that the data might contain special characters, and if one of those special characters matches the delimiter, the ENTIRE ROW ends up in the rejected data.
So can you tell me how to handle this scenario, i.e., what if the delimiter is part of the data?
Sample:
123~4567^~^1234567^~^1234567^~^2004-02-04
Hope this makes it clear. Thanks.
Ohhh,
I faced this in the past. I had to sweat a bit to fix it, but I could not do it dynamically - I had to do it by hand.
I suggest solving this at the "root", where the data is generated, and making sure you have safe delimiters in place.
Well, I too faced the same issue, but unfortunately I don't have the authority to alter the data content. ;-)
So after a little groundwork, we built a parser that can handle multi-character delimiters. Once deployed, you can use it in the COPY command as a parser parameter, and it works perfectly.
Thanks again for the help Adrian.
Could you share it with the community ?
Hi tangiblegeek -
If you can influence the creation of the load file - consider using string delimiters/enclosers:
Instead of:
123~4567^~^1234567^~^1234567^~^2004-02-04
you would have:
'123~4567'~1234567~1234567~'2004-02-04'
And your COPY command would be:
COPY <mytable> FROM LOCAL '<mytable.tsv>' DELIMITER '~' ENCLOSED BY '''' DIRECT;
If the string delimiter can occur within the string, it would need to be escaped by a backslash ...
Good luck
Marco
(I work for HP) ...
Hi Marco,
Unfortunately, I don't have the access to alter the data, but there is a fix we eventually put in place, and it has been working pretty well so far, so I decided to share it with the community.
We are using a shell script with 'awk' and 'sed' to replace the multi-character delimiter with a single-character delimiter while loading. The file loads perfectly, and the original file does not get modified.
The flow is as follows:
1. UDx calls the shell script
2. The newly constructed COPY statement is executed at the command prompt
3. The delimiter gets replaced 'on-the-fly' (the actual file stays unchanged)
4. The shell returns control to the UDx for the remaining steps
Hope this works for you guys.
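The on-the-fly replacement described above can be sketched roughly like this (the file name, table name, and replacement character are assumptions; the vsql call is shown but not executed):

```shell
# Rough sketch of the on-the-fly approach (file/table names are made up).
# Replace the 3-character delimiter '^~^' with a single '|' in the stream
# only; the file on disk is never modified. Pick a replacement character
# that cannot occur in the data.
printf '123~4567^~^1234567^~^1234567^~^2004-02-04\n' > /tmp/raw.dat

sed 's/\^~\^/|/g' /tmp/raw.dat
# -> 123~4567|1234567|1234567|2004-02-04

# Piped straight into COPY via STDIN (not run here), e.g.:
#   sed 's/\^~\^/|/g' /tmp/raw.dat | \
#     vsql -c "COPY mytable FROM STDIN DELIMITER '|' DIRECT;"
```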