Loading "standard" CSV file with double quoted fields
How can I load a standard CSV file with the COPY command? The issue I have is with fields enclosed in double quotes. When a double quote character occurs within the text of a field, it appears as two consecutive double quote characters, e.g.:

col1,col2,"col3 ""quoted"" word might also have, commas",col4

I can't find any combination of options to the COPY command that will let me load this type of CSV file. If quotes within quotes had simply been escaped with a backslash, this would be easy. But this method of quoting is about as standard as it gets for CSV files (it is the one defined in RFC 4180). It would surprise me if there were no way to load a "standard" CSV file.
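One possible preprocessing route, sketched here under assumptions rather than taken from this thread: since a backslash-escaped file would be easy to load, the file could be re-encoded before running COPY. The Python sketch below uses the standard `csv` module to read RFC 4180 input and write it back with embedded quotes escaped by a backslash instead of doubled. The function name `rfc4180_to_backslash_escaped` is made up for illustration, and whether the output loads cleanly still depends on the COPY options in use, which are not shown here.

```python
import csv
import io


def rfc4180_to_backslash_escaped(text: str) -> str:
    """Re-encode RFC 4180 CSV text so embedded quotes are backslash-escaped.

    Hypothetical helper for illustration; not part of Vertica or its SDK.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines embedded inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""))  # default dialect un-doubles ""
    out = io.StringIO(newline="")
    writer = csv.writer(
        out,
        doublequote=False,          # do not write "" for an embedded quote
        escapechar="\\",            # write \" instead
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = 'col1,col2,"col3 ""quoted"" word might also have, commas",col4\n'
    print(rfc4180_to_backslash_escaped(sample), end="")
```

For a real file, the same reader and writer settings can be applied to file objects opened with `newline=""` so that large inputs are streamed row by row instead of held in memory.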
vsql version v6.1.2-0, built for Linux64
root@vertica-1:/opt/vertica/sdk/examples# cat /etc/issue
Debian GNU/Linux 7 \n \l
I tried to compile ParserFunctions/Rfc4180CsvParser.cpp:
--------------
root@vertica-1:/opt/vertica/sdk/examples# make
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
ParserFunctions/BasicIntegerParser_continuous.cpp: In member function ‘virtual Vertica::UDParser* BasicIntegerParserFactory::prepare(Vertica::ServerInterface&, Vertica::PerColumnParamReader&, Vertica::PlanContext&, const Vertica::SizedColumnTypes&)’:
ParserFunctions/BasicIntegerParser_continuous.cpp:76:16: internal compiler error: in build_zero_init_1, at cp/init.c:280
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.
Preprocessed source stored into /tmp/cc7a8M6j.out file, please attach this to your bugreport.
In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
from /opt/vertica/sdk/include/Vertica.h:76,
from /opt/vertica/sdk/include/Vertica.cpp:38:
/opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]
make: *** [build/BasicIntegerParser_continuous.so] Error 1
Ah, you've hit GCC bug #56403:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56403
It has been fixed upstream. The fix has been packaged and released by some distributions, but not yet (to my knowledge at least) by Debian.
Unfortunately, our 6.x SDK was frozen before we discovered the bug, and the workaround would break binary compatibility, so it's difficult for us to provide a workaround ourselves.
One workaround that you could implement would be to try the clang++ compiler instead of g++. clang++ is packaged for Debian (package "clang"); it works with our SDK, and is not affected by this bug. To do this, install clang; then modify the line at the top of "makefile" (in the current directory) that reads "CXX=g++" to read "CXX=clang++".
Another alternative would be to download the latest gcc from gcc.gnu.org, and compile and install it yourself. I would only recommend this if you're familiar with installing compilers from scratch, though; it can be quite disruptive to your system, depending on how you do it.
If you're familiar with the Debian community, you are of course welcome to encourage and/or help them to package the patch. (I don't know that community well personally so I can't speak to their process.)
Adam
http://vertica-forums.com/posting.php?mode=post&f=81
http://vertica-forums.com/viewtopic.php?f=81&t=1507
I gave the link before I had actually posted it :-)
The workaround/snippet is hard-coded (it just shows the concept), but if you need, I can rewrite it to fit your requirements.