Loading "standard" CSV file with double quoted fields

How can I load a standard CSV file with the COPY command? The issue I have is with fields enclosed in double quotes. When a double-quote character occurs within the text of a field, it appears as two consecutive double-quote characters (e.g. col1,col2,"col3 ""quoted"" word might also have, commas",col4). I can't find any combination of COPY options that will let me load this type of CSV file. If quotes within quoted fields were simply escaped with a backslash, this would be easy. But this method of quoting is about as standard as it gets for CSV files (it is the convention defined in RFC 4180). It would surprise me if there were no way to load a "standard" CSV file.

Comments

  • This file is a CSV file, not a DELIMITED file, so the default DELIMITED parser will not be able to load it. Vertica provides a number of example parsers in /opt/vertica/sdk/examples/, including an RFC 4180 parser. If you compile and load that parser, you can use it to parse your file.
  • Great, I will try that. Thanks!
  • root@vertica-1:/opt/vertica/sdk/examples# /opt/vertica/bin/vsql --version
    vsql version v6.1.2-0, built for Linux64
    root@vertica-1:/opt/vertica/sdk/examples# cat /etc/issue
    Debian GNU/Linux 7 \n \l

    Trying to compile ParserFunctions/Rfc4180CsvParser.cpp
    --------------

    root@vertica-1:/opt/vertica/sdk/examples# make
    /opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]

    ParserFunctions/BasicIntegerParser_continuous.cpp: In member function ‘virtual Vertica::UDParser* BasicIntegerParserFactory::prepare(Vertica::ServerInterface&, Vertica::PerColumnParamReader&, Vertica::PlanContext&, const Vertica::SizedColumnTypes&)’:

    ParserFunctions/BasicIntegerParser_continuous.cpp:76:16: internal compiler error: in build_zero_init_1, at cp/init.c:280

    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <file:///usr/share/doc/gcc-4.7/README.Bugs> for instructions.

    Preprocessed source stored into /tmp/cc7a8M6j.out file, please attach this to your bugreport.
    In file included from /opt/vertica/sdk/include/VerticaUDx.h:57:0,
                     from /opt/vertica/sdk/include/Vertica.h:76,
                     from /opt/vertica/sdk/include/Vertica.cpp:38:
    /opt/vertica/sdk/include/BasicsUDxShared.h:278:50: warning: narrowing conversion of ‘18444492273895866368ull’ from ‘long long unsigned int’ to ‘Vertica::vint {aka long long int}’ inside { } is ill-formed in C++11 [-Wnarrowing]

    make: *** [build/BasicIntegerParser_continuous.so] Error 1
  • Hi Eugene,

    Ah, you've hit GCC bug #56403:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56403

    It has been fixed upstream.  The fix has been packaged and released by some distributions, but not yet (to my knowledge at least) by Debian.

    Unfortunately, our 6.x SDK was frozen before we discovered the bug, and the workaround would break binary compatibility, so it's difficult for us to provide a workaround ourselves.

    One workaround that you could implement would be to try the clang++ compiler instead of g++.  clang++ is packaged for Debian (package "clang"); it works with our SDK, and is not affected by this bug.  To do this, install clang; then modify the line at the top of "makefile" (in the current directory) that reads "CXX=g++" to read "CXX=clang++".

    Another alternative would be to download the latest GCC from gcc.gnu.org, then compile and install it yourself. However, I would only recommend this if you're familiar with installing compilers from source; depending on how you do it, it can be quite disruptive to your system.

    If you're familiar with the Debian community, you are of course welcome to encourage and/or help them to package the patch.  (I don't know that community well personally so I can't speak to their process.)

    Adam
  • Yeah, you are right, thanks.

    I referred to a link before I had actually posted it :-)

    The workaround snippet is hard-coded (it just demonstrates the concept), but if you need it, I can rewrite it to fit your requirements.
