Import CSV using Rfc4180CsvParser and exclude header row
Is there a way to exclude the header row when importing data via the Rfc4180CsvParser?
The COPY command has a SKIP option, but the option doesn't seem to work when combined with this parser:
copy scratch.test from '/home/root/data/test.csv' with parser Rfc4180CsvParser() SKIP 1 ;
Alternatively, the parser source is a single C++ file of just 135 lines, so it should be feasible to modify it and recompile using `make`:
/opt/vertica/sdk/examples/ParserFunctions/Rfc4180CsvParser.cpp
Presumably we keep a counter, increment it for each line encountered, and skip writing out some initial number of lines. But I don't quite see where to make that incision. Hints?
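To make the counter idea concrete, here is a minimal standalone sketch. It does not use the Vertica SDK and is not taken from Rfc4180CsvParser.cpp; `SkippingRecordSink`, `onField`, `onRecordEnd`, and `feedSimpleLine` are hypothetical names made up for illustration. The assumption is that the parser has some per-field hook and some end-of-record hook, and that the skip check belongs in the end-of-record hook.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's per-record state. None of these
// names come from the Vertica SDK or from Rfc4180CsvParser.cpp.
class SkippingRecordSink {
public:
    explicit SkippingRecordSink(std::size_t skipCount) : toSkip_(skipCount) {}

    // Called once per parsed field: buffer the value for the current record.
    void onField(const std::string& value) { current_.push_back(value); }

    // Called once per completed record: emit it only after the first
    // toSkip_ records have gone by, then reset the field buffer.
    void onRecordEnd() {
        if (seen_ >= toSkip_) {
            rows_.push_back(current_);
        }
        ++seen_;
        current_.clear();
    }

    const std::vector<std::vector<std::string> >& rows() const { return rows_; }

private:
    std::size_t toSkip_;                              // records to drop
    std::size_t seen_ = 0;                            // records seen so far
    std::vector<std::string> current_;                // fields of the open record
    std::vector<std::vector<std::string> > rows_;     // records that were kept
};

// Naive driver for the demo only: splits on commas and ignores quoting,
// so it is NOT an RFC 4180 parser.
void feedSimpleLine(SkippingRecordSink& sink, const std::string& line) {
    std::size_t start = 0;
    while (true) {
        std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            sink.onField(line.substr(start));
            break;
        }
        sink.onField(line.substr(start, comma - start));
        start = comma + 1;
    }
    sink.onRecordEnd();
}
```

Constructing the sink with a skip count of 1 and feeding it a header line followed by data lines leaves only the data lines in `rows()`. In the real parser, the equivalent of `rows_.push_back(...)` would be whatever call currently writes a finished row to the output, and the skip count would have to come from a parser parameter.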
https://community.dev.hpe.com/t5/Vertica-Forum/How-to-load-a-csv-into-a-table/m-p/233590#M11142
Comments
Vertica 7 comes with its own CSV parser which works with Flex Tables and columnar tables:
By default it assumes a header row, and rejects on invalid values, as if you'd specified.
The options specified above are actually the defaults and need not be specified. Additional options are documented at: http://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/FlexTables/FCSVPARSERreference.htm
For Flex Tables it does appear to read the column names from the header.
For columnar tables, I'm pretty sure the parser is not actually reading the column names, and thus cannot take data whose column order differs from the table definition, or data with only a subset of the columns. It appears to take the column order from the data, like the regular COPY statement.
It's not big on error messages. If anyone is interested in extending the C++ parser to read the column names, I would be interested in collaborating on or funding that work.
https://gist.github.com/protobi/62b225b448db39c33af5
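For anyone picking this up, here is a rough standalone sketch of what "reading the column names" could mean: match each header field to a table column by name, then use that mapping to put every data row into table order. It is not Vertica SDK code. `mapHeaderToColumns`, `reorderRow`, and `kNoColumn` are hypothetical names, matching is exact and case-sensitive by assumption, and table columns missing from the file are left as empty strings (a real parser would probably write NULL instead).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Sentinel meaning "this file column has no matching table column".
static const std::size_t kNoColumn = static_cast<std::size_t>(-1);

// For each header field, find the index of the table column with the same
// name (exact match, an assumption), or kNoColumn if there is none.
std::vector<std::size_t> mapHeaderToColumns(
        const std::vector<std::string>& header,
        const std::vector<std::string>& tableColumns) {
    std::unordered_map<std::string, std::size_t> byName;
    for (std::size_t i = 0; i < tableColumns.size(); ++i) {
        byName[tableColumns[i]] = i;
    }
    std::vector<std::size_t> mapping;
    mapping.reserve(header.size());
    for (std::size_t i = 0; i < header.size(); ++i) {
        std::unordered_map<std::string, std::size_t>::const_iterator it =
            byName.find(header[i]);
        mapping.push_back(it == byName.end() ? kNoColumn : it->second);
    }
    return mapping;
}

// Place the fields of one data row into table order. Unmatched file columns
// are dropped; table columns absent from the file stay as empty strings.
std::vector<std::string> reorderRow(
        const std::vector<std::string>& fields,
        const std::vector<std::size_t>& mapping,
        std::size_t columnCount) {
    std::vector<std::string> out(columnCount);
    for (std::size_t i = 0; i < fields.size() && i < mapping.size(); ++i) {
        if (mapping[i] != kNoColumn) {
            out[mapping[i]] = fields[i];
        }
    }
    return out;
}
```

With this in place the header row would be consumed once to build the mapping, and each later record would pass through `reorderRow` before being written out. That covers both a different column order and a subset of columns, and unmatched header names give a natural place to raise an error message.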