Options

Issue with COPY command

Hello to everyone,

I found a very interesting bug with COPY command. Sadly I had to pay for this with some real production data but nevertheless.

That is the case. I have a table with some fields like this
create table test(
    a int,
    b int,
    c int,
    d int
);

And I'm using COPY command to load some big batches with data. So I have batch like this

1|2|3
4|5|6
7|8|9

So this batch can be loaded into vertica using following COPY command - COPY test (a,b,c) FROM STDIN DELIMITER '|' and everything works fine.

Problem appears when you have more columns in COPY command than in the batch. So let's say I have the same batch but COPY command was changed to this (we added new column for this table) - COPY test (a,b,c,d) FROM STDIN DELIMITER '|'. In this case data just goes to nowhere but COPY command executes without errors.

Similar case in Postgres 9.2.2 works correctly and I receive an error - ERROR:  missing data for column "d"

Comments

  • Options
    Hi Sergey, What's in you COPY rejection files? Are you familiar with the ABORT ON ERROR directive to Vertica COPY? Vertica follows the SQL standard, not necessarily the PostgreSQL standard. Our design expectation is that you want to clean up bad records manually and incrementally retry, rather than rolling back your whole data load and starting over. (Common for ELT workloads, which Vertica is often used for.) PostgreSQL assumes that your data loads are smaller and/or already cleaned. So our defaults are different and we provide a different set of tools for dealing with bad data. (We can act like PostgreSQL but you have to ask for it.) I would encourage you to look through our documentation and consider whether you want to use flags to make Vertica's COPY behave more like PostgreSQL, or whether your workflow would benefit from using some of Vertica's different functionality. Thanks, Adam
  • Options
    Hi Adam,

    Thanks a lot for your reply.

    I used ABORT ON ERROR directive before but totally forgot about it in this case. And I found my data in CopyErrorLogs folder.

    But it confuses me a little that by default there is no notification about that. So it is not easy to notice that you are missing some data. Am I missing anything else?

    Regards,
    Sergey.
  • Options
    Hi Sergey, Unfortunately, you're right that there is no default notification about this. It's not an ideal bit of UI design... It's also baked into peoples' ETL scripts, etc, to expect no output in this case; people have optimized around the status quo at this point. I can't comment on future changes, so I unfortunately can't help more than that. Though I will say that you're not the first to run into this. Feel free to post in the Ideas section of the site, or (if you're an EE customer) file a Support case requesting these notifications. That gives us more information about the need, and it gives us a way to let you know if/when this changes. Thanks, Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file