Bulk copy doesn't always create the rejections file immediately

When using a bulk copy, the functions GET_NUM_REJECTED_ROWS() and GET_NUM_ACCEPTED_ROWS() are not supported. To determine the number of rejected rows, I want to scan the rejections file and count its lines. Unfortunately, from time to time the rejections file is not created immediately after the copy finishes, and my code decides that there are no rejections (when I know for sure there are). Is there a workaround for this issue?

Comments

  • Thank you Amelia, I'm looking forward to your answer. Best, Alex
  • Hi Alex, You say that the file is not immediately created -- is it ever created? I'm not aware of any delay in the creation of rejections files; that should always happen before the query returns. However, there was a known bug where, under the right circumstances, with small numbers of rejected records and with the right set of options to COPY, rejections files are not created in the first place. This should be fixed in recent versions of Vertica. If you just need the count of rejected rows, COPY always returns the number of accepted rows; you could subtract that from the number of rows that you intended to load. (If you are doing a JDBC bulk load, you should just use the JDBC error-handling APIs; they should work fine.) Adam
  • Hi Adam, Thank you for answering. The file is always created, but sometimes with a delay. When I add a sleep() of 5 seconds between the copy command and the scan of the rejections file, the file is always there. I didn't know that the copy command always returns the number of accepted rows, even when multiple files are being loaded. I will experiment with it; it might be a very effective workaround for this issue. Best, Alex
  • Hi Alex, Hm, that's quite odd. I would expect the file to be written immediately; in fact prior to the COPY statement returning. Are you using regular COPY, or COPY LOCAL (to get the rejection data locally)? Also, are you by any chance writing the rejection data to a network filesystem, and checking it from a computer other than the Vertica server in question? Network filesystems can introduce this kind of delay, depending on how Linux is configured to use them. If you find a good workaround, though, great! Adam
  • You got it, Adam. First, the workaround you suggested is working great and I don't need to scan the rejections file anymore. Second, we are using a network file system (CIFS), exactly as you assumed. Thank you, Alex
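
For anyone who hits the same problem, here is a minimal Python sketch of the two approaches discussed in this thread. It is an illustration under stated assumptions, not code from either poster. The helper names (count_rejected, count_lines, wait_for_file) are hypothetical, the accepted-row count is assumed to have been read from the COPY result by whatever client you use, and count_lines assumes one record per line in the input. Note that on a network filesystem such as CIFS, a file that has just become visible may not be fully written yet, so the subtraction approach is the safer of the two.

```python
import os
import time


def count_rejected(rows_intended: int, rows_accepted: int) -> int:
    """Rejected rows = rows we meant to load minus rows COPY reported as accepted.

    rows_accepted is assumed to come from the COPY result; how you fetch it
    depends on your client and is not shown here.
    """
    if rows_accepted > rows_intended:
        raise ValueError("accepted count exceeds intended count")
    return rows_intended - rows_accepted


def count_lines(path: str) -> int:
    """Count newline-delimited records in a file (assumes one record per line)."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def wait_for_file(path: str, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until path exists or the timeout expires.

    A bounded replacement for a fixed sleep(); returns True if the file
    appeared, False otherwise. It does not guarantee the file is complete.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With these helpers, count_rejected(count_lines(input_path), accepted) gives the rejection count without touching the rejections file at all, where input_path and accepted are placeholders for your own input file and the count returned by COPY.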
