Exceptions and rejections with flex tables
Exceptions and rejected data files are not currently supported while loading flex tables.
Are you guys going to do something about it?
When I use the COPY command to load a flex table from a JSON file using fjsonparser, it rejects all data in the file if it contains any invalid JSON.
Thanks,
naveen
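For reference, a minimal sketch of the kind of load being described (the table name, file path, and file contents here are illustrative, not from the original post):

create flex table flextable();
-- /data/events.json, where one record is malformed:
--   {"id": 1, "name": "ok"}
--   {"id": 2, "name": "bad"        <-- missing closing brace
--   {"id": 3, "name": "also ok"}
copy flextable from '/data/events.json' parser fjsonparser();
-- behavior reported above: no rows load once the malformed record is hit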
Comments
You should validate your JSON file before loading it. Use the online tool at http://jsonlint.com/.
We're aware that rejection for JSON would be a useful feature; thanks for pointing it out again. It's a little tricky (though not fundamentally hard): Vertica can parse JSON documents that are much larger than system memory, so it must use a streaming parser, and typical existing streaming parsers don't provide a mechanism to rewind the stream to capture the entirety of the record being rejected.
That said, my understanding is that we do try to provide basic support for bad records -- we won't tell you in depth what's wrong, but we will try to recover and continue parsing with the next record.
It sounds like that's not happening for you. Usually this indicates some deeper sort of corruption; maybe as simple as an extra close bracket somewhere, maybe much more complex.
Can you narrow down what in the file is causing this issue?
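One quick way to check how far a load got, using the flex table's __identity__ column and the maptostring function (a sketch; 'flextable' stands in for your table name):

select count(*) from flextable;
-- inspect the last record that made it in:
select __identity__, maptostring(__raw__)
from flextable
order by __identity__ desc
limit 1;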
Thanks,
Adam
To the issue here: since the JSON format has no record terminators, an error in the JSON itself makes it ambiguous where the separations between records are, and therefore we currently fail the load. Look in vertica.log to see the characters which caused the syntax error and the failed load. We'd be interested in hearing what behavior you'd want, given the record-terminator ambiguity.
Regarding general JSON validity, if it passes a syntax check such as JSONLint.com, it should load without problem. We'd be very curious and grateful to see an example which passes with JSONLint but fails with the FJSONParser.
Can you give an example of where I can use this parameter with fjsonparser? I tried copying the same data twice with this parameter, but it didn't work:
copy flextable from '/newdel.json' parser fjsonparser(reject_on_duplicate='true');
 Rows Loaded
-------------
           1

copy flextable from '/newdel.json' parser fjsonparser(reject_on_duplicate='true');
 Rows Loaded
-------------
           1
Am I missing something?
Thanks,
Naveen
Thanks.
But there must be some kind of parameter/configuration for rejecting duplicate tuples while loading a flex table.
Having to copy distinct rows from one table to another just to get unique tuples is really annoying.
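The workaround being described, roughly (a sketch; the table and key names are illustrative, and the flex virtual columns are cast to real types here):

create table unique_rows as
select distinct "id"::int as id, "name"::varchar as name
from flextable;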
I'm not able to find the offending record. When I try to extract a block of 100 records (lines, in my case) around the failing one, all of them import without problems.
Using the __identity__ field, I checked the last imported record; it is indeed the line before the record # that failed, and it imports correctly. I'm puzzled.
Wondering whether it's connected to the input data size, I tried this (record 90764 fails): The record sizes are small: Thanks!
When looking for the error in vertica.log, look for lines which contain "<UDx>". If your issue is one of record size, you will get one or more entries complaining about needing n bytes, but only having m available.
Regarding record terminators: you're welcome to split your records across separate lines for easier management, but newlines don't have any meaning within JSON; that is, they will be ignored during parsing.
If record #90764 is failing, what does JSONLint.com have to say about its syntax?
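To make the newline point concrete (file contents assumed for illustration), these two file bodies parse identically with fjsonparser, since whitespace between JSON tokens is insignificant:

{"A": 1, "B": 2}

{"A": 1,
 "B": 2}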
If a row contains duplicate tuples, then their keys will be the same (or they're not duplicates), so this parameter will work in that case.
If you're asking Vertica to refuse to load rows which already exist in the destination table, we unfortunately don't have plans to support this at this time, in part because it would be a Very Large operation requiring the entirety of the table to be read for every load.
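To make the distinction concrete (the file path and contents are hypothetical): reject_on_duplicate acts within a single record, not across loads.

-- /dup.json contains a record with a repeated key: {"A": 1, "A": 2}
copy flextable from '/dup.json' parser fjsonparser(reject_on_duplicate='true');
-- the record above is rejected because the key "A" appears twice within one record;
-- loading the same file twice, as in the earlier example, still loads the rows twice,
-- since the option never inspects what is already in the table.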
JSONLint.com is pretty happy with the syntax - check http://pastebin.com/N84nwzgv.
Even Vertica is happy with the syntax - see the post above: extracting the line and a few around it, Vertica imports them without a problem.
But now I noticed another line in the log: Adding DIRECT to the COPY command didn't help, though - only the message disappeared.
On the other hand, I'm not sure that multiple objects following each other are valid JSON either - so there has to be some kind of format extension for multi-record JSON. And this is where newlines could have some meaning.
Regarding that row, I'm able to load it without problem. What version of Vertica are you using? (as reported by: /opt/vertica/bin/vertica --version) I'd take a look at the row directly following it (and potentially the one immediately preceding it).
Thanks very much for the loading feedback--I like this pattern a lot and we'll definitely consider it. One concern I have is escaping: do you have any preferences for how we might deal with JSON values which need to contain newlines?
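(One relevant detail of the JSON spec: a string value can't contain a literal newline; it must be escaped, as in {"note": "line one\nline two"}. So raw newlines inside a record can only come from formatting between tokens, never from the values themselves.)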
Currently, we support two formats:
1. Multiple individual JSON objects following each other, which, as you identified, is not valid JSON taken together even if the individual records are valid, for example: {"A": 1}{"A": 2}{"A": 3}
2. A single JSON array of JSON objects, which is valid JSON, for example: [{"A": 1}, {"A": 2}, {"A": 3}]
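Either form should load as three records (a sketch; the file paths are hypothetical):

-- /multi.json:  {"A": 1}{"A": 2}{"A": 3}
-- /array.json:  [{"A": 1}, {"A": 2}, {"A": 3}]
copy flextable from '/multi.json' parser fjsonparser();
copy flextable from '/array.json' parser fjsonparser();
select maptostring(__raw__) from flextable;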
I can try to provide the data later.