Output JSON String Problem
I am returning a JSON string (7215952 characters long approx, but could be A LOT MORE) as a 1x1 matrix for a UDF (LONG VARCHAR). I am getting the following error
[Vertica][VJDBC](3399) ERROR: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [runmodel] at [/scratch_a/release/vbuild/vertica/UDxFence/RInterface.cpp:1236], error code: 0, message: Exception in processPartitionForR: [Returned string value '[{"item":[["2013-10-14 17:15:00","TRUE"],["2013-10-14 17:20 {...} [SQL State=VP001, DB Errorcode=3399]
NOTE: I added the "{...}" to signal that I purposedly deleted part of the error sequence due to it holding no further information (similar JSON format text as in the first 2 values)
What could cause this?
[Vertica][VJDBC](3399) ERROR: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [runmodel] at [/scratch_a/release/vbuild/vertica/UDxFence/RInterface.cpp:1236], error code: 0, message: Exception in processPartitionForR: [Returned string value '[{"item":[["2013-10-14 17:15:00","TRUE"],["2013-10-14 17:20 {...} [SQL State=VP001, DB Errorcode=3399]
NOTE: I added the "{...}" to signal that I purposedly deleted part of the error sequence due to it holding no further information (similar JSON format text as in the first 2 values)
What could cause this?
0
Comments
Can you please run you code in the R console and see if it works fine there.
Pratibha
We recently added logic that truncates all Vertica log messages to around 65,000 characters. Your error message is probably much longer than that, so you're missing the end of it.
It turns out that there is exactly one message in our SDK with the exact prefix above. (You can find it by grep'ing through "/opt/vertica/sdk/include/*".) The message's format-string is:
"Returned string value '[%s]' with length [%zu] "
"is greater than declared field length of [%zu] "
"of field [%s] at output column index [%zu]"
Pratibha/Said, hopefully that'll help track down what's going on with the UDx?
Now my question, what limit is there on LONG VARCHAR? Is there anything larger?
Note that Vertica has a max row size (the maximum of the sum of all field sizes on a table) of 32768000 bytes. So a single max-size LONG field uses most of that.
Also note that there are some performance implications here -- Vertica allocates memory conservatively; so if you have a field that could be 32mb, then every query operator that operates on that field and that needs to copy it to a buffer, must reserve 32mb RAM. That adds up in a hurry. It also usually disables vectorized operation -- we operate on blocks of thousands or tens of thousands of integers all the time, it's much faster, but holding tens of thousands of 32mb blocks in memory is usually a bad plan :-)
A recent development brought my team a couple question regarding LONG VARCHAR types.
1) Is there a way to CAST from VARCHAR to LONG VARCHAR without adding leading/trailing spaces?
And most importantly regarding Vertica memory management with LONG VARCHAR.
2) Let me ellaborate a scenario. If I have ONLY 100,000 byte strings in a 32,000,000 byte LONG VARCHAR (LV) column named "observations" would the following statements be true?
* Running a Query that returns a table with the "observations" column (without using it for filtering or operations) will reserve 32,000,000 bytes per observation even when their contents are only 100,000 bytes long
* Running Operations such as: LIKE, IN, "=", "<>" on the "observations" LV column will reserve the 32,000,000 bytes and consider the 100,000 strings to have leading/trailing spaces.
I am asking this to understand more about the LONG VARCHAR and the LONG VARBINARY types.
Thanks