MAPTOSTRING turns JSON arrays into objects keyed by array index and erases types
Hey I noticed that the MAPTOSTRING built-in produces JSON that is not quite what one loaded in, even when using FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE). I leafed through some of the available source https://github.com/vertica/FlexTable/blob/master/src/JSONParser.cpp I'm guessing that's by design as all flex inputs are turned into the VMap internal representation which doesn't have a notion of "arrays" or "value types"
Here's a minimal example:
CREATE FLEX TABLE some_json();
my json:
{
"stuff": [
"a",
"b",
"c",
"d"
],
"id": 0
}
{
"stuff": [
"e",
"f",
"g",
"h"
],
"id": 1
}
loading it in:
$ echo '{"stuff": ["a", "b", "c", "d"], "id": 0}\n{"stuff": ["e", "f", "g", "h"], "id": 1}' | vsql ... -c "COPY some_json(__raw__) FROM STDIN PARSER public.FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE)"
what it looks like
$ vsql ... -c "SELECT public.MapToString(__raw__) FROM some_json"
MapToString
-------------------------------------------------------------------------------
{
"id": "0",
"stuff": {
"0": "a",
"1": "b",
"2": "c",
"3": "d"
}
}
{
"id": "1",
"stuff": {
"0": "e",
"1": "f",
"2": "g",
"3": "h"
}
}
(2 rows)
Is there a way of keeping around the originally loaded data alongside the VMAP column?
Thanks!
Tagged:
0
Answers
You can keep a copy of input JSON objects in a separate column.
create flex table f1(); alter table f1 add column json long varchar; copy f1(json, __raw__ as MapJSONExtractor(json)) from stdin; Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. {"a":["b","c"]} \. select * from f1; __identity__ | __raw__ | json --------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------- 1 | \001\000\000\000,\000\000\000\001\000\000\000\010\000\000\000\001\000\000\000\016\000\000\000\002\000\000\000\014\000\000\000\015\000\000\000bc\002\000\000\000\014\000\000\000\015\000\000\00001\001\000\000\000\010\000\000\000a | {"a":["b","c"]} (1 row) verticadb20140=> select json, maptostring(__raw__) from f1; json | maptostring -----------------+--------------------------------------- {"a":["b","c"]} | { "a": { "0": "b", "1": "c" } } (1 row)Well. Can we have a new feature - MapToString should try to deduct if this is an array.
If all keys in dict are sequential numbers starting from 0, it is a sign of array. MapToString should produce an array in this case.
That would nicely solve a problem.
This new feature of MapToString should be triggered by new parameter - do_my_best_to_figure_out_array. If set to true, new behaviour will be triggered.
Please consider filing new feature request.
@Ariel_Cary can you create internal feature request, to make MapToString deduct array, when additional parameter specified?