MAPTOSTRING turns JSON arrays into objects keyed by array index and erases types

Hey I noticed that the MAPTOSTRING built-in produces JSON that is not quite what one loaded in, even when using FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE). I leafed through some of the available source https://github.com/vertica/FlexTable/blob/master/src/JSONParser.cpp I'm guessing that's by design as all flex inputs are turned into the VMap internal representation which doesn't have a notion of "arrays" or "value types"

Here's a minimal example:

CREATE FLEX TABLE some_json();

my json:

{
  "stuff": [
    "a",
    "b",
    "c",
    "d"
  ],
  "id": 0
}
{
  "stuff": [
    "e",
    "f",
    "g",
    "h"
  ],
  "id": 1
}

loading it in:

$ echo '{"stuff": ["a", "b", "c", "d"], "id": 0}\n{"stuff": ["e", "f", "g", "h"], "id": 1}' | vsql ... -c "COPY some_json(__raw__) FROM STDIN PARSER public.FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE)"

what it looks like

$ vsql ... -c "SELECT public.MapToString(__raw__) FROM some_json"
                                  MapToString
-------------------------------------------------------------------------------
 {
        "id": "0",
        "stuff": {
                "0": "a",
                "1": "b",
                "2": "c",
                "3": "d"
        }
}
 {
        "id": "1",
        "stuff": {
                "0": "e",
                "1": "f",
                "2": "g",
                "3": "h"
        }
}
(2 rows)

Is there a way of keeping around the originally loaded data alongside the VMAP column?

Thanks!

Tagged:

Answers

  • Ariel_CaryAriel_Cary Vertica Employee Employee

    You can keep a copy of input JSON objects in a separate column.

    create flex table f1();
    alter table f1 add column json long varchar;
    
    copy f1(json, __raw__ as MapJSONExtractor(json)) from stdin;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    {"a":["b","c"]}
    \.
    
    select * from f1;
     __identity__ |                                                                                                              __raw__                                                                                                               |      json       
    --------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------
                1 | \001\000\000\000,\000\000\000\001\000\000\000\010\000\000\000\001\000\000\000\016\000\000\000\002\000\000\000\014\000\000\000\015\000\000\000bc\002\000\000\000\014\000\000\000\015\000\000\00001\001\000\000\000\010\000\000\000a | {"a":["b","c"]}
    (1 row)
    
    verticadb20140=> select json, maptostring(__raw__) from f1;
          json       |              maptostring              
    -----------------+---------------------------------------
     {"a":["b","c"]} | {
            "a": {
                    "0": "b",
                    "1": "c"
            }
    }
    (1 row)
    
  • Well. Can we have a new feature - MapToString should try to deduct if this is an array.
    If all keys in dict are sequential numbers starting from 0, it is a sign of array. MapToString should produce an array in this case.
    That would nicely solve a problem.
    This new feature of MapToString should be triggered by new parameter - do_my_best_to_figure_out_array. If set to true, new behaviour will be triggered.

    Please consider filing new feature request.

  • @Ariel_Cary can you create internal feature request, to make MapToString deduct array, when additional parameter specified?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file