MAPTOSTRING turns JSON arrays into objects keyed by array index and erases types

Hey I noticed that the MAPTOSTRING built-in produces JSON that is not quite what one loaded in, even when using FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE). I leafed through some of the available source https://github.com/vertica/FlexTable/blob/master/src/JSONParser.cpp I'm guessing that's by design as all flex inputs are turned into the VMap internal representation which doesn't have a notion of "arrays" or "value types"

Here's a minimal example:

CREATE FLEX TABLE some_json();

my json:

  "stuff": [
  "id": 0
  "stuff": [
  "id": 1

loading it in:

$ echo '{"stuff": ["a", "b", "c", "d"], "id": 0}\n{"stuff": ["e", "f", "g", "h"], "id": 1}' | vsql ... -c "COPY some_json(__raw__) FROM STDIN PARSER public.FJSONParser(flatten_maps=FALSE,flatten_arrays=FALSE)"

what it looks like

$ vsql ... -c "SELECT public.MapToString(__raw__) FROM some_json"
        "id": "0",
        "stuff": {
                "0": "a",
                "1": "b",
                "2": "c",
                "3": "d"
        "id": "1",
        "stuff": {
                "0": "e",
                "1": "f",
                "2": "g",
                "3": "h"
(2 rows)

Is there a way of keeping around the originally loaded data alongside the VMAP column?




  • Options
    Ariel_CaryAriel_Cary Vertica Employee Employee

    You can keep a copy of input JSON objects in a separate column.

    create flex table f1();
    alter table f1 add column json long varchar;
    copy f1(json, __raw__ as MapJSONExtractor(json)) from stdin;
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    select * from f1;
     __identity__ |                                                                                                              __raw__                                                                                                               |      json       
                1 | \001\000\000\000,\000\000\000\001\000\000\000\010\000\000\000\001\000\000\000\016\000\000\000\002\000\000\000\014\000\000\000\015\000\000\000bc\002\000\000\000\014\000\000\000\015\000\000\00001\001\000\000\000\010\000\000\000a | {"a":["b","c"]}
    (1 row)
    verticadb20140=> select json, maptostring(__raw__) from f1;
          json       |              maptostring              
     {"a":["b","c"]} | {
            "a": {
                    "0": "b",
                    "1": "c"
    (1 row)
  • Options

    Well. Can we have a new feature - MapToString should try to deduct if this is an array.
    If all keys in dict are sequential numbers starting from 0, it is a sign of array. MapToString should produce an array in this case.
    That would nicely solve a problem.
    This new feature of MapToString should be triggered by new parameter - do_my_best_to_figure_out_array. If set to true, new behaviour will be triggered.

    Please consider filing new feature request.

  • Options

    @Ariel_Cary can you create internal feature request, to make MapToString deduct array, when additional parameter specified?

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file