Options

arrays not readable (varbinary) after import into flex table

Hello,

when I import a JSON file with the json parser into a flex zone, many arrays are automatically put in one column. So far no problem. But the data type is often long varbinary and the actual content gets lost.
I use - on purpose - flatten_arrays=0.

Extract from JSON file (source):
"salience": {
"content": { "entities": [{ "about": 1, "name": "Chris Dunbar", "sentiment": 1, "evidence": 7, "themes": ["guy walking", "absolute treat", "much talent"], "confident": 1, "label": "Person", "type": "Person" }, { "about": 1, "name": "Jerry", "sentiment": -21, "evidence": 7, "themes": ["video section feeding", "absolute treat", "much talent"], "confident": 1, "label": "Person", "type": "Person" }, { "about": 0, "name": "Audrey", "sentiment": -18, "evidence": 5, "themes": ["video section feeding", "absolute treat", "much talent"], "confident": 1, "label": "Person", "type": "Person" }], "sentiment": 2 } }
After having computed the flex table keys, the content of e.g. "salience.content.entities" looks as follows (tool: DBeaver); data type: long varbinary
image

Within batch on the VM of Vertica 7 Community Edition, it loos a bit better but still: it contains wrong values and is nearly unreadable
MyFirstDatabase=> select "salience.content.entities" from facebook;                                                                                                                                                                                                                                                  salience.content.entities                                                                             
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  � defgmyz��������9 ? @ A B H N Q � 117PersonChris Dunbar1 4 )guy walkingabsolute treatmuch talent 012Person117PersonJerry-21 > %3video section feedingabsolute treatmuch talent 012Person015PersonAudrey-18 > %3video section feedingabsolute treatmuch talent 012Person dkv����������������
                                                                                                                              $ 0.about0.confident0.evidence0.label0.name0.sentiment0.themes0.type1.about1.confident1.evidence1.label1.name1.sentiment1.themes1.type2.about2.confident2.evidence2.label2.name2.sentiment2.themes2.type
(1 row)

Is there any solution how to avoid this or to "re-convert" it to the actual content? The best solution may be a proper import.
Please note, flatten_arrays=1 (resp. true) works fine but this is not what I want to do.

Thank you!

Comments

  • Options
    Hi Adrian,

    If you don't flatten arrays, then Vertica must store the array as a variable-size object of some sort.  Vertica stores its arrays within "map" objects.  What you are seeing is the internal binary representation of the map object.

    Most third-party applications unfortunately don't yet know how to pretty-print maps.  (vsql provides the raw map because scripts may want to work with it programmatically.  As you've seen, that leads to some amount of user confusion; it's therefore something that might change in the future.)

    But Vertica itself can work with maps just fine.  You just have to provide the index that you want to access.  For example, I believe the following will work for you:
    select mapLookup("salience.content.entities", '0') from facebook;
    Or '1', '2', or whatever other integer you would like.

    SQL doesn't provide a powerful standard syntax for working with arbitrary-width data types like arrays.  But it's great at working with collections of rows.  For example, assuming you have some sort of "id" column, I believe you could do something like:
    select id, count(*) from (select id, mapItems("salience.content.entities") over (partition by id) from facebook) f group by id;
    to give you a count of entities per row in the actual table.  (You can, of course, put much-more-complicated logic into this expression; this is just a simple example.)

    These functions, and many more, are documented in the Flex Tables Guide:

    https://my.vertica.com/docs/7.0.x/PDF/HP_Vertica_7.0.x_Flex_Tables.pdf

    Adam

  • Options
    Incidentally, if you want to see JSON, call "mapToString()" on your map.

    Vertica won't let you select columns out of the JSON data; you have to use a map for performance; but if you want something more human-readable, JSON is usually much more approachable.

    mapToString() is a JSON encoder; it re-creates the JSON string based on the contents of the map.  As a result, it can produce JSON regardless of the original format of the data in the map.  (JSON, CSV, output of mapAggregate(), etc.)

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file