Loading hybrid flex table

Hi,

I got delimited data file, where one of fields contain JSON string.
I want to load data into hybrid flex table, JSON loaded into raw, rest of fields in file to corresponding columns in hybrid flex table.

Example:

Data file row:

2024-06-09|12345|{"key1":"val1","key2":"val2"}|4.0

Table:

create flex table tst (dt date, v int, f float);

Thank you
Sergey

Answers

  • SergeBSergeB Employee

    Here's one way (no need for a flex table)

    With this file,
    more hybrid.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123

    create table zhybrid (dt date, json varchar(2000), entier int, vmap long varbinary(10000));

    copy zhybrid(dt,json,entier, vmap as MapJSONExtractor(json)) from '/somepath/hybrid.txt';

    select dt, entier,json, maptostring(vmap) from zhybrid;
    -[ RECORD 1 ]-------------------------------------
    dt | 2024-05-09
    entier | 123
    json | {"key1":"val1","key2":"val2"}
    maptostring | {
    "key1": "val1",
    "key2": "val2"
    }

  • Hi @SergeB
    Yes it works
    Extra cost is additional column JSON varchar(2000) that is permanently present in table.

    Attempt to use transform and declare field JSON as filler produce error:

    cat flex_data.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123

    create flex table test_flex (dt date, v int);

    copy test_flex(dt, json filler varchar(80), raw as MapJSONExtractor(json), v) from local '/something/flex_data.txt';

    ERROR 2671: Column reference "json" is ambiguous
    DETAIL: Reference "json" could refer to either "public.test_flex.json" or "FILLER.json"

    Same error happens when loading into normal table with vmap long varbinary(10000)

    That is not hard to fix - can Vertica allow to reference in transform either filler or table column?

    Would be fine if in case of conflict, it will be always resolved to filler.

    Please let me know if it is feasible.

    Thank you
    Sergey

  • SergeBSergeB Employee

    @Sergey_Cherepan_1
    I just tried the following
    more hybrid.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123
    create table zzhybrid (dt date, entier int, vmap long varbinary(10000));
    copy zzhybrid (dt, json FILLER VARCHAR(200), entier, vmap as mapjsonextractor(json)) from 'somepath/hybrid.txt';
    select * from zzhybrid;
    dt | entier | vmap
    ------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------
    2024-05-09 | 123 | \001\000\000\000\024\000\000\000\002\000\000\000\014\000\000\000\020\000\000\000val1val2\002\000\000\000\014\000\000\000\020\000\000\000key1key2
    My test was on this version: Vertica Analytic Database v24.1.0-4

  • Looks like it works properly on v 24.1. Thanks!

  • Sergey_Cherepan_1Sergey_Cherepan_1 ✭✭✭
    edited May 14

    Checking v 24.2:

    create flex table public.test_flex (
    dt date,
    v int
    );
    
    create table public.test_longvar (
    dt date,
    raw long varbinary(2048),
    v int
    );
    

    First table is a flex table, second table is normal table with long varbinary column.

    Load works into normal table but does not work into flex table:

    copy test_flex(dt, json filler varchar(80), __raw__ as MapJSONExtractor(json), v) from local '/home/me/work/scripts/flex_data.txt';
    
    ERROR 2671:  Column reference "json" is ambiguous
    DETAIL:  Reference "json" could refer to either "public.test_flex.json" or "*FILLER*.json"
    
    copy test_longvar(dt, json filler varchar(80), raw as MapJSONExtractor(json), v) from local '/home/me/work/scripts/flex_data.txt';
    
     Rows Loaded 
    -------------
               1
    (1 row)
    

    It is definitely a great improvement from 12.0.4, but only partial fix... @SergeB can you make loading work for flex tables as well? Should not be that hard.
    In most cases, we do have additional fields outside JSON for loading into flex table. We had to put additional columns into JSON, just to populate real columns in flex table. That makes table larger, raw field unnecessary has extra fields. This bug definitely bites.

    Thank you
    Sergey

  • @SergeB related bug:

    Maptostring produces string with JSON, containing dictionary with keys ['1','2','3','....] instead of arrays.
    Maptostring cannot be used to re-create original JSON that has been loaded into flex table - you will get no arrays but dictionaries instead.

    I found users are keeping original JSON string in flex table as they need original and cannot reproduce with maptostring.

    Can we have new parameter to maptostring - approx deduct array. If it is dictionary and keys are sequential numbers ['1','2','3','4'...], maptostring will produce array instead of dict. That will not be exact but good enough for vast majority of cases.

    Thank you
    Sergey

  • SergeBSergeB Employee

    @Sergey_Cherepan_1 for the second issue (maptostring handling of arrays) a new feature request was created.

    For the first issue (json FILLER ambiguity when using FlexTable), in my tests it works fine if you remove the ambiguity by using

    copy test_flex(dt, json filler varchar(80), raw as MapJSONExtractor("FILLER".json), v)

    The reason for the ambiguity is probably because in the case of a flex table, json can come from FILLER but also be extracted from one of the fields in raw

  • @SergeB

    It works!

    Syntax is really non-trivial and obnoxious, but it works!

    In past, I tried VERY MANY various combinations, but never hit a right syntax.

    This absolutely deserve being mentioned in docs, along with your example.

    May be you can invent more user-friendly syntax - without double quotes (that needs to be escaped in vast majority of cases) and without stars.

    Thanks for a hint, it really helps. I will be using it like tomorrow.

    We do load quite a few flex tables. Current practice is to add all columns that needs to be materialised to JSON, even if those columns are not part of original JSON. Your hint will greatly simplify process - for example, we will have a normal CSV file, where one of fields are JSON. This CSV format is not loadable into flex table without knowing your hint.

    Sergey

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file