Options

Loading hybrid flex table

Hi,

I got delimited data file, where one of fields contain JSON string.
I want to load data into hybrid flex table, JSON loaded into raw, rest of fields in file to corresponding columns in hybrid flex table.

Example:

Data file row:

2024-06-09|12345|{"key1":"val1","key2":"val2"}|4.0

Table:

create flex table tst (dt date, v int, f float);

Thank you
Sergey

Answers

  • Options
    SergeBSergeB - Select Field - Employee

    Here's one way (no need for a flex table)

    With this file,
    more hybrid.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123

    create table zhybrid (dt date, json varchar(2000), entier int, vmap long varbinary(10000));

    copy zhybrid(dt,json,entier, vmap as MapJSONExtractor(json)) from '/somepath/hybrid.txt';

    select dt, entier,json, maptostring(vmap) from zhybrid;
    -[ RECORD 1 ]-------------------------------------
    dt | 2024-05-09
    entier | 123
    json | {"key1":"val1","key2":"val2"}
    maptostring | {
    "key1": "val1",
    "key2": "val2"
    }

  • Options

    Hi @SergeB
    Yes it works
    Extra cost is additional column JSON varchar(2000) that is permanently present in table.

    Attempt to use transform and declare field JSON as filler produce error:

    cat flex_data.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123

    create flex table test_flex (dt date, v int);

    copy test_flex(dt, json filler varchar(80), raw as MapJSONExtractor(json), v) from local '/something/flex_data.txt';

    ERROR 2671: Column reference "json" is ambiguous
    DETAIL: Reference "json" could refer to either "public.test_flex.json" or "FILLER.json"

    Same error happens when loading into normal table with vmap long varbinary(10000)

    That is not hard to fix - can Vertica allow to reference in transform either filler or table column?

    Would be fine if in case of conflict, it will be always resolved to filler.

    Please let me know if it is feasible.

    Thank you
    Sergey

  • Options
    SergeBSergeB - Select Field - Employee

    @Sergey_Cherepan_1
    I just tried the following
    more hybrid.txt
    2024-05-09|{"key1":"val1","key2":"val2"}|123
    create table zzhybrid (dt date, entier int, vmap long varbinary(10000));
    copy zzhybrid (dt, json FILLER VARCHAR(200), entier, vmap as mapjsonextractor(json)) from 'somepath/hybrid.txt';
    select * from zzhybrid;
    dt | entier | vmap
    ------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------
    2024-05-09 | 123 | \001\000\000\000\024\000\000\000\002\000\000\000\014\000\000\000\020\000\000\000val1val2\002\000\000\000\014\000\000\000\020\000\000\000key1key2
    My test was on this version: Vertica Analytic Database v24.1.0-4

  • Options

    Looks like it works properly on v 24.1. Thanks!

  • Options
    Sergey_Cherepan_1Sergey_Cherepan_1 ✭✭✭
    edited May 14

    Checking v 24.2:

    create flex table public.test_flex (
    dt date,
    v int
    );
    
    create table public.test_longvar (
    dt date,
    raw long varbinary(2048),
    v int
    );
    

    First table is a flex table, second table is normal table with long varbinary column.

    Load works into normal table but does not work into flex table:

    copy test_flex(dt, json filler varchar(80), __raw__ as MapJSONExtractor(json), v) from local '/home/me/work/scripts/flex_data.txt';
    
    ERROR 2671:  Column reference "json" is ambiguous
    DETAIL:  Reference "json" could refer to either "public.test_flex.json" or "*FILLER*.json"
    
    copy test_longvar(dt, json filler varchar(80), raw as MapJSONExtractor(json), v) from local '/home/me/work/scripts/flex_data.txt';
    
     Rows Loaded 
    -------------
               1
    (1 row)
    

    It is definitely a great improvement from 12.0.4, but only partial fix... @SergeB can you make loading work for flex tables as well? Should not be that hard.
    In most cases, we do have additional fields outside JSON for loading into flex table. We had to put additional columns into JSON, just to populate real columns in flex table. That makes table larger, raw field unnecessary has extra fields. This bug definitely bites.

    Thank you
    Sergey

  • Options

    @SergeB related bug:

    Maptostring produces string with JSON, containing dictionary with keys ['1','2','3','....] instead of arrays.
    Maptostring cannot be used to re-create original JSON that has been loaded into flex table - you will get no arrays but dictionaries instead.

    I found users are keeping original JSON string in flex table as they need original and cannot reproduce with maptostring.

    Can we have new parameter to maptostring - approx deduct array. If it is dictionary and keys are sequential numbers ['1','2','3','4'...], maptostring will produce array instead of dict. That will not be exact but good enough for vast majority of cases.

    Thank you
    Sergey

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file