Loading hybrid flex table
Hi,
I got delimited data file, where one of fields contain JSON string.
I want to load data into hybrid flex table, JSON loaded into raw, rest of fields in file to corresponding columns in hybrid flex table.
Example:
Data file row:
2024-06-09|12345|{"key1":"val1","key2":"val2"}|4.0
Table:
create flex table tst (dt date, v int, f float);
Thank you
Sergey
0
Answers
Here's one way (no need for a flex table)
With this file,
more hybrid.txt
2024-05-09|{"key1":"val1","key2":"val2"}|123
create table zhybrid (dt date, json varchar(2000), entier int, vmap long varbinary(10000));
copy zhybrid(dt,json,entier, vmap as MapJSONExtractor(json)) from '/somepath/hybrid.txt';
select dt, entier,json, maptostring(vmap) from zhybrid;
-[ RECORD 1 ]-------------------------------------
dt | 2024-05-09
entier | 123
json | {"key1":"val1","key2":"val2"}
maptostring | {
"key1": "val1",
"key2": "val2"
}
Hi @SergeB
Yes it works
Extra cost is additional column JSON varchar(2000) that is permanently present in table.
Attempt to use transform and declare field JSON as filler produce error:
cat flex_data.txt
2024-05-09|{"key1":"val1","key2":"val2"}|123
create flex table test_flex (dt date, v int);
copy test_flex(dt, json filler varchar(80), raw as MapJSONExtractor(json), v) from local '/something/flex_data.txt';
ERROR 2671: Column reference "json" is ambiguous
DETAIL: Reference "json" could refer to either "public.test_flex.json" or "FILLER.json"
Same error happens when loading into normal table with vmap long varbinary(10000)
That is not hard to fix - can Vertica allow to reference in transform either filler or table column?
Would be fine if in case of conflict, it will be always resolved to filler.
Please let me know if it is feasible.
Thank you
Sergey
@Sergey_Cherepan_1
I just tried the following
more hybrid.txt
2024-05-09|{"key1":"val1","key2":"val2"}|123
create table zzhybrid (dt date, entier int, vmap long varbinary(10000));
copy zzhybrid (dt, json FILLER VARCHAR(200), entier, vmap as mapjsonextractor(json)) from 'somepath/hybrid.txt';
select * from zzhybrid;
dt | entier | vmap
------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------
2024-05-09 | 123 | \001\000\000\000\024\000\000\000\002\000\000\000\014\000\000\000\020\000\000\000val1val2\002\000\000\000\014\000\000\000\020\000\000\000key1key2
My test was on this version: Vertica Analytic Database v24.1.0-4
Looks like it works properly on v 24.1. Thanks!
Checking v 24.2:
First table is a flex table, second table is normal table with long varbinary column.
Load works into normal table but does not work into flex table:
It is definitely a great improvement from 12.0.4, but only partial fix... @SergeB can you make loading work for flex tables as well? Should not be that hard.
In most cases, we do have additional fields outside JSON for loading into flex table. We had to put additional columns into JSON, just to populate real columns in flex table. That makes table larger, raw field unnecessary has extra fields. This bug definitely bites.
Thank you
Sergey
@SergeB related bug:
Maptostring produces string with JSON, containing dictionary with keys ['1','2','3','....] instead of arrays.
Maptostring cannot be used to re-create original JSON that has been loaded into flex table - you will get no arrays but dictionaries instead.
I found users are keeping original JSON string in flex table as they need original and cannot reproduce with maptostring.
Can we have new parameter to maptostring - approx deduct array. If it is dictionary and keys are sequential numbers ['1','2','3','4'...], maptostring will produce array instead of dict. That will not be exact but good enough for vast majority of cases.
Thank you
Sergey
@Sergey_Cherepan_1 for the second issue (maptostring handling of arrays) a new feature request was created.
For the first issue (json FILLER ambiguity when using FlexTable), in my tests it works fine if you remove the ambiguity by using
copy test_flex(dt, json filler varchar(80), raw as MapJSONExtractor("FILLER".json), v)
The reason for the ambiguity is probably because in the case of a flex table, json can come from FILLER but also be extracted from one of the fields in raw
@SergeB
It works!
Syntax is really non-trivial and obnoxious, but it works!
In past, I tried VERY MANY various combinations, but never hit a right syntax.
This absolutely deserve being mentioned in docs, along with your example.
May be you can invent more user-friendly syntax - without double quotes (that needs to be escaped in vast majority of cases) and without stars.
Thanks for a hint, it really helps. I will be using it like tomorrow.
We do load quite a few flex tables. Current practice is to add all columns that needs to be materialised to JSON, even if those columns are not part of original JSON. Your hint will greatly simplify process - for example, we will have a normal CSV file, where one of fields are JSON. This CSV format is not loadable into flex table without knowing your hint.
Sergey