Flex table interprets version string as a date
I am trying to load a JSON which has "version" (e.g. 2.3.1, 5.3.4) field to a Flex table, however, when I compute the keys it is interpreting these strings as date. Can someone please help with this? This is on Vertica 9.2.0
Steps to recreate:
test.json
{"id":1, "version": "5.3.4"}
{"id":2, "version": "5.5.4"}
Load this file to a flex table
create flex table test();
copy test from '/home/dbadmin/test.json' parser fjsonparser(key_separator='_');
select compute_flextable_keys('test');
select build_flextable_view('test');
select * from test_keys;
key_name|frequency|data_type_guess|
--------|---------|---------------|
id | 2|Integer |
version | 2|Date |
select * from test_view;
id|version |
--|----------|
1 |2004-05-03|
2 |2004-05-05|
Tagged:
0
Answers
The string values '5.3.4' and '5.5.4' can be implicitly coerced to DATEs:
Try temporily turing off the the EnableBetterFlexTypeGuessing parameter...
dbadmin=> SELECT current_value, default_value, description FROM configuration_parameters WHERE parameter_name = 'EnableBetterFlexTypeGuessing'; current_value | default_value | description ---------------+---------------+------------------------------------------------------------------- 0 | 1 | When computing flex table keys, attempt to guess non-string types (1 row) dbadmin=> SELECT set_config_parameter('EnableBetterFlexTypeGuessing', 0); set_config_parameter ---------------------------- Parameter set successfully (1 row) dbadmin=> CREATE FLEX TABLE test(); CREATE TABLE dbadmin=> \! cat /home/dbadmin/test.json {"id":1, "version": "5.3.4"} {"id":2, "version": "5.5.4"} dbadmin=> COPY test FROM '/home/dbadmin/test.json' PARSER fjsonparser(); Rows Loaded ------------- 2 (1 row) dbadmin=> SELECT compute_flextable_keys_and_build_view('test'); compute_flextable_keys_and_build_view ---------------------------------------------------------------------------------------------- Please see public.test_keys for updated keys The view public.test_view is ready for querying (1 row) dbadmin=> \dv public.test_view List of View Fields Schema | View | Column | Type | Size --------+-----------+---------+-------------+------ public | test_view | id | varchar(20) | 20 public | test_view | version | varchar(20) | 20 (2 rows) dbadmin=> SELECT set_config_parameter('EnableBetterFlexTypeGuessing', 1); set_config_parameter ---------------------------- Parameter set successfully (1 row)See:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/FlexTables/SettingFlexTableParameters.htm
Or, you can also simply create your own view The actual data in the FLEX table is what you loaded. It's the view that's doing the data type coercion!
dbadmin=> DROP TABLE test; DROP TABLE dbadmin=> CREATE FLEX TABLE test(); CREATE TABLE dbadmin=> COPY test FROM '/home/dbadmin/test.json' PARSER fjsonparser(); Rows Loaded ------------- 2 (1 row) dbadmin=> SELECT compute_flextable_keys_and_build_view('test'); compute_flextable_keys_and_build_view ---------------------------------------------------------------------------------------------- Please see public.test_keys for updated keys The view public.test_view is ready for querying (1 row) dbadmin=> SELECT view_definition FROM views WHERE table_name = 'test_view'; view_definition ---------------------------------------------------- SELECT id::!int8, version::!Date FROM public.test; (1 row) dbadmin=> CREATE OR REPLACE VIEW public.test_view AS SELECT id::!int8, version::!Varchar FROM public.test; CREATE VIEW dbadmin=> SELECT * FROM public.test_view; id | version ----+--------- 1 | 5.3.4 2 | 5.5.4 (2 rows)Thanks for the detailed response Jim.
It does not always 'guess' it as date, for the following JSON, it guesses 'version1' as Date but 'version2' as Varchar
I can disable 'EnableBetterFlexTypeGuessing' but dealing with boolean or numeric values requires some work.
I am not sure how 'data_type_guess' works but I am assuming it takes into account the top K occurrences of the values of the column, and determines which data type suits these top K values.
Anyways, it is an implementation detail which we do not have control over, so as a work around I was thinking if I update the '_keys' table with correct data type for this column and then build the view I should see correct results.
There is no month 0. They need to be 1 to 12. So a date parsing would fail.
In general , a one to two digit number, from 1 to 13, followed by a dot, followed by a one to two digit number, from one to twelve, followed by a dot, followed by either a number from 1 to 9999 , or a number from 1 to 99, is a date in European DMY format for Vertica.
So "6.1.4" is the sixth of January 2004, and "4.1.1" is the fourth of January 2001.
Not 1 to 13, of course, it's 1 to 31, my fingers were a bit un-synchronised ....
Thanks marcothesane, I didn't know it was a valid date format.
Hi sandeepdeva. It's been a few months since you asked this question, but I wanted to follow up on a related question you had.
What if you manually update the
keystable to reflect the "right" data type? In a nutshell, the answer is yes you can safely do that because the contents of thekeystable is derived from your flex table. This is a way to help the data type guessing algorithm, which is not always right as you saw. This algorithm does a best effort to guess the most suitable data type for a key given the values it sees in said key. Sometimes, it can get confused on values that can be coerced to multiple types, e.g.,1is a integer but it can also be BooleanTrue, which one we choose? We won't get into the details of the algorithm here, but it suffices to say thatbuild_flextable_viewwill take the updates inkeysand come up with a newviewobject.You may be wondering what if you manipulate the
keystable too hard to the point that the resultingviewisn't any good. In this case you can just truncate thekeystable and re-runcompute_flextable_keysto get you back to the initial state of thekeystable.I updated the
keystable in your example and got the results you were looking for.create flex table test(); CREATE TABLE copy test from stdin parser fjsonparser(key_separator='_'); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. {"id":1, "version": "5.3.4"} {"id":2, "version": "5.5.4"}>> \. select compute_flextable_keys('test'); select * from test_keys; -[ RECORD 1 ]---+-------- key_name | id frequency | 2 data_type_guess | Integer -[ RECORD 2 ]---+-------- key_name | version frequency | 2 data_type_guess | Date -- version column data type doesn't look right... -- Let's create the view object and see what we get select build_flextable_view('test'); select view_definition from views where table_name='test_view'; -[ RECORD 1 ]---+--------------------------------------------------- view_definition | SELECT id::!int8, version::!Date FROM public.test; select * from test_view; -[ RECORD 1 ]------- id | 1 version | 2004-05-03 -[ RECORD 2 ]------- id | 2 version | 2004-05-05 -- well, we suspected the version column wasn't going to look great -- let's try to help the data type guesser by correcting the version data type update test_keys set data_type_guess='Varchar(100)' where key_name='version'; commit; -- now re-build the view select build_flextable_view('test'); -- and check the new view definition select view_definition from views where table_name='test_view'; -[ RECORD 1 ]---+-------------------------------------------------------------- view_definition | SELECT id::!int8, version::!"Varchar"(100) FROM public.test; -- now, query the view and see the results we expected select * from test_view; -[ RECORD 1 ]-- id | 1 version | 5.3.4 -[ RECORD 2 ]-- id | 2 version | 5.5.4