Flex table interprets version string as a date

sandeepdevasandeepdeva Community Edition User

I am trying to load a JSON which has "version" (e.g. 2.3.1, 5.3.4) field to a Flex table, however, when I compute the keys it is interpreting these strings as date. Can someone please help with this? This is on Vertica 9.2.0

Steps to recreate:
test.json

{"id":1, "version": "5.3.4"}
{"id":2, "version": "5.5.4"}

Load this file to a flex table

create flex table test();
copy test from '/home/dbadmin/test.json' parser fjsonparser(key_separator='_');
select compute_flextable_keys('test');
select build_flextable_view('test');
select * from test_keys;
key_name|frequency|data_type_guess|
--------|---------|---------------|
id      |        2|Integer        |
version |        2|Date           |            
select * from test_view;
id|version   |
--|----------|
1 |2004-05-03|
2 |2004-05-05|
Tagged:

Answers

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2019

    The string values '5.3.4' and '5.5.4' can be implicitly coerced to DATEs:

    dbadmin=> SELECT '5.3.4'::DATE "Hmm..." UNION ALL SELECT '5.5.4'::DATE;
       Hmm...
    ------------
     2004-05-03
     2004-05-05
    (2 rows)
    

    Try temporily turing off the the EnableBetterFlexTypeGuessing parameter...

    dbadmin=> SELECT current_value, default_value, description FROM configuration_parameters WHERE parameter_name = 'EnableBetterFlexTypeGuessing';
     current_value | default_value |                            description
    ---------------+---------------+-------------------------------------------------------------------
     0             | 1             | When computing flex table keys, attempt to guess non-string types
    (1 row)
    
    dbadmin=> SELECT set_config_parameter('EnableBetterFlexTypeGuessing', 0);
        set_config_parameter
    ----------------------------
     Parameter set successfully
    (1 row)
    
    dbadmin=> CREATE FLEX TABLE test();
    CREATE TABLE
    
    dbadmin=> \! cat /home/dbadmin/test.json
    {"id":1, "version": "5.3.4"}
    {"id":2, "version": "5.5.4"}
    
    dbadmin=> COPY test FROM '/home/dbadmin/test.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> SELECT compute_flextable_keys_and_build_view('test');
                                compute_flextable_keys_and_build_view
    ----------------------------------------------------------------------------------------------
     Please see public.test_keys for updated keys
    The view public.test_view is ready for querying
    (1 row)
    
    dbadmin=> \dv public.test_view
                    List of View Fields
     Schema |   View    | Column  |    Type     | Size
    --------+-----------+---------+-------------+------
     public | test_view | id      | varchar(20) |   20
     public | test_view | version | varchar(20) |   20
    (2 rows)
    
    dbadmin=> SELECT set_config_parameter('EnableBetterFlexTypeGuessing', 1);
        set_config_parameter
    ----------------------------
     Parameter set successfully
    (1 row)
    

    See:
    https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/FlexTables/SettingFlexTableParameters.htm

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited December 2019

    Or, you can also simply create your own view The actual data in the FLEX table is what you loaded. It's the view that's doing the data type coercion!

    dbadmin=> DROP TABLE test;
    DROP TABLE
    
    dbadmin=> CREATE FLEX TABLE test();
    CREATE TABLE
    
    dbadmin=> COPY test FROM '/home/dbadmin/test.json' PARSER fjsonparser();
     Rows Loaded
    -------------
               2
    (1 row)
    
    dbadmin=> SELECT compute_flextable_keys_and_build_view('test');
                                compute_flextable_keys_and_build_view
    ----------------------------------------------------------------------------------------------
     Please see public.test_keys for updated keys
    The view public.test_view is ready for querying
    (1 row)
    
    dbadmin=> SELECT view_definition FROM views WHERE table_name = 'test_view';
                      view_definition
    ----------------------------------------------------
     SELECT id::!int8, version::!Date FROM public.test;
    (1 row)
    
    dbadmin=> CREATE OR REPLACE VIEW public.test_view AS SELECT id::!int8, version::!Varchar FROM public.test;
    CREATE VIEW
    
    dbadmin=> SELECT * FROM public.test_view;
     id | version
    ----+---------
      1 | 5.3.4
      2 | 5.5.4
    (2 rows)
    
  • sandeepdevasandeepdeva Community Edition User

    Thanks for the detailed response Jim.

    It does not always 'guess' it as date, for the following JSON, it guesses 'version1' as Date but 'version2' as Varchar

    {"version1":"6.1.4","version2":"7.0.2"}
    {"version1" : "4.1.1","version2" : "7.0.1"}

    I can disable 'EnableBetterFlexTypeGuessing' but dealing with boolean or numeric values requires some work.

    I am not sure how 'data_type_guess' works but I am assuming it takes into account the top K occurrences of the values of the column, and determines which data type suits these top K values.
    Anyways, it is an implementation detail which we do not have control over, so as a work around I was thinking if I update the '_keys' table with correct data type for this column and then build the view I should see correct results.

  • marcothesanemarcothesane - Select Field - Administrator

    There is no month 0. They need to be 1 to 12. So a date parsing would fail.
    In general , a one to two digit number, from 1 to 13, followed by a dot, followed by a one to two digit number, from one to twelve, followed by a dot, followed by either a number from 1 to 9999 , or a number from 1 to 99, is a date in European DMY format for Vertica.
    So "6.1.4" is the sixth of January 2004, and "4.1.1" is the fourth of January 2001.

  • marcothesanemarcothesane - Select Field - Administrator

    Not 1 to 13, of course, it's 1 to 31, my fingers were a bit un-synchronised ....

  • sandeepdevasandeepdeva Community Edition User

    Thanks marcothesane, I didn't know it was a valid date format.

  • Ariel_CaryAriel_Cary Vertica Employee Employee

    Hi sandeepdeva. It's been a few months since you asked this question, but I wanted to follow up on a related question you had.

    What if you manually update the keys table to reflect the "right" data type? In a nutshell, the answer is yes you can safely do that because the contents of the keys table is derived from your flex table. This is a way to help the data type guessing algorithm, which is not always right as you saw. This algorithm does a best effort to guess the most suitable data type for a key given the values it sees in said key. Sometimes, it can get confused on values that can be coerced to multiple types, e.g., 1 is a integer but it can also be Boolean True, which one we choose? We won't get into the details of the algorithm here, but it suffices to say that build_flextable_view will take the updates in keys and come up with a new view object.

    You may be wondering what if you manipulate the keys table too hard to the point that the resulting view isn't any good. In this case you can just truncate the keys table and re-run compute_flextable_keys to get you back to the initial state of the keys table.

    I updated the keys table in your example and got the results you were looking for.

    create flex table test();
    CREATE TABLE
    copy test from stdin parser fjsonparser(key_separator='_');
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    {"id":1, "version": "5.3.4"}
    {"id":2, "version": "5.5.4"}>> 
    \.
    select compute_flextable_keys('test');
    select * from test_keys;
    -[ RECORD 1 ]---+--------
    key_name        | id
    frequency       | 2
    data_type_guess | Integer
    -[ RECORD 2 ]---+--------
    key_name        | version
    frequency       | 2
    data_type_guess | Date
    -- version column data type doesn't look right...
    
    -- Let's create the view object and see what we get
    select build_flextable_view('test');
    select view_definition from views where table_name='test_view';
    -[ RECORD 1 ]---+---------------------------------------------------
    view_definition | SELECT id::!int8, version::!Date FROM public.test;
    
    select * from test_view;
    -[ RECORD 1 ]-------
    id      | 1
    version | 2004-05-03
    -[ RECORD 2 ]-------
    id      | 2
    version | 2004-05-05
    -- well, we suspected the version column wasn't going to look great
    
    -- let's try to help the data type guesser by correcting the version data type
    update test_keys set data_type_guess='Varchar(100)' where key_name='version';
    commit;
    -- now re-build the view
    select build_flextable_view('test');
    -- and check the new view definition
    select view_definition from views where table_name='test_view';
    -[ RECORD 1 ]---+--------------------------------------------------------------
    view_definition | SELECT id::!int8, version::!"Varchar"(100)  FROM public.test;
    -- now, query the view and see the results we expected
    select * from test_view;
    -[ RECORD 1 ]--
    id      | 1
    version | 5.3.4
    -[ RECORD 2 ]--
    id      | 2
    version | 5.5.4
    

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file