Flex table interprets version string as a date
sandeepdeva
Community Edition User
I am trying to load a JSON which has "version" (e.g. 2.3.1, 5.3.4) field to a Flex table, however, when I compute the keys it is interpreting these strings as date. Can someone please help with this? This is on Vertica 9.2.0
Steps to recreate:
test.json
{"id":1, "version": "5.3.4"}
{"id":2, "version": "5.5.4"}
Load this file to a flex table
create flex table test(); copy test from '/home/dbadmin/test.json' parser fjsonparser(key_separator='_'); select compute_flextable_keys('test'); select build_flextable_view('test'); select * from test_keys; key_name|frequency|data_type_guess| --------|---------|---------------| id | 2|Integer | version | 2|Date | select * from test_view; id|version | --|----------| 1 |2004-05-03| 2 |2004-05-05|
Tagged:
0
Answers
The string values '5.3.4' and '5.5.4' can be implicitly coerced to DATEs:
Try temporily turing off the the EnableBetterFlexTypeGuessing parameter...
See:
https://www.vertica.com/docs/9.3.x/HTML/Content/Authoring/FlexTables/SettingFlexTableParameters.htm
Or, you can also simply create your own view The actual data in the FLEX table is what you loaded. It's the view that's doing the data type coercion!
Thanks for the detailed response Jim.
It does not always 'guess' it as date, for the following JSON, it guesses 'version1' as Date but 'version2' as Varchar
I can disable 'EnableBetterFlexTypeGuessing' but dealing with boolean or numeric values requires some work.
I am not sure how 'data_type_guess' works but I am assuming it takes into account the top K occurrences of the values of the column, and determines which data type suits these top K values.
Anyways, it is an implementation detail which we do not have control over, so as a work around I was thinking if I update the '_keys' table with correct data type for this column and then build the view I should see correct results.
There is no month 0. They need to be 1 to 12. So a date parsing would fail.
In general , a one to two digit number, from 1 to 13, followed by a dot, followed by a one to two digit number, from one to twelve, followed by a dot, followed by either a number from 1 to 9999 , or a number from 1 to 99, is a date in European DMY format for Vertica.
So "6.1.4" is the sixth of January 2004, and "4.1.1" is the fourth of January 2001.
Not 1 to 13, of course, it's 1 to 31, my fingers were a bit un-synchronised ....
Thanks marcothesane, I didn't know it was a valid date format.
Hi sandeepdeva. It's been a few months since you asked this question, but I wanted to follow up on a related question you had.
What if you manually update the
keys
table to reflect the "right" data type? In a nutshell, the answer is yes you can safely do that because the contents of thekeys
table is derived from your flex table. This is a way to help the data type guessing algorithm, which is not always right as you saw. This algorithm does a best effort to guess the most suitable data type for a key given the values it sees in said key. Sometimes, it can get confused on values that can be coerced to multiple types, e.g.,1
is a integer but it can also be BooleanTrue
, which one we choose? We won't get into the details of the algorithm here, but it suffices to say thatbuild_flextable_view
will take the updates inkeys
and come up with a newview
object.You may be wondering what if you manipulate the
keys
table too hard to the point that the resultingview
isn't any good. In this case you can just truncate thekeys
table and re-runcompute_flextable_keys
to get you back to the initial state of thekeys
table.I updated the
keys
table in your example and got the results you were looking for.