How to load JSON with expression on JSON field to same column name
Hi,
I am loading through JSON.
To avoid complex time and date string formats, my JSON always contain int for date/time/timestamp.
For example, I am using number of micros since epoch for timestamp JSON fields, number of micros since midnight for time JSON fields, number of days from epoch for date fields.
Of course JSON fields has names, and users want to have column names same as field names.
Here is a problem - I found no way to convert number of micros since epoch into timestamp column in copy transform, without changing column name.
Yes I can declare my input field with micros since epoch as filler, and then load into column with different name using to_timestamp function. But column name became different.
For example, my JSON data has field ts with micros since epoch:
...
"ts":1662577716717866,
...
ts contains number of micros since epoch.
I can write in copy transform
ts filler int,
ts_ts as to_timestamp(ts)
That works, but column name changed - input data has field ts, table has column ts_ts. Users are not happy.
Question - how to load ts into table column name ts with type timestamp?
I cannot use date/time format in copy transform, first because there is no format for micros since epoch, second because formats are for strings and I have int.
Same problem when I have field in JSON as int 0 or 1, and users want to see column with same name but type bool.
"isB" : 1, - this is JSON for true
I cannot load it into table column named isB type boolean. I can load into table column named isB_renamed.
It appears, Vertica copy transform lack option to do expression on input column and same to table field with same name. Table field has to be different, to do expression on input field in transform.
In my case, I have to have table columns with different names comparing to JSON, in a lot of cases. Each time when I need to do any expression on input field I have ot create in table column with different name.
Problem is not limited to JSON data format. It is present for any format that has names for input data fields. For example problem exist for Vertica supported AVRO format.
I will appreciate if you give me some hint how to work around this problem.
Thank you
Sergey
Answers
One approach would be to load data into a temp table, then execute a query or stored procedure to transform the data and load into target table.
Otherwise you could write a filter or parser that could transform the data value or type while preserving field name.
Or create a view for querying. BTW, 0 and 1 are valid values for boolean so you could define the column as boolean and those would work.
Thanks for recommendations!
All approaches you suggested would work, but either not practical, or create undesirable overhead, or will be poorly performing.
Curious how a view fits into the categories of not practical, overhead, or poorly performing. You can perform the desired formatting while loading into the table so you don't do it on each and every query. A view would just change column names so your queries can use your desired identifier names.
Hi Dave,
Yes, view would be poorly performing on querying.
For example I need to write expression on input field that is not supported by Vertica in transform.
For example, string type support operation TRIM, but nothing more, and you need to convert input field into capitalised form (first letter uppercase, rest lowercase) for names.
Yes you can load original data into table, and view on top that will capitalise name field.
If you will run query with condition on name field, Vertica will not be able to perform any optimisation on name field, as it will be in expression. checking on ROS headers min/max values for name column would not work. On huge data it makes a huge impact on query performance.
I can give you a lot of examples when using view with expression on column make for very poor performance.
Please let me know if you have any questions.
Sergey