Hybrid Flex table performance vs standard table
If we are referring real columns only in a hybrid flex table - will there be any performance issues as compared to a standard table. I know that loading data from JSON would be slower as compared to CSV parser on a standard table. However for query performance - will there be any issues, are projections created will work equally good?
If we are not referring raw in a query - will there be any hidden storage cost associated with the query?
Tagged:
0
Answers
There won't be any penalty. Real columns in a hybrid flex table act the same was as columns in a regular, columnar table. The server will choose real columns (over virtual columns) when available and their projections to process queries.
If specifying column names and types when creating the table, is there any downside to always using "CREATE FLEX TABLE" versus "CREATE TABLE"? Is it just more storage to retain the __raw__column? My primary goal here is to allow queries to refer to columns name(s) that might not exist in the table and have the query return NULL values for those back rather than the query fail with a hard error.
That's an interesting use case!
A quick test shows that the RAW columns does add alot of storage:
But the bigger issue is proabbly in maintaing that column. For example, it takes a lot longer to run a big INSERT into the FLEX table than a standard ROS table.
@Jim_Knicely Thank you for such a carefully thought out answer! For our purpose always defining tables by specifying the column types as typical, but also defining them as
FLEX
seems like the way to go, and worth the storage overhead, given a choice between queries failing vs extra storage.In our work, a common situation is uploading data files as data tables for analysis, and then querying multiple tables in which a column might exist in many but not all, e.g.:
The problem is that maybe column
d
wasn't in the week 3 file, and columnb
wasn't in the week 49 file and so on, and it's nice not to have the whole query fail with "columnd
does not exist" and have to go back and identify which columns weren't in which tables and fix that, only to next see it fail with "columnb
does not exist" and so on.