Hybrid Flex table performance vs standard table
If we are referring real columns only in a hybrid flex table - will there be any performance issues as compared to a standard table. I know that loading data from JSON would be slower as compared to CSV parser on a standard table. However for query performance - will there be any issues, are projections created will work equally good?
If we are not referring raw in a query - will there be any hidden storage cost associated with the query?
Tagged:
0
Answers
There won't be any penalty. Real columns in a hybrid flex table act the same was as columns in a regular, columnar table. The server will choose real columns (over virtual columns) when available and their projections to process queries.
If specifying column names and types when creating the table, is there any downside to always using "CREATE FLEX TABLE" versus "CREATE TABLE"? Is it just more storage to retain the __raw__column? My primary goal here is to allow queries to refer to columns name(s) that might not exist in the table and have the query return NULL values for those back rather than the query fail with a hard error.
That's an interesting use case!
A quick test shows that the RAW columns does add alot of storage:
dbadmin=> SELECT export_tables('', 'car_sales'); export_tables ------------------------------------------------------------------------------------------------------------------------------------------------------------------ CREATE FLEX TABLE public.car_sales ( manufact varchar(16), model varchar(32), sales float, resale float, released date, type int ); (1 row) dbadmin=> SELECT export_tables('', 'car_sales_ros'); export_tables ----------------------------------------------------------------------------------------------------------------------------------------------------------------- CREATE TABLE public.car_sales_ros ( manufact varchar(16), model varchar(32), sales float, resale float, released date, type int ); (1 row) dbadmin=> SELECT COUNT(*) FROM car_sales; COUNT ---------- 12582912 (1 row) dbadmin=> SELECT COUNT(*) FROM car_sales_ros; COUNT ---------- 12582912 (1 row) dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales' ORDER BY 1; column_name | used_bytes -------------+------------ __raw__ | 238384764 epoch | 75822 manufact | 3707 model | 1513123 released | 23627457 resale | 40519392 sales | 39846433 type | 12592002 (8 rows) dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales_ros' ORDER BY 1; column_name | used_bytes -------------+------------ epoch | 73776 manufact | 3707 model | 1513123 released | 23627457 resale | 40519392 sales | 39846433 type | 12592002 (7 rows) dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales'; SUM ----------- 356562700 (1 row) dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales_ros'; SUM ----------- 118175890 (1 row) dbadmin=> SELECT (356562700 / 118175890)::INT "FLEX is x times bigger than ROS"; FLEX is x times bigger than ROS --------------------------------- 3 (1 row)But the bigger issue is proabbly in maintaing that column. For example, it takes a lot longer to run a big INSERT into the FLEX table than a standard ROS table.
@Jim_Knicely Thank you for such a carefully thought out answer! For our purpose always defining tables by specifying the column types as typical, but also defining them as
FLEXseems like the way to go, and worth the storage overhead, given a choice between queries failing vs extra storage.In our work, a common situation is uploading data files as data tables for analysis, and then querying multiple tables in which a column might exist in many but not all, e.g.:
The problem is that maybe column
dwasn't in the week 3 file, and columnbwasn't in the week 49 file and so on, and it's nice not to have the whole query fail with "columnddoes not exist" and have to go back and identify which columns weren't in which tables and fix that, only to next see it fail with "columnbdoes not exist" and so on.