Hybrid Flex table performance vs standard table

rajatpaliwal86 · January 2020

If we are referring real columns only in a hybrid flex table - will there be any performance issues as compared to a standard table. I know that loading data from JSON would be slower as compared to CSV parser on a standard table. However for query performance - will there be any issues, are projections created will work equally good?
If we are not referring raw in a query - will there be any hidden storage cost associated with the query?

Ariel_Cary · March 2020

There won't be any penalty. Real columns in a hybrid flex table act the same was as columns in a regular, columnar table. The server will choose real columns (over virtual columns) when available and their projections to process queries.

Pieter_Sheth-Vo · December 2020

If specifying column names and types when creating the table, is there any downside to always using "CREATE FLEX TABLE" versus "CREATE TABLE"? Is it just more storage to retain the __raw__column? My primary goal here is to allow queries to refer to columns name(s) that might not exist in the table and have the query return NULL values for those back rather than the query fail with a hard error.

CREATE FLEX TABLE car_sales ( 
 "manufact" VARCHAR(16),
 "model" VARCHAR(32),
 "sales" FLOAT,
 "resale" FLOAT,
"released": DATE,
 "type" INTEGER
)

Jim_Knicely · December 2020

That's an interesting use case!

A quick test shows that the RAW columns does add alot of storage:

dbadmin=> SELECT export_tables('', 'car_sales');
                                                                          export_tables
------------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE FLEX TABLE public.car_sales
(
    manufact varchar(16),
    model varchar(32),
    sales float,
    resale float,
    released date,
    type int
);

(1 row)

dbadmin=> SELECT export_tables('', 'car_sales_ros');
                                                                          export_tables
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
CREATE TABLE public.car_sales_ros
(
    manufact varchar(16),
    model varchar(32),
    sales float,
    resale float,
    released date,
    type int
);

(1 row)

dbadmin=> SELECT COUNT(*) FROM car_sales;
  COUNT
----------
 12582912
(1 row)

dbadmin=> SELECT COUNT(*) FROM car_sales_ros;
  COUNT
----------
 12582912
(1 row)

dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales' ORDER BY 1;
 column_name | used_bytes
-------------+------------
 __raw__     |  238384764
 epoch       |      75822
 manufact    |       3707
 model       |    1513123
 released    |   23627457
 resale      |   40519392
 sales       |   39846433
 type        |   12592002
(8 rows)

dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales_ros' ORDER BY 1;
 column_name | used_bytes
-------------+------------
 epoch       |      73776
 manufact    |       3707
 model       |    1513123
 released    |   23627457
 resale      |   40519392
 sales       |   39846433
 type        |   12592002
(7 rows)

dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales';
    SUM
-----------
 356562700
(1 row)

dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales_ros';
    SUM
-----------
 118175890
(1 row)

dbadmin=> SELECT (356562700 / 118175890)::INT "FLEX is x times bigger than ROS";
 FLEX is x times bigger than ROS
---------------------------------
                               3
(1 row)

But the bigger issue is proabbly in maintaing that column. For example, it takes a lot longer to run a big INSERT into the FLEX table than a standard ROS table.

dbadmin=> insert into car_sales SELECT * FROM car_sales_ros;
  OUTPUT
----------
 12582912
(1 row)

Time: First fetch (1 row): 42295.620 ms. All rows formatted: 42295.696 ms

dbadmin=> insert into car_sales_ros SELECT * FROM car_sales_ros;
  OUTPUT
----------
 12582912
(1 row)

Time: First fetch (1 row): 3462.721 ms. All rows formatted: 3462.760 ms

Pieter_Sheth-Vo · December 2020

@Jim_Knicely Thank you for such a carefully thought out answer! For our purpose always defining tables by specifying the column types as typical, but also defining them as FLEX seems like the way to go, and worth the storage overhead, given a choice between queries failing vs extra storage.

In our work, a common situation is uploading data files as data tables for analysis, and then querying multiple tables in which a column might exist in many but not all, e.g.:

CREATE TABLE sales_2020
SELECT a, b, c, d FROM week1 UNION ALL
SELECT a, b, c, d FROM week2 UNION ALL
SELECT a, b, c, d FROM week3 UNION ALL
SELECT a, b, c, d FROM week4 UNION ALL
...

The problem is that maybe column d wasn't in the week 3 file, and column b wasn't in the week 49 file and so on, and it's nice not to have the whole query fail with "column d does not exist" and have to go back and identify which columns weren't in which tables and fix that, only to next see it fail with "column b does not exist" and so on.

We're Moving!

Create My New Community Account Now

Hybrid Flex table performance vs standard table

Answers

Leave a Comment