Hybrid Flex table performance vs standard table

If we are referring real columns only in a hybrid flex table - will there be any performance issues as compared to a standard table. I know that loading data from JSON would be slower as compared to CSV parser on a standard table. However for query performance - will there be any issues, are projections created will work equally good?
If we are not referring raw in a query - will there be any hidden storage cost associated with the query?

Answers

  • There won't be any penalty. Real columns in a hybrid flex table act the same was as columns in a regular, columnar table. The server will choose real columns (over virtual columns) when available and their projections to process queries.

  • If specifying column names and types when creating the table, is there any downside to always using "CREATE FLEX TABLE" versus "CREATE TABLE"? Is it just more storage to retain the __raw__column? My primary goal here is to allow queries to refer to columns name(s) that might not exist in the table and have the query return NULL values for those back rather than the query fail with a hard error.

    CREATE FLEX TABLE car_sales ( 
     "manufact" VARCHAR(16),
     "model" VARCHAR(32),
     "sales" FLOAT,
     "resale" FLOAT,
    "released": DATE,
     "type" INTEGER
    )
    
  • Jim_KnicelyJim_Knicely Administrator

    That's an interesting use case!

    A quick test shows that the RAW columns does add alot of storage:

    dbadmin=> SELECT export_tables('', 'car_sales');
                                                                              export_tables
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------
    CREATE FLEX TABLE public.car_sales
    (
        manufact varchar(16),
        model varchar(32),
        sales float,
        resale float,
        released date,
        type int
    );
    
    (1 row)
    
    dbadmin=> SELECT export_tables('', 'car_sales_ros');
                                                                              export_tables
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------
    CREATE TABLE public.car_sales_ros
    (
        manufact varchar(16),
        model varchar(32),
        sales float,
        resale float,
        released date,
        type int
    );
    
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM car_sales;
      COUNT
    ----------
     12582912
    (1 row)
    
    dbadmin=> SELECT COUNT(*) FROM car_sales_ros;
      COUNT
    ----------
     12582912
    (1 row)
    
    dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales' ORDER BY 1;
     column_name | used_bytes
    -------------+------------
     __raw__     |  238384764
     epoch       |      75822
     manufact    |       3707
     model       |    1513123
     released    |   23627457
     resale      |   40519392
     sales       |   39846433
     type        |   12592002
    (8 rows)
    
    dbadmin=> SELECT column_name, used_bytes FROM column_storage WHERE anchor_table_name = 'car_sales_ros' ORDER BY 1;
     column_name | used_bytes
    -------------+------------
     epoch       |      73776
     manufact    |       3707
     model       |    1513123
     released    |   23627457
     resale      |   40519392
     sales       |   39846433
     type        |   12592002
    (7 rows)
    
    dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales';
        SUM
    -----------
     356562700
    (1 row)
    
    dbadmin=> SELECT SUM(used_bytes) FROM projection_storage WHERE anchor_table_name = 'car_sales_ros';
        SUM
    -----------
     118175890
    (1 row)
    
    dbadmin=> SELECT (356562700 / 118175890)::INT "FLEX is x times bigger than ROS";
     FLEX is x times bigger than ROS
    ---------------------------------
                                   3
    (1 row)
    

    But the bigger issue is proabbly in maintaing that column. For example, it takes a lot longer to run a big INSERT into the FLEX table than a standard ROS table.

    dbadmin=> insert into car_sales SELECT * FROM car_sales_ros;
      OUTPUT
    ----------
     12582912
    (1 row)
    
    Time: First fetch (1 row): 42295.620 ms. All rows formatted: 42295.696 ms
    
    dbadmin=> insert into car_sales_ros SELECT * FROM car_sales_ros;
      OUTPUT
    ----------
     12582912
    (1 row)
    
    Time: First fetch (1 row): 3462.721 ms. All rows formatted: 3462.760 ms
    
    
  • @Jim_Knicely Thank you for such a carefully thought out answer! For our purpose always defining tables by specifying the column types as typical, but also defining them as FLEX seems like the way to go, and worth the storage overhead, given a choice between queries failing vs extra storage.

    In our work, a common situation is uploading data files as data tables for analysis, and then querying multiple tables in which a column might exist in many but not all, e.g.:

    CREATE TABLE sales_2020
    SELECT a, b, c, d FROM week1 UNION ALL
    SELECT a, b, c, d FROM week2 UNION ALL
    SELECT a, b, c, d FROM week3 UNION ALL
    SELECT a, b, c, d FROM week4 UNION ALL
    ...
    

    The problem is that maybe column d wasn't in the week 3 file, and column b wasn't in the week 49 file and so on, and it's nice not to have the whole query fail with "column d does not exist" and have to go back and identify which columns weren't in which tables and fix that, only to next see it fail with "column b does not exist" and so on.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file