Options

The best way of extracting data from flex tables

Greetings everyone,

We store some types of data in the flex tables because they are quite unstructured and we believe that flex tables are the best solution for it.

From time to time we need to extract quite big amount of data (about 2 billion records)
We noticed that extracting data from the flex table using 'select field1, field2, …, fieldN from flex_table' is bottleneck.
For example, the extraction of 400000 records with 10 non-materialized fields takes about 58 seconds. Please note that all these 10 fields exist in stored JSON.

What is the best way (in terms of processing time) of exporting data from a flex table?

Thank you.

Comments

  • Options
    Ariel_CaryAriel_Cary Vertica Employee Employee

    Hi LANC,

    Flex tables offer you flexibility in allowing to store data that has evolving schema, which it appears that you are capitalizing it well in your solution. Performance, on the other hand, is a trade-off. The storage of non-materialized columns (aka virtual columns) is not as optimized as storage for real columns. Data retrieval of non-materialized columns is expected to be slower as a result.

    The way to improve performance is to materialize columns to fully take advantage of the performance benefits of the columnar architecture. In queries, Vertica will pick materialized columns (over virtual columns) whenever they are available.

    You could use the materialize_flextable_columns() function to do the job automatically. Or you could selectively materialize just the ten columns you're interested in (or a subset of those), adding those manually to your flex table. Just make sure you also default them to the value stored in the flex table raw column.

    Example: Say I want to materialize an Integer column my_int_col in a flex table.

    ALTER FLEX TABLE flex_table ADD COLUMN my_int_col INT DEFAULT (public.MapLookup(flex_table.__raw__, 'my_int_col'))::!int;

    This query now gets substantially better performance.
    SELECT my_int_col FROM flex_table;

    That will get you the best of both worlds: performance and flexibility to store schema-evolving data in your flex table.

    Regards.

  • Options

    Just to add to Ariel's excellent description, here's the doc pointer to materializing virtual columns: https://my.vertica.com/docs/8.1.x/HTML/index.htm#Authoring/FlexTables/MaterializingFlexTables.htm

  • Options

    Excellent info here. Thanks to all who shared.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file