Options

Best practice for schematizing semi-structured data in flex tables

We are using an automated process to periodically load batches of semi-structured data into a flex table.  Is there any best practice for a way to transform this data into a more defined schema, post-load?  For instance, let's say that the loaded data is deeply nested making queries on the flex table itself more difficult to construct.  Here are the options that I can think of:
  1. Once the automated process has finished the load, make a call to Vertica to run an external procedure or user defined function that transforms the data.
  2. Don't transform the data at all - materialize expected columns and run queries against the single flex table.
  3. Attempt to add structure to the data pre-load, and forgo the need for using a flex table at all.
Any other method you could suggest would be very useful.  I know that the exact answer will be dependent on the data, but I was wondering whether there were any common use-cases that you have seen or a generally accepted best practice in this area? 

Comments

  • Options

    Good question Jonathan -- I'm afraid that you're correct; the answer really is "it depends."

     

    For use cases where the data coming into the system is mostly queryable as-is, people may leave it in the Flex table and just materialize some columns.  This is the "I want to work with interestingly-structured data in SQL" use case.

     

    If the incoming data is dirty and/or needs significant ELT in order to be consumed, it is almost always loaded into one or more other tables (typically but not necessarily regular non-Flex tables) via INSERT .. SELECT statements.  This can use a UDT of some sort but more often just uses simple (or not-so-simple) SQL expressions -- there's really quite a lot that you can do with Vertica's built-in functions.  This is the "I want to use Flex to help clean confusing or dirty data" use case.

     

    There are, of course, people who do ETL (or ELT) without Flex.  We, of course, think they should use Flex :-)  More seriously, use Flex if it helps you to consume or to transform your data more easily.

     

    Adam

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file