How to Make the Leap from Traditional DW/ETL to Vertica?

**Newbie - please be gentle**

 

I realize many books could be written in response to this question. I obviously don't expect that. I'm just hoping a few of you can point me towards some documentation, blog entries, etc. that can get me on my way.

 

I come from the row-oriented, single server (non-MPP) DW world. I'm used to creating dimensional models, sometimes 3nf ODSs, MDM dbs, etc. and traditional ETL processes using tools like Informatica and SSIS to do full and incremental loads of data from a variety of sources including flat files, XML, web services, and primarily relational sources such as Oracle and SQL Server. 

 

I'm currently evaluating Vertica for a large organization looking to build their first DW. They have very large data volumes, most of which are coming from relational sources such as Oracle and SQL Server as well as social media sources such as Facebook and Twitter and click stream data from a number of web sites.

 

I'm looking for help in understanding best practices for things like table/projection design and ETL/ELT. I'm guessing that dimensional modeling doesn't really apply here since its design is heavily driven by the need to make row-oriented databases perform well in an analytic environment - an issue that is addressed by the nature of column store technology such as Vertica. 

 

As for ETL/ELT, it appears that Vertica most likely has a much different loading paradigm than what I'm used to. I'm used to extracting data from sources, transforming it in memory, and then persisting it to a row-oriented relational source. It seems like Vertica works better with flat files/COPY, transform in Vertica, then load to destination table. 

 

My above assumptions might be completely off. Just seems like I'm going to get myself in trouble if I try to do what I'm used to with a Vertica implementation and looking for some guidance on how to have a successful evaluation and hopefully production implementation of what seems to be an awesome product. 

 

Thank you!

Comments

  • There are a few ways to implement ELT in vertica.  Some traditional ETL tools actually have ELT components.  Here is an example of Talend with ELT components that have specific ELT components that do push down optimization (PDO) which can be used to transform the data once it is in Vertica.

     

    talend_elt.png

     

     

    Another way to implement ELT is to just use fast data movement tools, typically now just done with load scripts and Vertica vsql,  that just load the data and then you do data exploration using your favorite profiling and exploration tools to decide what to keep.

     

    The last way to implement ELT is to use Vertica flex zone. A very powerful new capability that allows you to load your data without defining the schema before the load. See the following link for more details.

    http://www.vertica.com/2014/02/14/enter-the-flex-zone-under-the-hood/

  • Hi,

     

    There is actually nothing different, Vertica has ODBC,JDBC connectors so you can use the same ETL tools that some traditionals DW.

     

    There are some considerations such as be sure that the data is loaded in batches, etc. But once that you choose the tools you can look in the decumentation for details.

     

    As reference, there is a nice document about how to use Vertica with Pentaho:

     

    https://community.dev.hp.com/t5/Vertica-Wiki/HP-Vertica-Integration-with-Pentaho-Data-Integration-PDI-Tips/ta-p/226999

     

    Pentaho is an open source tool for ETL.

     

    Hope this helps. Regards,

     

    Eugenia

     

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file