Bulk load data into Vertica from HDFS

Hello,

I have a CSV file residing on HDFS, and I want to bulk load it into Vertica.

I am reading this PDF:

http://www.vertica.com/wp-content/uploads/2011/01/FastDataLoadingInVertica.pdf

but it does not cover fast loading of data from HDFS into Vertica.

What is the best way to load HDFS data into Vertica?

Edit:

I did see this thread:

https://community.dev.hpe.com/t5/Vertica-Forum/load-from-hadoop/m-p/219695/highlight/true#M7434

But I am not loading from Hive into Vertica. I want to keep my data as a CSV in HDFS and then load it into Vertica.

Edit:

Also, how does this approach compare to a tool called Sqoop? Does it make sense to use Sqoop for my data load?

Comments

  • swalkaus (Vertica Employee)

    The HDFS connector should work well for you. Here is the relevant documentation for Vertica v7.2 (the latest release at the time of writing): https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/HadoopIntegrationGuide/HDFSConnector/LoadingDataFromHDFS.htm
    The HDFS connector has existed for many versions now, so check the documentation for whichever version of Vertica you are running.

    In a nutshell, the HDFS connector lets Vertica read bytes from HDFS as a data source for COPY statements or external tables (yes, you can query files that reside in HDFS without loading that data into Vertica). A built-in or user-defined parser turns those bytes into tuples, and zero or more built-in or user-defined filters can decompress or otherwise transform the bytes between the source and the parser. In your case the data is in CSV format, so you should be able to use Vertica's built-in delimited parser.
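As a rough illustration of the COPY usage described above, here is a minimal Python sketch that only builds the statement text. It assumes the `SOURCE Hdfs(url=..., username=...)` form used in the 7.x HDFS connector documentation and a WebHDFS URL; the table name, host, path, and user are hypothetical placeholders, and the helper names `sql_quote` and `build_hdfs_copy` are made up for this example. Verify the exact syntax against the documentation for your Vertica version before running the generated statement.

```python
def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_hdfs_copy(table: str, webhdfs_url: str, username: str,
                    delimiter: str = ",") -> str:
    """Build a COPY statement that reads a delimited file via the HDFS connector.

    Assumption: the connector is invoked as SOURCE Hdfs(url=..., username=...).
    The table name is inserted as-is, so pass only a trusted identifier.
    """
    return (
        f"COPY {table} "
        f"SOURCE Hdfs(url={sql_quote(webhdfs_url)}, username={sql_quote(username)}) "
        f"DELIMITER {sql_quote(delimiter)};"
    )


if __name__ == "__main__":
    # Hypothetical table, NameNode host, file path, and Hadoop user.
    print(build_hdfs_copy(
        "sales",
        "http://namenode.example.com:50070/webhdfs/v1/user/etl/sales.csv",
        "etl",
    ))
```

The printed statement can then be run from vsql or any Vertica client. Setting `DELIMITER ','` is what selects comma-separated parsing with the built-in delimited parser, since the default delimiter is the pipe character.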
