Bulk load data into Vertica from HDFS



I have a CSV file residing on HDFS, and I want to bulk load it into Vertica.


I am reading this PDF




but it does not cover fast loading of data from HDFS into Vertica.


What is the best way to load HDFS data into Vertica?




I did see this thread




But I am not loading from Hive into Vertica. I want to keep my data as a CSV in HDFS and then load it into Vertica.




Also, how does this approach compare to a tool called Sqoop? Does it make sense for me to use Sqoop for my data load?


  • swalkaus (Vertica Employee)

    The HDFS connector should work well for you. Here's the relevant documentation for Vertica v7.2 (currently the latest release): https://my.vertica.com/docs/7.2.x/HTML/Content/Authoring/HadoopIntegrationGuide/HDFSConnector/LoadingDataFromHDFS.htm. The HDFS connector has existed for many versions now, so check the documentation for whatever version of Vertica you are currently running.


    In a nutshell, the HDFS connector lets Vertica read bytes from HDFS as a source of data for COPY statements or external tables (yes, you can query files that reside in HDFS without loading that data into Vertica). You can use a built-in or user-defined parser to turn those bytes into tuples, and you can use zero or more built-in or user-defined filters to uncompress or otherwise transform the bytes between the source and the parser. In your case you're parsing CSV, so you should be able to use Vertica's built-in delimited parser.
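
To make the COPY route described above concrete, here is a minimal Python sketch that only assembles the statement text; it does not connect to Vertica. The `SOURCE Hdfs(url=..., username=...)` form follows the HDFS connector documentation for the 7.x releases as I understand it, so verify it against the docs for your version. The helper name `build_hdfs_copy`, the table name, the WebHDFS URL, and the Hadoop user name are all hypothetical placeholders.

```python
def build_hdfs_copy(table, webhdfs_url, hadoop_user, delimiter=",", direct=True):
    """Assemble a COPY statement that reads a delimited file via the HDFS connector.

    Assumptions (not from the thread): the table already exists in Vertica and
    the file is reachable through the cluster's WebHDFS endpoint.
    """
    def quote(value):
        # SQL string literal: wrap in single quotes, doubling any embedded quote.
        return "'" + value.replace("'", "''") + "'"

    statement = (
        f"COPY {table} "
        f"SOURCE Hdfs(url={quote(webhdfs_url)}, username={quote(hadoop_user)}) "
        f"DELIMITER {quote(delimiter)}"
    )
    if direct:
        # DIRECT is the COPY option aimed at large bulk loads in Vertica 7.x.
        statement += " DIRECT"
    return statement + ";"


if __name__ == "__main__":
    # Hypothetical table, NameNode host, path, and user; substitute your own.
    print(build_hdfs_copy(
        "sales",
        "http://namenode:50070/webhdfs/v1/data/sales.csv",
        "hadoopUser",
    ))
```

The printed statement can be run from vsql or any Vertica client. Because the default COPY parser is the delimited one, setting `DELIMITER ','` is typically all a plain CSV file needs.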
