Does anyone know how to transfer data from Oracle to Vertica?

I am actually trying to transfer Oracle objects (primarily tables) to Vertica. Does anyone know the best way to make the transfer?

Comments

  • Thanks for your question, Asim! Our team is looking into some best practices for you.
  • Hello Asim: Some information exists in our Administrator's Guide. See Bulk Loading Data > Performing the Initial Database Load. Please let us know if this is enough information to solve your issue.
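    For reference, the initial load that guide describes centers on Vertica's COPY statement. A minimal sketch, assuming a table named sales and a pipe-delimited dump at /data/sales.txt (both hypothetical):

        COPY sales FROM '/data/sales.txt'
        DELIMITER '|'
        NULL ''
        REJECTED DATA '/data/sales.rej'   -- rows that fail parsing land here instead of aborting the load
        EXCEPTIONS '/data/sales.exc'      -- the reason each row was rejected
        DIRECT;                           -- write straight to disk (ROS); suited to large initial loads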
  • I tried several ways to do that. You can try using Talend or Informatica for the data transfer. From my experience, the best way is to dump the data to a file using sqlplus and then COPY it into Vertica. So it is like hand-made ETL :)
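    A minimal sketch of that hand-made ETL, using the classic emp demo table; credentials, paths, and names are placeholders:

        #!/bin/sh
        # 1. Dump one Oracle table to a pipe-delimited flat file with sqlplus.
        sqlplus -s scott/tiger@ORCL <<'EOF'
        SET PAGESIZE 0 FEEDBACK OFF HEADING OFF TRIMSPOOL ON LINESIZE 32767
        SPOOL /tmp/emp.txt
        SELECT empno || '|' || ename || '|' ||
               TO_CHAR(hiredate, 'YYYY-MM-DD HH24:MI:SS')
        FROM emp;
        SPOOL OFF
        EXIT
        EOF

        # 2. Bulk-load the flat file into the matching Vertica table.
        vsql -U dbadmin -d mydb -c "COPY emp FROM '/tmp/emp.txt' DELIMITER '|' DIRECT;"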
  • Thank you all for your input. But I was looking for something really efficient. Actually, my need is to transfer around 2000 tables (where a few of the tables are larger than 100 GB) with minimal downtime, and it is not a one-time transfer but needs to happen on a regular basis. @Kanti, let me revisit that section further; I don't think it fulfills my requirement completely. Should there be any further updates, please feel free to share. Thanks
  • Let me ask my question in a different way: does anyone know an efficient way to transfer a DB schema from Oracle to Vertica? Oracle has the option of CTAS (for tables only) and TTS (for all DB objects). Is there anything similar between Oracle and Vertica?
  • Asim, Talend Data Integration has an option to create target tables on Vertica. In most cases the Oracle DDL will work in Vertica, meaning the column types are converted to Vertica ones. But this will not be the best mapping. For example, it is better to change NUMBER to INTEGER, FLOAT, etc. The same goes for the DATE type: DATE in Vertica does not include a time part, so it needs changing to DATETIME (an alias for TIMESTAMP) in case you need the time part. The biggest problem is that the physical model in Vertica is different; from my experience it is better to plan everything from scratch: projections, partitions, etc. The second question I would ask is whether you really need 2000 tables. Maybe some of them can be combined? To sum up, I don't know of any tool that can be used in your case, but maybe someone else does.
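    To make that type mapping concrete, here is an illustrative pair of DDL statements (table and column names are invented for the example):

        -- Oracle source
        CREATE TABLE orders (
          order_id   NUMBER(10),     -- integral NUMBER
          amount     NUMBER(12,2),   -- scaled NUMBER
          order_date DATE            -- Oracle DATE carries a time part
        );

        -- Hand-tuned Vertica target
        CREATE TABLE orders (
          order_id   INTEGER,
          amount     NUMERIC(12,2),  -- or FLOAT if some precision loss is acceptable
          order_date TIMESTAMP       -- Vertica DATE has no time part; DATETIME is an alias for TIMESTAMP
        );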
  • Here is a Perl script that can help. It reads an Oracle DDL dump on stdin, strips the Oracle-specific clauses, and rewrites the column types for Vertica:

        # Slurp the whole DDL dump into one string.
        while (<>) { $subject .= $_; }

        # Join physical lines so each statement can be matched as one string.
        $subject =~ s/(.*)\r\n/$1/g;
        $subject =~ s/(.*)\n/$1/g;
        $subject =~ s/(.*)\r/$1/g;
        $subject =~ s/REM //g;

        # Drop the schema qualifier and the surrounding double quotes.
        $subject =~ s/CREATE TABLE "[^. ]*"\."([^. ]*)"/CREATE TABLE $1\n/g;

        # Remove Oracle storage clauses.
        $subject =~ s/\) PCTFREE[^;]*;/);\n\n/g;

        # Map column types.
        $subject =~ s/"([^"]*)" VARCHAR2\((\d*)\)/$1 VARCHAR($2)/g;
        $subject =~ s/"([^"]*)" DATE/$1 DATE/g;
        $subject =~ s/"([^"]*)" CHAR\((\d*)\)/$1 CHAR($2)/g;
        $subject =~ s/"([^"]*)" NUMBER\((\d*), (\d*)\)/$1 FLOAT/g;   # scaled NUMBER -> FLOAT
        $subject =~ s/"([^"]*)" NUMBER/$1 INTEGER/g;                 # bare NUMBER   -> INTEGER

        # Drop statements Vertica does not need.
        $subject =~ s/CONNECT [^;]*;//g;
        $subject =~ s/CREATE UNIQUE INDEX [^;]*;//g;
        $subject =~ s/ALTER TABLE [^;]*;//g;
        $subject =~ s/ ENABLE//g;
        $subject =~ s/, PRIMARY KEY \([^\)]*\)//g;
        $subject =~ s/\) ORGANIZATION[^;]*;/);\n\n/g;

        # Put each column back on its own line and print the result.
        $subject =~ s/, /,\n /g;
        print $subject;
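    Assuming the script is saved as ora2vertica.pl (the name is illustrative) and the DDL dump came from DBMS_METADATA.GET_DDL or a similar export, you would run:

        perl ora2vertica.pl < oracle_ddl.sql > vertica_ddl.sql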
  • Hi Asim, BryteFlow CDC can help you in the following ways:
    1) Automatically convert your schemas in Oracle to Vertica
    2) Do an initial extract, populating all data from Oracle to Vertica
    3) Set up log-based Change Data Capture, which delivers all changes in real time or near real time on a regular basis; or set up a logical CDC to run on a regular basis; or replicate the entire dataset on a regular basis
    4) The tool has special support for Vertica
  • Hi, it's good to know that there is a tool which provides support for transferring a DB schema from Oracle to Vertica. Can anyone share some details about the BryteFlow CDC mentioned in the above post? Thanks, Swapna
  • Hi Swapna, how do I send the data sheets of the product to you?
  • Hi, you can just direct me to any link which gives this information. Thanks
  • Hi, please find links to the BryteFlow CDC and Data Replication products that can help you: http://www.bryte.com.au/products/bryteflow-change-data-capture/ http://www.bryte.com.au/products/bryteflow-data-replication/ Hope this helps!
  • This is a great conversation that's separate from the main topic, so I created a new topic to continue the discussion. Please reference the new topic here: BryteFlow CDC for Vertica?
  • For CDC you will have to pay, and a lot; this will impact your project's TCO. Such a task can be implemented very easily by a simple script that exports from Oracle and loads into Vertica (very minimal effort, with no cost).
  • Hand-coded solutions are OK for one-off transfers, but for a production implementation they have these problems:
    - handling changes in source structures
    - handling control characters in text data, which cause loads to fail
    - selective data transfer
    - handling complex data types
    - handling reserved words
    - scheduling, etc.
    CDC delivers real-time data transfer and is a complex technology, as it hooks into Oracle's database internals. Data Replication is better suited to plain transfers and is very cost-effective.
  • Of course, everything in that list is important, and most of it can be implemented via a set of scripts. I was part of such a project, implemented at one of the biggest telco companies in the USA. In general, we had a C program that used dynamic SQL (level 4) to extract the data from Oracle to flat files. The C program's input was a flat file containing several SQL statements to extract; each extract's output went to a dedicated delimited flat file. When an extract process finished, it set an indicator in a control table. The loader checked this control table for files to load into the data target; when the loader finished, it updated the control table with the status and statistics about the load. If the loader failed to load some data, it ran an automatic data correction and reloaded it. The process ran as a daemon every X hours. I know it's not covering everything, but if your schema on both source and target is the same and you only insert into the target (no updates or deletes), such a utility can be enough.
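    A rough sketch of the kind of control table that handshake uses; every name and column here is invented for illustration:

        -- One row per extract file; the extractor and the loader coordinate through it.
        CREATE TABLE etl_control (
          file_name    VARCHAR(256),
          table_name   VARCHAR(128),
          extract_done CHAR(1) DEFAULT 'N',  -- set to 'Y' by the extractor when the file is complete
          load_status  VARCHAR(16),          -- e.g. PENDING / LOADED / FAILED, maintained by the loader
          rows_loaded  INTEGER,              -- load statistics written back by the loader
          loaded_at    TIMESTAMP
        );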
  • eli, would you have the C code for this?
  • We need to load TBs of data from Oracle to Vertica. What will be the best option for this?

    Thanks,
    Pankaj
  • If it's a one-time effort, you can use the C code approach to extract to CSV. Make sure you run parallel instances of the extract process, so that you finally have many CSV files; then distribute the CSVs across your Vertica cluster so each node has part of the files, and run the COPY command in parallel on each of the nodes.
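    For that last step, a single Vertica COPY can name one node-local file per node, so the load runs in parallel across the cluster. A sketch with placeholder node names and paths:

        COPY big_table FROM
             '/data/chunks/part1.csv' ON v_mydb_node0001,
             '/data/chunks/part2.csv' ON v_mydb_node0002,
             '/data/chunks/part3.csv' ON v_mydb_node0003
        DELIMITER ',' DIRECT;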
  • We built our own software tool in Java to migrate large volumes of data from Oracle to Vertica efficiently and quickly. We recently completed a POC loading 1000+ tables (1.2 TB of data) from Oracle to Vertica in 1 hour and 30 minutes. This may be the fastest tool on the market now, competing with products from other leading companies.
  • We use Pentaho Data Integration for PoCs. It has a Vertica loader, which consumes a lot of memory but is still usable. If I had time, I would also check this approach: http://type-exit.org/adventures-with-open-source-bi/2012/06/implementing-universal-bulk-loading-in-pdi/
