Options

Fastest way to copy data from one database to another

What would be the fastest way to copy a Vertica database (~2TB) to another Vertica database? There are additional columns on the destination database.

Comments

  • Options
    wwong2wwong2 Employee
    Hi Mason, are you referring to copying table from 1 vertica db to another table which has addtional columns in another database?
    If so, there are a couple of options you can use, export to vertica (see link for more details https://my.vertica.com/docs/6.1.x/HTML/index.htm#15573.htm )
    or copy from vertica (see link for more details, https://my.vertica.com/docs/6.1.x/HTML/index.htm#15591.htm)

    Regards
    Wayne

  • Options
    How about the fastest way to copy from one table to another in the same database? I tried to use insert with direct, but it's too slow. Basically, I need to copy table1(col1, col2) to table2(col1, col2, date_trunc(col1)), where col1 and col2 are the same on table1 and table2, date_trunc(col1) is calculated column based on col1.
  • Options
    wwong2wwong2 Employee
    Hi Mason, the insert direct would be the quickest way as it is going direct to disk.  What is the performance like that you are seeing and how many records are there?
    Another way would be to dump out the data from table to a file and then use the COPY command to load that file to Vertica.  
    Please note, if you dump the tabe data to flat file, you may need to review and clean it out (ie removing column names) or if you use the vsql command, you can use some of the available options to not display columns, see link for info about those options

    Regards
     
  • Options
    wwong2wwong2 Employee
    Hi Mason, the insert direct would be the quickest way as it is going direct to disk.  What is the performance like that you are seeing and how many records are there?
    Another way would be to dump out the data from table to a file and then use the COPY command to load that file to Vertica.  
    Please note, if you dump the tabe data to flat file, you may need to review and clean it out (ie removing column names) or if you use the vsql command, you can use some of the available options to not display columns, see link for info about those options

    Regards
     
  • Options
    Thank you so much, Wayne! It took about 2 hours to copy 800 million records, but we have 40 billion records. What kind of throughput should I expect?
  • Options

    >> What kind of throughput should I expect?

    30-50GB/hour/node for a 1-Ksafe projection design for COPY from file.


    >> I need to copy table1(col1, col2) to table2(col1, col2, date_trunc(col1))

    Define DEFAULT value for date_trunc(col1) and transfer only 2 columns - it will reduce raw size of data.

    daniel=> create table may (id int, name varchar, born date, age int default age_in_years(born));
    CREATE TABLE

    daniel=> copy may(id, name, born) from stdin delimiter ' ';
    Enter data to be copied followed by a newline.
    End with a backslash and a period on a line by itself.
    >> 1 Anna 1985-01-01
    >> 2 John 1978-03-17
    >> \.

    daniel=> select * from may;
     id | name |    born    | age
    ----+------+------------+-----
      1 | Anna | 1985-01-01 |  28
      2 | John | 1978-03-17 |  35
    (2 rows)

    >> fastest way to copy a Vertica database

    If you are using in COPY FROM VERTICA/EXPORT FROM VERTICA, so compress traffic.

    CompressNetworkData

    When enabled (set to value 1), Vertica
    will compress all of the data it sends over the network. This speeds up network traffic at the expense of added CPU load. You can enable this if you find that the network is throttling your database performance.
    daniel=> SELECT GET_CONFIG_PARAMETER ('CompressNetworkData');
     GET_CONFIG_PARAMETER
    ----------------------
     0
    (1 row)

    daniel=> SELECT SET_CONFIG_PARAMETER ('CompressNetworkData', 1);
        SET_CONFIG_PARAMETER   
    ----------------------------
     Parameter set successfully
    (1 row)


  • Options
    another way to load data (Im using in same db):


    daniel@synapse:~$ time vsql -Atqnc "select * from inventory_fact" | vsql -c "copy inventory_fact from stdin"

    real    0m0.621s
    user    0m0.360s
    sys    0m0.072s

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file