using EXCEPT to compare 2 tables on 2 different vertica db's.

kfruchtmankfruchtman Vertica Customer

Hi Guys!
I have a tricky question.I need to compare 2 tables (1 Billion rows tables) .Should be very easy when they are on the same db- just use the except sql command but
my situation is that they are not on the same db.they are both Vertica software and lets say both db's are imported on MC .Is there some way to do that?
both db's have 1 node.

many thanks!
Keren

Answers

  • kfruchtmankfruchtman Vertica Customer

    Some more info I forgot to mention which makes the issue more difficult I guess:
    -- those 2 db's are cloned from the past so they have the same db name
    -- both are single node with localhost
    I thought of doing CONNECT TO VERTICA command to connect from one to the other but I think it is not possible I didn't succeed
    because of this architecture.
    best regards,
    Keren

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited April 2021

    Are you expeting a lot of differences? Maybe a Linux diff could help?

    Example:

    [dbadmin@localhost ~]$ vsql -h host1 -Atc "SELECT * FROM some_table1;"
    1
    2
    
    [dbadmin@localhost ~]$ vsql -h host12 -Atc "SELECT * FROM some_table2;"
    1
    3
    
    [dbadmin@localhost ~]$ diff --color <(vsql -h host1 -Atc "SELECT * FROM some_table1;") <(vsql -h host2 -Atc "SELECT * FROM some_table2;")
    2c2
    < 2
    ---
    
    
      3
    
    
    
  • kfruchtmankfruchtman Vertica Customer

    first thanks for answering.
    Second - I know the exact difference of rows since when I count the rows on the tables I have exactly 192,448 rows more on one db . Can I still use the above command? BTW they are not on the same boxes.each db on a different ubuntu box.
    I need to put those 192,448 rows in a file and use the COPY command to insert it to the missing rows db.
    will that work?
    Keren

  • kfruchtmankfruchtman Vertica Customer

    doing select * is a lot since the table has more than 1B rows...

  • Jim_KnicelyJim_Knicely - Select Field - Administrator
    edited April 2021

    Can you reach one host from the other? Why not just load one of the tables temporaily into the other DB and do a compare with EXCEPT or MINUS?

    Something like:

    vsql -h remote_host -c "CREATE TABLE some_table1_temp LIKE some_table1;"
    vsql -h local_host -Atc "SELECT * FROM some_table1;" | vsql -h remote_host -c "COPY some_table1_temp FROM STDIN;"
    

    Unfortunately that COPY might take a while...

    Another option might to export the data as Parquet and either read it as an external table on the remote server or load it on the remote server.

  • SergeBSergeB - Select Field - Employee
    edited May 2021

    If loading both tables in the same DB is an option, you could use COPY FROM VERTICA to import the table (with 1B rows it should be faster than COPY from STDIN ). Then use EXCEPT or MINUS as suggested by Jim.
    https://www.vertica.com/docs/10.1.x/HTML/Content/Authoring/SQLReferenceManual/Statements/COPYFROMVERTICA.htm

    => CONNECT TO VERTICA vmart USER dbadmin PASSWORD 'myPassword' ON 'VertTest01',5433;
    CONNECT
    => COPY customer_dimension2 FROM VERTICA vmart.customer_dimension;

    Rows Loaded

      500000
    

    (1 row)
    => DISCONNECT vmart;
    DISCONNECT

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file