MERGE on huge external tables(oracle)

I have created external tables with a built in shell script to connect to ORACLE from vertica. Its working really fast on small tables with with upto 250 k rows or so. But tables with millions of rows fails probably becoz shell script not being able to keep up with connection to oracle or some reason. Did any one tried to merge data by  intervals based on data or date? This will really help me to complete a 10 million row table with a 20 chunk interval

Comments

  • Anyone with MERGE expertise is desperatley needed ...please reply asap

  • So what is the issue here ? 

    1. conection fail while doing merge on Oracle side ?(maybe some timeout param)
    2. looking for a model to do intervals based on data or date merges ? (show some code, eg: cols used )
  • MERGE INTO DSS.T_D_RE_ELIG AS target
    USING dssmod.T_D_RE_ELIG_EXT AS Source
         ON Source.SAK_RECIP = target.SAK_RECIP
    WHEN MATCHED THEN UPDATE SET
        SAK_PGM_ELIG = Source.SAK_PGM_ELIG,
        SAK_PUB_HLTH = Source.SAK_PUB_HLTH,
        DTE_EFFECTIVE = Source.DTE_EFFECTIVE,
        DTE_END = Source.DTE_END,
        CDE_PRE_ENTRY_STAT = Source.CDE_PRE_ENTRY_STAT,
        CDE_STOP = Source.CDE_STOP,
        CDE_STATUS1 = Source.CDE_STATUS1,
        DTE_CREATED = Source.DTE_CREATED,
        DTE_LAST_UPDATED = Source.DTE_LAST_UPDATED,
        DTE_ACTIVE_THRU = Source.DTE_ACTIVE_THRU,
        NUM_RANK = Source.NUM_RANK
    WHEN NOT MATCHED THEN INSERT
        (
            SAK_RECIP,
            SAK_RECIP ,
            SAK_PGM_ELIG,
            SAK_PUB_HLTH,
            DTE_EFFECTIVE,
            DTE_END,
            CDE_PRE_ENTRY_STAT,
            CDE_STOP,
            CDE_STATUS1,
            DTE_CREATED,
            DTE_LAST_UPDATED,
            DTE_ACTIVE_THRU,
            NUM_RANK
        )
        VALUES (
            Source.SAK_RECIP,
            Source.SAK_RECIP ,
            Source.SAK_PGM_ELIG,
            Source.SAK_PUB_HLTH,
            Source.DTE_EFFECTIVE,
            Source.DTE_END,
            Source.CDE_PRE_ENTRY_STAT,
            Source.CDE_STOP,
            Source.CDE_STATUS1,
            Source.DTE_CREATED,
            Source.DTE_LAST_UPDATED,
            Source.DTE_ACTIVE_THRU,
            Source.NUM_RANK
        );

     

     

     

    This is our basic MERGE and works fine for a table with upto 400k rows...but we are trying to see what can we do for a 8 million rows - like MERGE chunk by chunk like using some 'date' column in the table itself or use an icnrement on a surrogate key

     

     

    2. #!/bin/bash
    ssh user@server005 ARG1=$1 'bash -s' <<'ENDSSH'
    cd $HOME/scripts
    export ORACLE_HOME=/OraCl11g/instantclient_11_2
    export PATH=$PATH:/OraCl11g/instantclient_11_2
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/OraCl11g/instantclient_11_2
    ./unloader "userid=user/password@oracledb" sqlstmt="select * from $ARG1" arraysize=10000 delimiter="|" enclosure="" null_string=""
    #echo "All: " $*
    #echo "ARG1: " $ARG1
    ENDSSH

     

    This is how I am trying to connect to a remote ORACLE DB and trying to merge data with VERTICA. UNLOADER user is a C program. I believe this is not able to hold huge volume of data..which is why we are looking to MERGE by chunks

  • MaurizioMaurizio Vertica Employee Employee

    It is not clear if this is a data movement (Oracle>Vertica) issue or a MERGE issue. I would suggest you to:

    • step1: move your data from Oracle to a "normal" Vertica table

    and then:

    • step2: MERGE tables in Vertica

    This is just to understand where your issue come from and recommend the appropriate solution.

     

    A few questions:

    1. is your Vertica table partitioned?
    2. which % of the Vertica table will be updated during your MERGE?
  • It is not clear if this is a data movement (Oracle>Vertica) issue or a MERGE issue. I would suggest you to:

     

    Its actually both - I have created an EXTRENAL TABLE on oracle tables as source and target have same structure. What I am trying to do is MERGE external table(oracle) with vertica to get my CDC. Instead of going for a huge scan at once, I would like to MERGE by chunks to preseve my network. Is there a way to do this ?

     

     

     

    1. which % of the Vertica table will be updated during your MERGE?  - It might be less than 10% but MERGE will do a full scan of source and target to get the CDC which is where my LAN is screwed.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file