Local temporary table Vs Physical Table for Huge data load

Navneet_1Navneet_1 Registered User
Hi,
I have a scenario where I need a temporary staging area for my data. Data size is somewhere from 20 GB to 50GB. Source of the data is another physical table joined with one or more different tables. 

The options I see are:

1. Create a Local Temporary table ( I don't want the data after the session)
2. Create a physical table and insert the records into that tables after truncating it everytime
3. Using "With"

I wanted to understand how the Vertica manages the memory in each of the case and is there any preferred way from i) Performance (time) and ii) memory perspective.

Certainly, one aspect will be not to have sudden peaks in temp usage. 

Thanks
NJ

Comments

  • Sajan_SangraulaSajan_Sangraula Registered User
    Hi Navneet,
    Assuming you have the same storage devices for both you temp and data spaces, I would go with option 2. By reusing the same physical table, you dont have any catalog locking overheads of dropping/ recreating tables and truncation also clears out all containers, so you dont have purge overhead. You can tune for inserts and selects better as well as the projections persist with the table. Considering the size of your data loads, you can load the data directly to ROS with COPY DIRECT OR with a /*+ direct */ hint of you are doing a INSERT INTO. 

    /Sajan

     
  • eli_revacheli_revach Registered User
    "With" is not creating temp segments in vertica , this beave change in vertica 7.1 , you not mention why you need this staging area . in some cases you can avoid it , like having your cursor open durining the proccsing of your data.
  • Navneet_1Navneet_1 Registered User
    Thanks guys.

    The reason I am using a staging area is because I can make a flat load to Vertica and then perform the complex joins and calculations based on this middle zone. 

    So to Eli's response, can you please help me understand "having a cursor opened". 
    Also, are you suggesting that using "With" is a better option ? I am using Vertica 6.1.2, and don't think will be changing to 7.1 anytime soon.
  • eli_revacheli_revach Registered User
    Hi ,
    "having a cursor opened" - You mention that your "Source of the data is another physical table joined with one or more different tables" , Instead of run this query and put its results into temp and then read from team to do the processing , you can read directly from source and do the processing . In some case is more fast that creating temp .


    Regarding WITH , i don't day that WITH is good option her (you not share your SQL’s ) , I just mention that in most of the time peoples that use to work with Oracle /Sqlserver relay on the fact that WITH have the ability to create temp table on the fly and the rest of the query will use it transparently , in Vertica until 7.1  WITH section is just being rewrite to normal join without creating temp table .

  • Navneet_1Navneet_1 Registered User
    Well, at times it is not possible to use the data directly without creating an intermediate table. For example, I am using a Merge statement to Update/Insert the records to a final table. Now, Merge does not work with subqueries in Using section (am I correct ?), so I have to create an intermediate table and then pass it down to Merge. Other way would be to use the same join twice, once in Update and second time in Insert.

  • eli_revacheli_revach Registered User
    Merge is fantastic tool ,i am not understending way you need subquery in the using ? To me its looks like your "using "part need to include some kind of view that implement your data source . Anyway as you not provide deep details about your use case , what i am saying her can be /not be applicable to address your use case
  • Navneet_1Navneet_1 Registered User
    sorry about that, lets say here is the case:

    There is a file payments.dat that contains millions of records amounting to nearly 10+ GB or so (can be up to 100 GB at times). The data in this file has to be processed and loaded into a Vertica table, I load this data into some staging area first (payment_dat)

    To finally load this table, I need to achieve something like
     Merge into pymtdetails     using (       Select <pymtColumns>, <customerDetails>, <pymtSubjectDetailsColumn>        FROM         Payment_dat, customer, pymtInfoRef        Where <join conditions>      When MATCHED THEN UPDATE       ...    WHEN NOT MATCHED THEN INSERT       ...
    Now, Vertica does not let me use that subquery in the USING part, and thus I have to create an intermediate object and that is what I was looking a best option for.
    As I mentioned, I can create a Local Temp table, or another physical table. 

    Please let me know if I should be more detailed.
  • eli_revacheli_revach Registered User
    Hi

    The terminology of sub query her is misled , you just try to define some kind of inline view , anyway , I attach an example ,     as I mention you should use view as a workaround  , see below example :

     

    drop table target;

    create table target (id number ,value varchar2(20));

    insert into target values(1,'value of 1');

    insert into target values(2,'value of 2');

     

    drop table source1;

    create table source1 (id number ,value varchar2(20));

    insert into source1 values(1,'source1 of 1');

    insert into source1 values(2,'source1 of 2');

    insert into source1 values(3,'source1 of 2');

    insert into source1 values(4,'source1 of 2');

     

    drop table source2;

    create table source2 (id number ,value varchar2(20));

    insert into source2 values(1,'source2 of 1');

    insert into source2 values(2,'source2 of 2');

    insert into source2 values(3,'source2 of 2');

    insert into source2 values(4,'source2 of 2');

     

    MERGE INTO target

    using (select source1.value,source1.id from source1,source2

            where source1.id=source2.id

                   ) src

    on ( src.id=target.id)      

    WHEN MATCHED THEN

    UPDATE SET

    value=src.value

    WHEN NOT MATCHED THEN

     INSERT VALUES (

    src.id,

    src.value); 

    ERROR 5665:  Subquery in MERGE is not supported


     Create or replace view myview as   select source1.value,source1.id from source1,source2

     where source1.id=source2.id;

     

    MERGE INTO target

    using myview src

    on ( src.id=target.id)            

    WHEN MATCHED THEN

    UPDATE SET

    value=src.value

    WHEN NOT MATCHED THEN

     INSERT VALUES (

    src.id,

    src.value); 

     

    CREATE VIEW

    dbadmin=> MERGE INTO target

    dbadmin-> using myview src

    dbadmin-> on ( src.id=target.id)

    dbadmin-> WHEN MATCHED THEN

    dbadmin-> UPDATE SET

    dbadmin-> value=src.value

    dbadmin-> WHEN NOT MATCHED THEN

    dbadmin->  INSERT VALUES (

    dbadmin(> src.id,

    dbadmin(> src.value);

     OUTPUT

    --------

          4

    (1 row)

  • Vertica_CurtisVertica_Curtis Employee, Registered User

    I've never performed the metrics, but I've heard good things regarding the performance of merge.

    Also, historically, temp tables tend to operate faster than permanent tables, and also don't affect license size. So, I'd tend towards the with clause or the temp table, if you're looking for just pure performance.

    Another option, would be a table with a pinned projection. That's the fastest option. This creates the projection only on a single node.

    CREATE TEMPORARY TABLE public.gttz
    (
    i int,
    j int
    ) NO PROJECTION;

    >
    >
    >

    CREATE PROJECTION public.gttz_p
    (
    i,
    j
    )
    AS
    SELECT gttz.i,
    gttz.j
    FROM public.gttz
    ORDER BY gttz.j
    PINNED;

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file