Local temporary table vs. physical table for huge data loads
Hi,
I have a scenario where I need a temporary staging area for my data. Data size is somewhere from 20 GB to 50 GB. The source of the data is another physical table joined with one or more other tables.
The options I see are (each sketched below):
1. Create a local temporary table (I don't need the data after the session ends)
2. Create a physical table and insert the records into it after truncating it every time
3. Use a WITH clause
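For concreteness, here is roughly what each option looks like (table and column names below are just placeholders for my real objects):

-- Option 1: local temp table, dropped automatically at session end
create local temp table stage (id int, val varchar(50)) on commit preserve rows;
insert into stage select s.id, s.val from src s join dim d on s.id = d.id;

-- Option 2: permanent table, truncated and reloaded each run
truncate table stage_perm;
insert into stage_perm select s.id, s.val from src s join dim d on s.id = d.id;

-- Option 3: WITH clause, no intermediate object at all
with stage as (select s.id, s.val from src s join dim d on s.id = d.id)
select * from stage;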
I want to understand how Vertica manages memory in each of these cases, and whether there is a preferred approach from i) a performance (time) and ii) a memory perspective.
One important consideration is avoiding sudden peaks in temp space usage.
Thanks
NJ
Comments
Assuming you have the same storage devices for both your temp and data spaces, I would go with option 2. By reusing the same physical table, you don't have the catalog-locking overhead of dropping and recreating tables, and truncation clears out all storage containers, so you don't have purge overhead either. You can also tune better for inserts and selects, since the projections persist with the table. Considering the size of your data loads, you can load the data directly to ROS with COPY ... DIRECT, or with a /*+ direct */ hint if you are doing an INSERT INTO.
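Something like this (the staging table and the source query are placeholders):

truncate table staging;           -- auto-commits and clears all storage containers
insert /*+ direct */ into staging -- DIRECT writes straight to ROS, bypassing the WOS
select s.id, s.val
from src s
join dim d on s.id = d.id;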
/Sajan
The reason I am using a staging area is that I can do a flat load into Vertica and then perform the complex joins and calculations from this middle zone.
So, regarding Eli's response, can you please help me understand what you mean by "having a cursor opened"?
Also, are you suggesting that using WITH is a better option? I am using Vertica 6.1.2 and don't think we will be moving to 7.1 anytime soon.
"having a cursor opened" - You mention that your "Source of the data is another physical table joined with one or more different tables" , Instead of run this query and put its results into temp and then read from team to do the processing , you can read directly from source and do the processing . In some case is more fast that creating temp .
Regarding WITH, I am not saying that WITH is a good option here (you have not shared your SQL). I only mention it because people coming from Oracle/SQL Server often rely on the fact that WITH can create a temp table on the fly which the rest of the query uses transparently. In Vertica, up until 7.1, the WITH section is simply rewritten into a normal join without creating any temp table.
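For example (hypothetical tables), in 6.1.2 a query like this:

with joined as (
    select s1.id, s1.value
    from source1 s1
    join source2 s2 on s1.id = s2.id
)
select count(*) from joined;

is executed exactly as if the join were written inline in the main query; no temp table is materialized.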
There is a file payments.dat that contains millions of records amounting to nearly 10+ GB or so (it can be up to 100 GB at times). The data in this file has to be processed and loaded into a Vertica table, so I load this data into a staging table first (payment_dat).
To finally load the target table, I need something like a MERGE that uses the join described above as its source. Now, Vertica does not let me use that subquery in the USING part, and thus I have to create an intermediate object; finding the best option for that intermediate object is what I was asking about.
As I mentioned, I can create a local temp table or another physical table.
Please let me know if I should be more detailed.
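For context, the initial load into the staging table is just a bulk COPY, something like this (the path and delimiter are placeholders for my real ones):

copy payment_dat from '/data/payments.dat' delimiter '|' direct;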
The term "subquery" here is misleading; you are just trying to define a kind of inline view. Anyway, I have attached an example. As I mentioned, you should use a view as a workaround; see the example below:
drop table if exists target;
create table target (id int, value varchar(20));
insert into target values(1,'value of 1');
insert into target values(2,'value of 2');
drop table if exists source1;
create table source1 (id int, value varchar(20));
insert into source1 values(1,'source1 of 1');
insert into source1 values(2,'source1 of 2');
insert into source1 values(3,'source1 of 3');
insert into source1 values(4,'source1 of 4');
drop table if exists source2;
create table source2 (id int, value varchar(20));
insert into source2 values(1,'source2 of 1');
insert into source2 values(2,'source2 of 2');
insert into source2 values(3,'source2 of 3');
insert into source2 values(4,'source2 of 4');
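-- MERGE with the join written as an inline subquery in USING fails: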
MERGE INTO target
using (select source1.value,source1.id from source1,source2
where source1.id=source2.id
) src
on ( src.id=target.id)
WHEN MATCHED THEN
UPDATE SET
value=src.value
WHEN NOT MATCHED THEN
INSERT VALUES (
src.id,
src.value);
ERROR 5665: Subquery in MERGE is not supported
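-- Workaround: wrap the join in a view and use the view as the MERGE source: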
Create or replace view myview as select source1.value,source1.id from source1,source2
where source1.id=source2.id;
MERGE INTO target
using myview src
on ( src.id=target.id)
WHEN MATCHED THEN
UPDATE SET
value=src.value
WHEN NOT MATCHED THEN
INSERT VALUES (
src.id,
src.value);
CREATE VIEW
dbadmin=> MERGE INTO target
dbadmin-> using myview src
dbadmin-> on ( src.id=target.id)
dbadmin-> WHEN MATCHED THEN
dbadmin-> UPDATE SET
dbadmin-> value=src.value
dbadmin-> WHEN NOT MATCHED THEN
dbadmin-> INSERT VALUES (
dbadmin(> src.id,
dbadmin(> src.value);
OUTPUT
--------
4
(1 row)
I've never measured it myself, but I've heard good things about the performance of MERGE.
Also, historically, temp tables tend to be faster than permanent tables, and they don't count against your license size. So I'd lean towards the WITH clause or the temp table if you're looking for pure performance.
Another option would be a table with a pinned projection; that's the fastest option. It creates the projection on only a single node.
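A sketch of such a projection (names are hypothetical; UNSEGMENTED NODE stores all of the projection's data on a single node):

create projection stage_super (id, val)
as select id, val from stage
unsegmented node v_mydb_node0001;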