Sequence taking more time

deb0687 · June 2015

Hi ,

I am inserting 200000 records into a table test1 and test2 , both are having same structure and test2 is having 1 extra sequence column.

Now while loading test1 is taking 10 sec while test2 is taking 1 hr.

What could be the reason ?

Why sequence creation is taking so much time ?

Regards,

Deb

Adrian_Oprea_1 · June 2015

You need to look at you sequence definition.

Here is a small example:

--Sequence
create sequence test_sq INCREMENT 1 START 1  CACHE 1;
--Table
CREATE TABLE sq_test(
      id INTEGER DEFAULT NEXTVAL('test_sq'),
      first_name VARCHAR(25), 
      last_name VARCHAR(25));
--Load statement	  
copy sq_test(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;	  

--What is going on with your running sessions.
--See that sequneces are alocated at run-time.(BAD)
(dbadmin@:5433) [dbadmin] *> select object_name,transaction_description,lock_mode,lock_scope from locks;
     object_name      |                                       transaction_description                                       | lock_mode | lock_scope
----------------------+-----------------------------------------------------------------------------------------------------+-----------+-------------
 Global Catalog       | Txn: a000000001bb16 'take number from sequence'                                                     | X         | TRANSACTION
 Global Catalog       | Txn: a000000001bb17 'take number from sequence'                                                     | X         | REQUESTED
 Cluster Topology     | Txn: a000000001bb16 'take number from sequence'                                                     | S         | TRANSACTION
 Cluster Topology     | Txn: a000000001bb17 'take number from sequence'                                                     | S         | TRANSACTION
 Table:public.sq_test | Txn: a000000001b80e 'copy sq_test(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;' | I         | TRANSACTION
 Rows Loaded
-------------
       20000
(1 row)

Time: First fetch (1 row): 84057.987 ms. All rows formatted: 84058.073 ms

You can see you will get tons of catalog locks because Vertica must perform a catalog lock to obtain a new set of values.
This will make you database performance bad because the insert cannot continue until HP Vertica releases the lock.

Now we will do the same action on a table that has the same ddl but is using a different sequence with a samll parameter alteration.(CACHE parameter)

--Sequence
create sequence test_sq2 INCREMENT 1 START 1 CACHE 200000;
--Table

CREATE TABLE sq_test2(
      id INTEGER DEFAULT NEXTVAL('test_sq2'),
      first_name VARCHAR(25), 
      last_name VARCHAR(25));	  

--Load statement	  
copy sq_test2(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;
 Rows Loaded
-------------
       20000
(1 row)

Time: First fetch (1 row): 77.707 ms. All rows formatted: 77.792 ms	  

--What is going on
--Nothing to see was to fast 
(dbadmin@:5433) [dbadmin] *> select object_name,transaction_description,lock_mode,lock_scope from locks;

So you need to take a close look at the CACHE parameter value - this represents the numbers are preallocated and stored in memory for faster access.
This value is allocated par session.

I hope this was helpfull

deb0687 · June 2015

Thanks Adrian.

This almost resolved most of the loading issues.

Thanks a lot.

Please tell me how high cache value matters here ?

Regards,

Deb

Adrian_Oprea_1 · June 2015

Well the cache value indicates the number of unique numbers each node allocates per session. The defualt value is 250.000(Since Vertica is a big data db) it espects you to load bunch of data as per session.

You can play around with this value depending on your workload.

When you alocate a small cache value and you will do varios inserts every time the numbers of unique session numbers gets used the catalog will get locked in order for vertica to obtain a new set of sequnece unique numbers.

In the case where you pre-alocate a bunch of them no lock will be needed durnig the insert.

Ohhh ,, Why alomost ? 200k rows should be 5 sec in a single thread on a 1cpu + 1Gb node

deb0687 · June 2015

Because I found in query_request that most of query were taking time those were having sequence.

My system conf is 32 GB RAM with 8 CPU.

I generally create sequence like

SQL >> create sequence sequence_name;

with out any parameter assigning.

Thanks for the very good suggestion.

deb0687 · June 2015

I have 1 more doubt regarding Ksafe.

I found that the maximum Ksafe value can be 2 only.

How about the ksafe where I will have more than 10 nodes and how it works.

Means super projection , Buddy projection , segmentation & all .

Please clarify the logic behind this.

Adrian_Oprea_1 · June 2015

K-safety represents the numbers of copies of your data the database cluster should hold.

K-safety 0 - means you only have one copy of your data(that is the initial one); same as single instance of MySQL installed on your home pc by default.(total of one copie of your data)

k-safety 1 - means you have the initial load plus one more copie(total of two copies of your data)

k-safety 2 - means you have the initial load plus two more copie(total of three copie of your data)

The K-safety is resolved by using segmeneted projections and buddy projections.

Segmented projection - apply for big tables, and the table is divided into N(number of nodes) segments that are then copied to other nodes.

This image might explain better:)

- asseen here you can loose any one node and stil you will have a copie of your data still available.

Buddy projections - applies for small tables that can simply be replicated.

Beside High Avalability this database structures(projection types) also provide performance enhancements.

In cases on 10 node or more the k-safety concepts stay the same and Falut Groups are introduced.

- as no matterthe k-safety you have on your database if all the nodes are located in the same rack all of thos copies of data wont save you.

Follow this video tutorial on falut groups posted on HP Vertica channel

https://vimeo.com/89022413

[Deleted User] · July 2015

Would like to correct this concept:

" Buddy projections - applies for small tables that can simply be replicated "

Buddy projections is the copy of segments of the segmented projections. The replicated projections do not have buddies because they are copied in every node.

Eugenia

We're Moving!

Create My New Community Account Now

Sequence taking more time

Comments

Leave a Comment