Options

Sequence taking more time

Hi ,

 

 

I am inserting 200000 records into a table test1 and test2 , both are having same structure and test2 is having 1 extra sequence column.

 

Now while loading test1 is taking 10 sec while test2 is taking 1 hr.

 

 

What could be the reason ?

 

Why sequence creation is taking so much time ?

 

 

Regards,

Deb

Comments

  • Options

    You need to look at you sequence definition. 

     

    Here is a small example:

    --Sequence
    create sequence test_sq INCREMENT 1 START 1 CACHE 1;
    --Table
    CREATE TABLE sq_test(
    id INTEGER DEFAULT NEXTVAL('test_sq'),
    first_name VARCHAR(25),
    last_name VARCHAR(25));
    --Load statement
    copy sq_test(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;

    --What is going on with your running sessions.
    --See that sequneces are alocated at run-time.(BAD)
    (dbadmin@:5433) [dbadmin] *> select object_name,transaction_description,lock_mode,lock_scope from locks;
    object_name | transaction_description | lock_mode | lock_scope
    ----------------------+-----------------------------------------------------------------------------------------------------+-----------+-------------
    Global Catalog | Txn: a000000001bb16 'take number from sequence' | X | TRANSACTION
    Global Catalog | Txn: a000000001bb17 'take number from sequence' | X | REQUESTED
    Cluster Topology | Txn: a000000001bb16 'take number from sequence' | S | TRANSACTION
    Cluster Topology | Txn: a000000001bb17 'take number from sequence' | S | TRANSACTION
    Table:public.sq_test | Txn: a000000001b80e 'copy sq_test(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;' | I | TRANSACTION
    Rows Loaded
    -------------
    20000
    (1 row)

    Time: First fetch (1 row): 84057.987 ms. All rows formatted: 84058.073 ms

    You can see you will get tons of catalog locks because Vertica must perform a catalog lock to obtain a new set of values.
    This will make you database performance bad because the insert cannot continue until HP Vertica releases the lock.

     

    Now we will do the same action on a table that has the same ddl but is using a different sequence with a samll parameter alteration.(CACHE parameter)

    --Sequence
    create sequence test_sq2 INCREMENT 1 START 1 CACHE 200000;
    --Table

    CREATE TABLE sq_test2(
    id INTEGER DEFAULT NEXTVAL('test_sq2'),
    first_name VARCHAR(25),
    last_name VARCHAR(25));

    --Load statement
    copy sq_test2(first_name,last_name) from '/tmp/data.csv' delimiter '|' direct;
    Rows Loaded
    -------------
    20000
    (1 row)

    Time: First fetch (1 row): 77.707 ms. All rows formatted: 77.792 ms

    --What is going on
    --Nothing to see was to fast :)
    (dbadmin@:5433) [dbadmin] *> select object_name,transaction_description,lock_mode,lock_scope from locks;

    So you need to take a close look at the CACHE parameter value - this represents the numbers are preallocated and stored in memory for faster access.
    This value is allocated par session.

     

    I hope this was helpfull :) 

  • Options

    Thanks Adrian.

     

     

    This almost resolved most of the loading issues.

     

     

    Thanks a lot.

     

     

     

    Please tell me how high cache value matters here ?

     

     

    Regards,

    Deb

  • Options

     Well the cache value indicates the number of unique numbers each node allocates per session. The defualt value is 250.000(Since Vertica is a big data db) it espects you to load bunch of data as per session.

     You can play around with this value depending on your workload. 

     When you alocate a small cache value and you will do varios inserts every time the numbers of unique session numbers gets used the catalog will get locked in order for vertica to obtain a new set of sequnece unique numbers. 

     In the case where you pre-alocate a bunch of them no lock will be needed durnig the insert. 

     

    Ohhh ,, Why alomost ?  200k rows should be 5 sec in a single thread on a 1cpu + 1Gb node :) 

  • Options

    Because I found in query_request that most of query were taking time those were having sequence.

     

    My system conf is 32 GB  RAM with 8 CPU.

     

     

    I generally create sequence like

     

    SQL >> create sequence sequence_name;

     

    with out any parameter assigning.

     

    Thanks for the very good suggestion.

     

     

  • Options

    I have 1 more doubt regarding Ksafe.

     

    I found that the maximum Ksafe value can be 2 only.

     

    How about the ksafe where I will have more than 10 nodes and how it works.

     

    Means super projection , Buddy projection , segmentation & all .

     

    Please clarify the logic behind this.

  • Options

    K-safety represents the numbers of copies of your data the database cluster should hold.

    K-safety 0  - means you only have one copy of your data(that is the initial one); same as single instance of MySQL installed on your home pc by default.(total of one copie of your data)

    k-safety 1  - means you have the initial load plus one more copie(total of two copies of your data)

    k-safety 2  - means you have the initial load plus two more copie(total of three copie of your data)

     

    The K-safety is resolved by using segmeneted projections and buddy projections.

     

    Segmented projection  - apply for big tables, and the table is divided into N(number of nodes) segments that are then copied to other nodes.  

    This image might explain better:) 

    - asseen here you can loose any one node and stil you will have a copie of your data still available.

    image

     

    Buddy projections - applies for small tables that can simply be replicated.

    image

     

    Beside High Avalability this database structures(projection types) also provide performance enhancements.

     

    In cases on 10 node or more the k-safety concepts stay the same and Falut Groups are introduced.

    - as no matterthe k-safety you have on your database if all the nodes are located in the same rack all of thos copies of data wont save you.

     Follow this video tutorial on falut groups posted on HP Vertica channel

    https://vimeo.com/89022413 

  • Options

    Would like to correct this concept: 

     

    " Buddy projections - applies for small tables that can simply be replicated "

     

    Buddy projections is the copy of segments of the segmented projections. The replicated projections do not have buddies because they are copied in every node.

     

    Eugenia

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file