Sequence taking more time
Hi,
I am inserting 200,000 records into two tables, test1 and test2. Both have the same structure, except test2 has one extra sequence column.
While loading, test1 takes 10 seconds while test2 takes 1 hour.
What could be the reason?
Why does the sequence make the load take so much time?
Regards,
Deb
Comments
You need to look at your sequence definition.
Here is a small example:
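(The original example is not reproduced here, so below is a minimal sketch of the kind of setup being described - the sequence, table and column names are only illustrative, and the tiny CACHE value is chosen on purpose to make the problem visible.)

-- sequence with a very small cache: a new block of values must be fetched very often
CREATE SEQUENCE seq_small CACHE 5;

CREATE TABLE test2 (
    id   INT DEFAULT NEXTVAL('seq_small'),
    col1 VARCHAR(50)
);

-- loading many rows now triggers a catalog operation every 5 values
INSERT INTO test2 (col1) SELECT col1 FROM test1;
COMMIT;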
You can see that you will get tons of catalog locks, because Vertica must take a catalog lock to obtain each new set of sequence values.
This hurts database performance, because the insert cannot continue until HP Vertica releases the lock.
Now we will do the same action on a table that has the same DDL but uses a different sequence, with a small parameter alteration (the CACHE parameter).
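(Again as a sketch only - assuming the same illustrative names as above, the only change is the CACHE value:)

-- same DDL, but the sequence preallocates a large block of values per session
CREATE SEQUENCE seq_big CACHE 250000;

CREATE TABLE test3 (
    id   INT DEFAULT NEXTVAL('seq_big'),
    col1 VARCHAR(50)
);

-- the whole load is served from the preallocated range, so no repeated catalog locks
INSERT INTO test3 (col1) SELECT col1 FROM test1;
COMMIT;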
So you need to take a close look at the CACHE parameter value - it represents how many sequence numbers are preallocated and stored in memory for faster access.
This value is allocated per session.
I hope this was helpful.
Thanks Adrian.
This almost resolved most of the loading issues.
Thanks a lot.
Please tell me how a high cache value matters here?
Regards,
Deb
Well, the cache value indicates the number of unique sequence numbers each node allocates per session. The default value is 250,000 (since Vertica is a big data database, it expects you to load a bunch of data per session).
You can play around with this value depending on your workload.
When you allocate a small cache value and do various inserts, every time the set of unique session numbers gets used up the catalog gets locked so that Vertica can obtain a new set of unique sequence numbers.
In the case where you pre-allocate a bunch of them, no lock will be needed during the insert.
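For example, if you expect to load around 200k rows per session, you can size the cache accordingly (the sequence name here is made up; ALTER SEQUENCE with a CACHE clause should let you change it without recreating the sequence):

ALTER SEQUENCE my_seq CACHE 500000;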
Ohhh, why almost? 200k rows should take about 5 seconds in a single thread on a 1 CPU + 1 GB node.
Because I found in query_request that most of the queries that were taking time were the ones using a sequence.
My system configuration is 32 GB RAM with 8 CPUs.
I generally create a sequence like
SQL >> create sequence sequence_name;
without assigning any parameters.
Thanks for the very good suggestion.
I have one more doubt, regarding K-safety.
I found that the maximum K-safe value can only be 2.
How does K-safety work when I have more than 10 nodes?
I mean super projections, buddy projections, segmentation and all that.
Please clarify the logic behind this.
K-safety represents the number of copies of your data the database cluster should hold.
K-safety 0 - means you only have one copy of your data (the initial one); same as a single instance of MySQL installed on your home PC by default (a total of one copy of your data).
K-safety 1 - means you have the initial load plus one more copy (a total of two copies of your data).
K-safety 2 - means you have the initial load plus two more copies (a total of three copies of your data).
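If you want to see what your own cluster is running at, something like this should work (SYSTEM is the standard Vertica monitoring table, and MARK_DESIGN_KSAFE is the built-in function for declaring the design K-safety):

-- designed vs. currently achievable fault tolerance
SELECT designed_fault_tolerance, current_fault_tolerance FROM system;

-- declare the design as K-safe 1 (requires enough nodes and buddy projections)
SELECT MARK_DESIGN_KSAFE(1);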
K-safety is achieved by using segmented projections and buddy projections.
Segmented projection - applies to big tables; the table is divided into N (number of nodes) segments that are then copied to other nodes (a sample definition is sketched below).
This image might explain better:)
- as seen here, you can lose any one node and still have a copy of your data available.
Buddy projections - applies for small tables that can simply be replicated.
Besides high availability, these database structures (projection types) also provide performance enhancements.
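As a rough sketch of what such a segmented projection looks like (table and column names are made up; the KSAFE clause is what tells Vertica to create the buddy copies):

CREATE PROJECTION test2_super (id, col1)
AS SELECT id, col1 FROM test2
SEGMENTED BY HASH(id) ALL NODES KSAFE 1;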
In cases of 10 nodes or more, the K-safety concepts stay the same and Fault Groups are introduced.
- because no matter what K-safety you have on your database, if all the nodes are located in the same rack, all of those copies of data won't save you.
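(If you want to try it, fault groups are created with plain DDL along these lines - the group and node names below are made up, so check the docs for your version:)

CREATE FAULT GROUP rack1;
ALTER FAULT GROUP rack1 ADD NODE v_mydb_node0001;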
Follow this video tutorial on fault groups posted on the HP Vertica channel:
https://vimeo.com/89022413
I would like to correct this concept:
" Buddy projections - applies for small tables that can simply be replicated "
A buddy projection is a copy of the segments of a segmented projection. Replicated projections do not have buddies because they are already copied to every node.
Eugenia