can I flush the cache of a sequence without logout a session

vincent_wu · October 2014

we run ETL in parallel against Vertica database.
there is a sequence name S1 start with 1 cache 1000

step 1:
at time 00:01:00 in session 1. we run a sql as below:
insert into T1(ID, name, desc) select S1.nextval, name, desc from T2; it will insert 200 records.

step 2:
at time 00:02:00 in session 2, we run the same sql as below:
insert into T1(ID, name, desc) select S1.nextval, name, desc from T2; it will insert 2300 records.

step 3:
at time 00:05:00 in session 1, we run the same sql again as below:
insert into T1(ID, name, desc) select S1.nextval, name, desc from T2; it will insert 300 records.

I notice that we get the smaller ID at step 3 then what we get at step 2, although we run the sql in 3 minutes later.

I know that I can set the sequence no cache to avoid this issue. But it would raise another performance issue.

how can I get the bigger sequence number which I call it later? I mean don't logout the sessions.

I wonder if we can flush the cache of a sequence without logout the session?

marcothesane · October 2014

Hi Vincent

Cached sequences will always start with a multiple of the cache size on each node. So, in your case, it will start with 0 on one node, with 1000 on a second node, and with 2000 on a third node. This makes sure that the nodes don't need to communicate with each other every time a <sequence>.nextval is generated. In your case, steps 1 and 2 ran on nodes where the sequence started at a higher multiple of 1000 than in step 3.

This is how cached sequences work in Vertica.

If you absolutely need a higher number for that column when the row was inserted later, you had better use a different way to populate it.

This could be a relatively performant way: use the microseconds elapsed between an anchor timestamp (start of this year in my example), and the current timestamp:

SQL>select 
 timestampdiff(microsecond
              ,'2014-01-01 00:00:00'::timestamp
              ,current_timestamp);
timestampdiff
  25,873,495,176,734

Sequences, even if they are called like that, do not generate clear predecessor/successor sequences, in Vertica, if they are cached. They are a means to generate a value that is guaranteed to be unique, but they will never be helpful to generate any semantics in what they generate (in your case, which row was generated later than another).

Another way could be to first

select max(ID) from T1

, then to use a

ROW_NUMBER()

OLAP function to add to the maximum ID in the insert-select statement.

Happy playing and good luck -

marco

can I flush the cache of a sequence without logout a session

Comments

Leave a Comment