COPY statement generates "ERROR 4704: Sequence exceeded max value" message
Issuing a COPY statement from a Vertica cluster node results in an error
The COPY statement looks like this:
the data looks like
Thanks
David
ERROR 4704: Sequence exceeded max valueI can't find any information on the cause of this error or troubleshooting steps.
The COPY statement looks like this:
copy benchmarks.fake_requests ( ip, url, ua ) from '/tmp/data/???.gz' gzip/tmp/data contains tab delimited files like xaa.gz, xab.gz, etc.
ENCLOSED BY E'"'
delimiter E'\t'
direct ;
the data looks like
"46.62.215.*" "http://www.west.com/"; "Mozilla/5.0 (Windows NT 6.1; sl-SI; rv:1.9.1.20) Gecko/2010-08-29 03:33:05 Firefox/3.8"
"240.212.193.*" "http://www.paucek.com/"; "Mozilla/5.0 (compatible; MSIE 6.0; Windows 98; Trident/3.1)"
"62.14.30.*" "http://crist-hoeger.biz/"; "Mozilla/5.0 (compatible; MSIE 5.0; Windows NT 5.0; Trident/5.0)"
"86.204.99.*" "http://abshire.com/"; "Mozilla/5.0 (compatible; MSIE 6.0; Windows NT 5.2; Trident/4.0)"
The table is created like this:create schema if not exists benchmarks; set search_path=benchmarks,public; drop table if exists fake_requests; drop sequence if exists fake_requests_seq; create sequence fake_requests_seq cache 250000; create table fake_requests ( id int not null default nextval('fake_requests_seq'), ip varchar(15) not null, url varchar(256) not null, ua varchar(1024) null, primary key (id) ) segmented by modularhash(id) all nodes;Any idea why I am seeing this error? I don't see the error when using COPY FROM LOCAL option, only when I copy from the node. Also, I see the error whether I try to load multiple files by wildcard or a single file using an explicit file name.
Thanks
David
0
Comments
Hi David -
Your table has an ID column populated by default by a sequence
(id int not null default nextval('fake_requests_seq') ).
It's your fake_requests_seq sequence that overruns.
Have you dropped the table several times without dropping /re-initializing the sequence? Or do you indeed have such a humongous data file with so many rows that the sequence overruns from scratch?
If your table is not empty, then run:
select max(id) from fake_requests;
Then, run:
select fake_requests_seq.nextval;
... and decide by yourself how many numbers the sequence has left for you ...
Come to think of it: it also depends on how many ???..gz files you have. I think I remember that the COPY command would run several parallel sessions, one per file. And your sequence cache is at 250,000. Each thread that is instantiated for a *gz. file will begin the sequence at another multiple of 250,000.
Good luck -
marco
Its a clear case of your sequence running over the maximum limit for the cluster.
You can refer to these posts here to have more understanding on using sequences in Vertica.
Using Sequences
Hope this helps.
NC
- I am dropping and creating the table and the sequence on each attempt.
- I am using the default sequence cache setting of 250K. However, I see the same error even when using a sequence cache setting of 10K.
- There are 5 files with 100K rows each for a total of 500K rows.
- Even trying to load just one 100K row file causes this error.
If I do a \ds in vsql, I get this output, so no indication that we're even close to the Maximum value for the sequence.List of Sequences
Schema | Sequence | CurrentValue | IncrementBy | Minimum | Maximum | AllowCycle | Comment
------------+-------------------+--------------+-------------+---------+---------------------+------------+---------
If I look at nextval after the error, also not even close.dbadmin=> select fake_requests_seq.nextval;
nextval
---------
4346
I can provide copies of the code I am using to reproduce this if you're interested.
David
Given that I am using the default of 250000 for the sequence cache setting, and that I am loading only a modest quantity of data: 500,000 rows using 5 x 100,000 row files (on a single node cluster), even if one batch of sequence numbers is allocated per file per host, it doesn't make sense that I am bumping up against the maximum.
So, two questions:
1) Am I misunderstanding the calculation of sequence number allocation?
2) What the expected use case or best practice should be with regard to use of sequences and COPY.
Thanks much
David
To everyone who see this error please check Global Sequence value in catalog.
If it is too big, run:
PS
(Im not sure for solution, but it helped to me)
I did a clear install, created a new db and found in my catalog:
After clear (clear_all_sequences_cache): And no more fails with error: Sequence exceeded max value
If you turn off parallelism, it will probably go away.
Thanks for reporting the problem. The bug is resolved in 7.1.1-3.
Here is a link to the release notes:
http://my.vertica.com/docs/ReleaseNotes/7.1.x/HP_Vertica_7.1.x_Release_Notes.htm
- Shivani
Thanks for help.
Look, even not all drivers available to CE users and you want a last fix/patch? )
Forget about it, its for EE customers only or wait until it will be available to CE users (at least about a half year delay between last CE and EE versions). If you are CE user, so keep going to get an error. CE users not important for HP.
PS
2 months or half year... doesn't matter... project is active now.
I don't think that customer/user will be happy with "wait 2 month" (and still its not 100%, may be a close to half year too).
The same version is available for download for both CE & EE users at my.vertica.com.
And really, I can't understand it - if I can ask for patch from Support and friendly support will give me a patch, so why do not put in public? Why I need to ask? That Support will feel self-importance?
You are not so smart.
Im fighting for community in all ways(and don't care about my image) - that community will get a full support, last drivers, last patches, will not be ignored, etc. And what you do? Are you part of community? Or you are a spy?
By defending a poor support you are actually against a community.