truncate failed due to TupleMover
Hi,
I face the following weird problem. I have scheduled Sql requests executed regularly on my database, among which a truncate request on a staging table to empty it.
Sometimes, this truncate request fails with following error:
Caught exception [Vertica][VJDBC](5157) ERROR: Unavailable: [Txn 0xa000000097e40d] O lock table - timeout error Timed out O locking Table:MBBQOE.R_CSV_SIG. U held by [user <unknown> (Moveout: (Table: MBBQOE.R_CSV_SIG) (Projection: MBBQOE.R_CSV_SIG_DBD_84_seg_OPTIM3_b1))]. Your current transaction isolation level is SERIALIZABLE while processing SQL script: TRUNCATE TABLE MBBQOE.R_CSV_SIG
The TM moveout on the table seems to prevent the truncate req from working, whereas in the doc, it is stated that system TM operations are aborted by the Truncate.
I quote it: "If the operation cannot obtain an O Lock on the table(s), HP Vertica attempts to close any internal Tuple Mover (TM) sessions running on the same table(s) so that the operation can proceed. Explicit TM operations that are running in user sessions are not closed. If an explicit TM operation is running on the table, then the operation cannot proceed until the explicit TM operation completes. "
So this behavior is not met in my case!
I don't understand why. Neither why the moveout seems to be executed by an unknown user, instead of being identified as a system internal user.
Can somebody help?
Thanks in advance!
Br
I face the following weird problem. I have scheduled Sql requests executed regularly on my database, among which a truncate request on a staging table to empty it.
Sometimes, this truncate request fails with following error:
Caught exception [Vertica][VJDBC](5157) ERROR: Unavailable: [Txn 0xa000000097e40d] O lock table - timeout error Timed out O locking Table:MBBQOE.R_CSV_SIG. U held by [user <unknown> (Moveout: (Table: MBBQOE.R_CSV_SIG) (Projection: MBBQOE.R_CSV_SIG_DBD_84_seg_OPTIM3_b1))]. Your current transaction isolation level is SERIALIZABLE while processing SQL script: TRUNCATE TABLE MBBQOE.R_CSV_SIG
The TM moveout on the table seems to prevent the truncate req from working, whereas in the doc, it is stated that system TM operations are aborted by the Truncate.
I quote it: "If the operation cannot obtain an O Lock on the table(s), HP Vertica attempts to close any internal Tuple Mover (TM) sessions running on the same table(s) so that the operation can proceed. Explicit TM operations that are running in user sessions are not closed. If an explicit TM operation is running on the table, then the operation cannot proceed until the explicit TM operation completes. "
So this behavior is not met in my case!
I don't understand why. Neither why the moveout seems to be executed by an unknown user, instead of being identified as a system internal user.
Can somebody help?
Thanks in advance!
Br
0
Comments
What you see is not necessarily related to TM , Its seems like its kind of bug with the message Vertica is raise for locking conflicts . When vertica build the locking conflict message it take the lock owner message text from locks.transaction_description column , this locks.transaction_description column include the first SQL statement that being executed on the given transaction , below is an example :
dbadmin=> select sysdate from dual; <- simple select
sysdate
----------------------------
2014-08-06 06:30:46.704774
dbadmin=> update aa set table_schema='public' where table_schema<>'public'; <- update , this create X lock on aa table
OUTPUT
--------
236
select object_name,transaction_id,transaction_description,lock_mode from locks; <- Her is the point , the looking is on aa table , however transaction_description show the first statement of the transaction
object_name | transaction_id | transaction_description | lock_mode
----------------- +------------------- +-------------------------------------------------+-----------
Table:public.aa | 45035996278906235 | Txn: a00000004f5d7b 'select sysdate from dual;' | X
Now Session #2 try to update the same table
dbadmin=> update aa set table_schema='public' where table_schema<>'public'; ; <- update , this try to create X lock on aa table
ERROR 5156: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:public.aa. X held by [user dbadmin (select sysdate from dual;)]. Your current transaction isolation level is READ COMMITTED
As you can see Vertica print wrong message (public.aa. X held by [user dbadmin (select sysdate from dual;)]) which is not necessarily related to the UPDATE command .
Thanks
Thanks for your answer and your time.
But I feel a bit lost: I cannot reproduce the behavior you describe and if this behavior is real, it means that such problems are difficult to investigate if the error messages are wrong!
In my case however, i think the issue really comes from the TM because I am sure that at the time I perform the truncate on my table I am sure no other session tries to update the same table. By construction of my application I would say.
Except the TM.
truncate alike the drop partition are the preferred method advised by Vertica to delete with performance. If it can fail due to TM, it becomes unreliable and therefore unusable, except maybe if the truncate is preceded by a test on a system table to check whether or not a moveout or mergeout is running on the table...not very user friendly....
Did you personnally experiment such problems?
Tx,
You don’t mention drop partition !
Drop partition in fact explicit execution TM as a part of the drop process, (you can see it in the attach test case) , the problem may related to the fact that in parallel to the truncate process there is a drop partition process that is being execute in the background , this may create the conflict you are facing .drop table partition_test;
CREATE TABLE partition_test
(
id int ,
timestamp timestamp NOT NULL
)
PARTITION BY (((date_part('year', partition_test.timestamp) * 100) + date_part('month', partition_test.timestamp)));
insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');
insert /*+ direct */ into partition_test values(1,'2014-09-28 14:13:52');
insert /*+ direct */ into partition_test values(1,'2014-10-28 14:13:52');
commit;
insert into partition_test values(1,'2014-08-28 14:13:52');
insert into partition_test values(1,'2014-09-28 14:13:52');
insert into partition_test values(1,'2014-10-28 14:13:52');
SELECT DROP_PARTITION('partition_test', '201408');
select object_name,mode from LOCK_USAGE where session_id in (SELECT session_id FROM v_monitor.current_session_p);
object_name | mode
----------------------------------------+------
Global Catalog | X
Local Catalog | X
Projection:public.partition_test_super | X
Table:public.partition_test | I
Table:public.partition_test | O
Table:public.partition_test | S
Table:public.partition_test | T <- TM move out
select BTRIM(description)::varchar(100) ,start_timestamp ,end_timestamp ,isolation, user_name from v_monitor.TRANSACTIONS where SESSION_id in (SELECT session_id FROM v_monitor.current_session_p);
n | user_name
------------------------------------------------------------------------------------------------------+-------------------------------+-------------------------------+----------------+-----------
Txn: a00000000b5be6 'SELECT * FROM LOCK_USAGE where session_id in (SELECT session_id FROM v_mon | 2014-09-01 06:58:00.579893+00 | | READ COMMITTED | dbadmin
Txn: a00000000b5be5 'drop_partition on table partition_test' | 2014-09-01 06:58:00.569773+00 | 2014-09-01 06:58:00.573069+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5be3 'Moveout: (Table: public.partition_test) (Projection: public.partition_test_supe | 2014-09-01 06:58:00.555999+00 | 2014-09-01 06:58:00.568631+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5be2 'getUpNodeList' | 2014-09-01 06:58:00.55528+00 | 2014-09-01 06:58:00.555341+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5be1 'do_tm_task moveout: partition_test' | 2014-09-01 06:58:00.554998+00 | 2014-09-01 06:58:00.555128+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5bdd 'insert into partition_test values(1,'2014-08-28 14:13:52');' | 2014-09-01 06:58:00.508192+00 | 2014-09-01 06:58:00.540062+00 | READ COMMITTED | dbadmin
Txn: a00000000b5bd9 'CREATE PROJECTION public.partition_test AS SELECT * FROM public.partition_test | 2014-09-01 06:58:00.433488+00 | 2014-09-01 06:58:00.439334+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5bd8 'CREATE PROJECTION public.partition_test AS SELECT * FROM public.partition_test | 2014-09-01 06:58:00.426657+00 | 2014-09-01 06:58:00.430223+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5bd7 'insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');' | 2014-09-01 06:58:00.422502+00 | 2014-09-01 06:58:00.50458+00 | READ COMMITTED | dbadmin
Txn: a00000000b5bd6 'CREATE TABLE partition_test ( id int , timestamp timestamp NOT NU | 2014-09-01 06:58:00.414655+00 | 2014-09-01 06:58:00.417468+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5bd5 'drop table partition_test;' | 2014-09-01 06:57:59.966594+00 | 2014-09-01 06:57:59.970756+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5221 'SELECT * FROM LOCK_USAGE where session_id in (SELECT session_id FROM v_mon | 2014-09-01 06:53:47.450456+00 | 2014-09-01 06:57:59.962902+00 | READ COMMITTED | dbadmin
Txn: a00000000b5212 'drop_partition on table partition_test' | 2014-09-01 06:53:40.499926+00 | 2014-09-01 06:53:40.583305+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5210 'Moveout: (Table: public.partition_test) (Projection: public.partition_test_supe | 2014-09-01 06:53:40.4462+00 | 2014-09-01 06:53:40.460215+00 | SERIALIZABLE | dbadmin
Txn: a00000000b520f 'getUpNodeList' | 2014-09-01 06:53:40.444171+00 | 2014-09-01 06:53:40.444283+00 | SERIALIZABLE | dbadmin
Txn: a00000000b520e 'do_tm_task moveout: partition_test' | 2014-09-01 06:53:40.44301+00 | 2014-09-01 06:53:40.443145+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5205 'insert into partition_test values(1,'2014-08-28 14:13:52');' | 2014-09-01 06:53:33.647977+00 | 2014-09-01 06:53:40.440712+00 | READ COMMITTED | dbadmin
Txn: a00000000b5201 'CREATE PROJECTION public.partition_test AS SELECT * FROM public.partition_test | 2014-09-01 06:53:33.519539+00 | 2014-09-01 06:53:33.525714+00 | SERIALIZABLE | dbadmin
Txn: a00000000b5200 'CREATE PROJECTION public.partition_test AS SELECT * FROM public.partition_test | 2014-09-01 06:53:33.471611+00 | 2014-09-01 06:53:33.515087+00 | SERIALIZABLE | dbadmin
Txn: a00000000b51ff 'insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');' | 2014-09-01 06:53:33.397394+00 | 2014-09-01 06:53:33.639342+00 | READ COMMITTED | dbadmin
Txn: a00000000b51f5 'CREATE TABLE partition_test ( id int , timestamp timestamp NOT NU | 2014-09-01 06:53:25.543791+00 | 2014-09-01 06:53:25.561091+00 | SERIALIZABLE | dbadmin
Txn: a00000000b51ee 'drop table partition_test;' | 2014-09-01 06:53:18.284828+00 | 2014-09-01 06:53:18.474467+00 | SERIALIZABLE | dbadmin
One small (minor) comment - you don't need to query system table to extract a session_id
daniel=> select current_session();
current_session
-----------------
xps-3139:0x29
(1 row)
So your query will be simplified to:
TM might be the possible reason behind getting a O lock on the table.
One more possibility is some DML statement fired by another user through another sessions, due to which you will not be able to TRUNCATE a table, This might be due to an uncommitted transaction by another session on same table.
Before truncating, perform close_all_Sessions function and check.
Another way to track this if TM is actually causing the locks while truncating is using the system table system_sessions.
This will tell you whether there are any TM operations going in background of the session with the timestamp.
Hope this helps.
NC
Hope your issue is solved.
I was going through the release notes of Vertica 7.1.x and I found this know issue.
Issue number : VER-29360
Related to : Transactions
Description :
A DROP_PARTITION operation is expected to interrupt or cancel a conflicting MERGEOUT operation. In some instances the MERGEOUT operation does not cancel and the
DROP_PARTITION times out waiting for a table lock. Furthermore, a race condition in error reporting can sometimes result in a CANCEL error message being returned instead of the
expected LOCK TIMEOUT error message.
Workaround:
Retry the DROP_PARTITION operation.
Hope this helps
NC
Thanks a lot. It is exactly the problem I have but with the truncate instead of drop partition and moveout instead of mergeout. So it seems like a bug too.
I fear I cannot do anything more than retry my truncate command for the moment.I will raise the criticality of the issue at Vertica support because the recommend delete operation for perf reason is either a drop partition or a truncate but it should be reliable ( atomic)
vsql:MergeClaimTablesPROD.sql:263: ERROR 5156: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:T_CA_NCI. X held by [user dsksdss (MERGE INTO T_CA_NCI AS target^
Should we be increasing LOCK TIME to this table? Does this error mean that table is not locked enough to complete the merge? It went good for all other table, this one is huge with 7 million rows to merge. We have a source table in staging of vertica...
Hi,
Yes the locking time has to be increased. Default value of locktimeout is set to 300 seconds in vertica, we can try setting it to 600 seconds, so that any transaction requiring a lock can waiting for longer time before it throw lock timeout error.
You can set it by using the below query
select set_config_parameter('LockTimeout',600);
-Regards,
Sruthi
whats the max we can put this at? I had it at 1000 but still my table fails to merge.
9105007 is the count of my staging table that I am trying to MERGE.