truncate failed due to TupleMover

Hi,

I face the following weird problem. I have scheduled Sql requests executed regularly on my database, among which a truncate request on a staging table to empty it.
Sometimes, this truncate request fails with following error: 
Caught exception [Vertica][VJDBC](5157) ERROR: Unavailable: [Txn 0xa000000097e40d] O lock table - timeout error Timed out O locking Table:MBBQOE.R_CSV_SIG. U held by [user <unknown> (Moveout: (Table: MBBQOE.R_CSV_SIG) (Projection: MBBQOE.R_CSV_SIG_DBD_84_seg_OPTIM3_b1))]. Your current transaction isolation level is SERIALIZABLE while processing SQL script: TRUNCATE TABLE MBBQOE.R_CSV_SIG

The TM moveout on the table seems to prevent the truncate req from working, whereas in the doc, it is stated that system TM operations are aborted by the Truncate.
I quote it: "If the operation cannot obtain an O Lock on the table(s), HP Vertica attempts to close any internal Tuple Mover (TM) sessions running on the same table(s) so that the operation can proceed. Explicit TM operations that are running in user sessions are not closed. If an explicit TM operation is running on the table, then the operation cannot proceed until the explicit TM operation completes. "
So this behavior is not met in my case!
I don't understand why. Neither why the moveout seems to be executed by an unknown user, instead of being identified as a system internal user.
Can somebody help?
Thanks in advance!

Br

Comments

  • Hi ,

    What you see is not necessarily related to TM  , Its seems like its kind of bug with the  message Vertica is raise for locking conflicts  . When vertica build the locking conflict message it take the lock owner message text  from locks.transaction_description  column , this locks.transaction_description   column include the first SQL statement that being executed on the given transaction , below is an example :

     

    dbadmin=> select sysdate from dual;  <- simple select

              sysdate

    ----------------------------

    2014-08-06 06:30:46.704774 

    dbadmin=>  update aa set table_schema='public' where table_schema<>'public';  <- update , this create X lock on aa table

     OUTPUT

    --------

        236 

    select object_name,transaction_id,transaction_description,lock_mode from locks;  <- Her is the point  , the looking is on aa table , however transaction_description     show the first statement of the transaction                             

     

       object_name   |  transaction_id         | transaction_description                                      | lock_mode

    -----------------       +-------------------           +-------------------------------------------------+-----------

    Table:public.aa | 45035996278906235 | Txn: a00000004f5d7b 'select sysdate from dual;' | X

     

    Now Session #2 try to update the same table

    dbadmin=> update aa set table_schema='public' where table_schema<>'public';   ;  <- update , this try to create X lock on aa table

     

     

    ERROR 5156:  Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:public.aa. X held by [user dbadmin (select sysdate from dual;)]. Your current transaction isolation level is READ COMMITTED

     

    As you can see  Vertica print wrong message (public.aa. X held by [user dbadmin (select sysdate from dual;)]) which is not necessarily related to the UPDATE command .


    Thanks 



  • Hi Eli,
    Thanks for your answer and your time.
    But I feel a bit lost: I cannot reproduce the behavior you describe and if this behavior is real, it means that such problems are difficult to investigate if the error messages are wrong!

    In my case however, i think the issue really comes from the TM because I am sure that at the time I perform the truncate on my table I am sure no other session tries to  update the same table. By construction of my application I would say.
    Except the TM.

    truncate alike the drop partition are the preferred method advised by Vertica to delete with performance. If it can fail due to TM, it becomes unreliable and therefore unusable, except maybe if the truncate is preceded by a test on a system table to check whether or not a moveout or mergeout is running on the table...not very user friendly....
    Did you personnally experiment such problems? 

    Tx,
  • Hi ,

    You don’t mention drop partition !

    Drop partition in  fact explicit  execution  TM  as a part of the drop process, (you can see it in the attach    test case) , the problem may related to the fact that in parallel to the truncate  process there is  a drop partition process that  is being execute   in the background ,  this may create the conflict   you are facing   .


    drop table partition_test;
    CREATE TABLE partition_test
    (
        id  int  ,    
        timestamp timestamp NOT NULL
     )
    PARTITION BY (((date_part('year', partition_test.timestamp) * 100) + date_part('month', partition_test.timestamp)));

    insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');
    insert /*+ direct */ into partition_test values(1,'2014-09-28 14:13:52');
    insert /*+ direct */  into partition_test values(1,'2014-10-28 14:13:52');

    commit;

    insert   into partition_test values(1,'2014-08-28 14:13:52');
    insert   into partition_test values(1,'2014-09-28 14:13:52');
    insert    into partition_test values(1,'2014-10-28 14:13:52');

    SELECT DROP_PARTITION('partition_test', '201408');   

    select object_name,mode from LOCK_USAGE    where  session_id in (SELECT session_id FROM v_monitor.current_session_p);
                  object_name               | mode
    ----------------------------------------+------
     Global Catalog                         | X
     Local Catalog                          | X
     Projection:public.partition_test_super | X
     Table:public.partition_test            | I
     Table:public.partition_test            | O
     Table:public.partition_test            | S
     Table:public.partition_test            | T   <- TM move out

    select BTRIM(description)::varchar(100) ,start_timestamp ,end_timestamp ,isolation,   user_name from  v_monitor.TRANSACTIONS where SESSION_id in (SELECT session_id FROM v_monitor.current_session_p);
    n    | user_name
    ------------------------------------------------------------------------------------------------------+-------------------------------+-------------------------------+----------------+-----------
     Txn: a00000000b5be6 'SELECT * FROM LOCK_USAGE     where  session_id in (SELECT session_id FROM v_mon | 2014-09-01 06:58:00.579893+00 |                               | READ COMMITTED | dbadmin
     Txn: a00000000b5be5 'drop_partition on table partition_test'                                         | 2014-09-01 06:58:00.569773+00 | 2014-09-01 06:58:00.573069+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5be3 'Moveout: (Table: public.partition_test) (Projection: public.partition_test_supe | 2014-09-01 06:58:00.555999+00 | 2014-09-01 06:58:00.568631+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5be2 'getUpNodeList'                                                                  | 2014-09-01 06:58:00.55528+00  | 2014-09-01 06:58:00.555341+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5be1 'do_tm_task moveout: partition_test'                                             | 2014-09-01 06:58:00.554998+00 | 2014-09-01 06:58:00.555128+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5bdd 'insert   into partition_test values(1,'2014-08-28 14:13:52');'                  | 2014-09-01 06:58:00.508192+00 | 2014-09-01 06:58:00.540062+00 | READ COMMITTED | dbadmin
     Txn: a00000000b5bd9 'CREATE PROJECTION public.partition_test AS  SELECT * FROM public.partition_test | 2014-09-01 06:58:00.433488+00 | 2014-09-01 06:58:00.439334+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5bd8 'CREATE PROJECTION public.partition_test AS  SELECT * FROM public.partition_test | 2014-09-01 06:58:00.426657+00 | 2014-09-01 06:58:00.430223+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5bd7 'insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');'      | 2014-09-01 06:58:00.422502+00 | 2014-09-01 06:58:00.50458+00  | READ COMMITTED | dbadmin
     Txn: a00000000b5bd6 'CREATE TABLE partition_test (     id  int  ,         timestamp timestamp NOT NU | 2014-09-01 06:58:00.414655+00 | 2014-09-01 06:58:00.417468+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5bd5 'drop table partition_test;'                                                     | 2014-09-01 06:57:59.966594+00 | 2014-09-01 06:57:59.970756+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5221 'SELECT * FROM LOCK_USAGE     where  session_id in (SELECT session_id FROM v_mon | 2014-09-01 06:53:47.450456+00 | 2014-09-01 06:57:59.962902+00 | READ COMMITTED | dbadmin
     Txn: a00000000b5212 'drop_partition on table partition_test'                                         | 2014-09-01 06:53:40.499926+00 | 2014-09-01 06:53:40.583305+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5210 'Moveout: (Table: public.partition_test) (Projection: public.partition_test_supe | 2014-09-01 06:53:40.4462+00   | 2014-09-01 06:53:40.460215+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b520f 'getUpNodeList'                                                                  | 2014-09-01 06:53:40.444171+00 | 2014-09-01 06:53:40.444283+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b520e 'do_tm_task moveout: partition_test'                                             | 2014-09-01 06:53:40.44301+00  | 2014-09-01 06:53:40.443145+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5205 'insert   into partition_test values(1,'2014-08-28 14:13:52');'                  | 2014-09-01 06:53:33.647977+00 | 2014-09-01 06:53:40.440712+00 | READ COMMITTED | dbadmin
     Txn: a00000000b5201 'CREATE PROJECTION public.partition_test AS  SELECT * FROM public.partition_test | 2014-09-01 06:53:33.519539+00 | 2014-09-01 06:53:33.525714+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b5200 'CREATE PROJECTION public.partition_test AS  SELECT * FROM public.partition_test | 2014-09-01 06:53:33.471611+00 | 2014-09-01 06:53:33.515087+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b51ff 'insert /*+ direct */ into partition_test values(1,'2014-08-28 14:13:52');'      | 2014-09-01 06:53:33.397394+00 | 2014-09-01 06:53:33.639342+00 | READ COMMITTED | dbadmin
     Txn: a00000000b51f5 'CREATE TABLE partition_test (     id  int  ,         timestamp timestamp NOT NU | 2014-09-01 06:53:25.543791+00 | 2014-09-01 06:53:25.561091+00 | SERIALIZABLE   | dbadmin
     Txn: a00000000b51ee 'drop table partition_test;'                                                     | 2014-09-01 06:53:18.284828+00 | 2014-09-01 06:53:18.474467+00 | SERIALIZABLE   | dbadmin


  • Hi Eli!

    One small (minor) comment - you don't need to query system table to extract a session_id

    daniel=> select current_session();
     current_session
    -----------------
     xps-3139:0x29
    (1 row)

    So your query will be simplified to:
    select object_name, mode from LOCK_USAGE where session_id = current_session();
  • Sure tnx 
  • Hi, I mention drop partition because the same error and message occur with drop partition too. But the drop partition my application performs are not background operations, they are controlled by the application and do not occur at the same time as truncate operations.
  • Navin_CNavin_C Vertica Customer
    Hi Leroy,

    TM might be the possible reason behind getting a O lock on the table.
    One more possibility is some DML statement fired by another user through another sessions, due to which you will not be able to TRUNCATE a table, This might be due to an uncommitted transaction by another session on same table.

    Before truncating,  perform close_all_Sessions function and check.

    Another way to track this if TM is actually causing the locks while truncating is using the system table system_sessions.

    This will tell you whether there are any TM operations going in background of the session with the timestamp.

    Hope this helps.
    NC
  • Navin_CNavin_C Vertica Customer
    Hi Leroy,

    Hope your issue is solved.

    I was going through the release notes of Vertica 7.1.x and I found this know issue.

    Issue number : VER-29360
    Related to  : Transactions
    Description :
    A DROP_PARTITION operation is expected to interrupt or cancel a conflicting MERGEOUT operation. In some instances the MERGEOUT operation does not cancel and the 
    DROP_PARTITION times out waiting for a table lock. Furthermore, a race condition in error reporting can sometimes result in a CANCEL error message being returned instead of the 
    expected LOCK TIMEOUT error message.
    Workaround:
    Retry the DROP_PARTITION operation.


    Hope this helps
    NC


  • Hi Navin,

    Thanks a lot. It is exactly the problem I have but with the truncate instead of drop partition and moveout instead of mergeout. So it seems like a bug too. 
    I fear I cannot do anything more than retry my truncate command for the moment.I will raise the criticality of the issue at Vertica support because the recommend delete operation for perf reason is either a drop partition or a truncate but it should be reliable ( atomic)
  • vsql:MergeClaimTablesPROD.sql:263: ERROR 5156:  Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:T_CA_NCI. X held by [user dsksdss (MERGE INTO T_CA_NCI AS target^

     

     

    Should we be increasing LOCK TIME to this table? Does this error mean that table is not locked enough to complete the merge? It went good for all other table, this one is huge with 7 million rows to merge. We have a source table in staging of vertica...

     

  • SruthiASruthiA Vertica Employee Administrator

    Hi,

     

     Yes the locking time has to be increased.  Default value of locktimeout is set to 300 seconds in vertica, we can try setting it to 600 seconds, so that any transaction requiring a lock can waiting for longer time before it throw lock timeout error.

     

    You can set it by using the below query

     

    select set_config_parameter('LockTimeout',600);

     

    -Regards,

     Sruthi

  • whats the max we can put this at? I had it at 1000 but still my table fails to merge. 

     

    9105007 is the count of my staging table that I am trying to MERGE.

Leave a Comment

BoldItalicStrikethroughOrdered listUnordered list
Emoji
Image
Align leftAlign centerAlign rightToggle HTML viewToggle full pageToggle lights
Drop image/file