Kafka scheduler

Hello!

 

I run two Kafka schedulers on two different nodes, node1 and node2.

 

On node1 I see:

 

2016-06-29 05:04:27.646 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main  [INFO] Starting frame @ 2016-06-29 05:04:27.646
2016-06-29 05:04:27.666 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-06-29 05:04:27.748 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.
2016-06-29 05:04:33.690 com.vertica.solutions.kafka.scheduler.LeaderSelector::Main [ERROR] Caught SQLException during Leadership Lock Procedure. Rolling back txn.
java.sql.SQLTransactionRollbackException: [Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:kafka_config.kafka_lock. S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. Your current transaction isolation level is SERIALIZABLE
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeQuery(Unknown Source)
at com.vertica.solutions.kafka.scheduler.LeaderSelector.lock(LeaderSelector.java:99)
at com.vertica.solutions.kafka.scheduler.StreamCoordinator.run(StreamCoordinator.java:206)
at com.vertica.solutions.kafka.Launcher.run(Launcher.java:97)
at com.vertica.solutions.kafka.Launcher.main(Launcher.java:150)
Caused by: com.vertica.support.exceptions.TransactionRollbackException: [Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:kafka_config.kafka_lock. S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. Your current transaction isolation level is SERIALIZABLE
... 12 more
2016-06-29 05:04:37.929 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 2 [INFO] Lane Worker 2 waiting for batch...
2016-06-29 05:04:37.929 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame @ 2016-06-29 05:04:37.929
2016-06-29 05:04:37.949 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-06-29 05:04:38.027 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.
2016-06-29 05:04:48.217 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 2 [INFO] Lane Worker 2 waiting for batch...
2016-06-29 05:04:48.217 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame @ 2016-06-29 05:04:48.217
2016-06-29 05:04:48.236 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-06-29 05:04:48.317 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.
2016-06-29 05:04:58.502 com.vertica.solutions.kafka.scheduler.LaneWorker::Lane Worker 2 [INFO] Lane Worker 2 waiting for batch...
2016-06-29 05:04:58.502 com.vertica.solutions.kafka.scheduler.StreamCoordinator::Main [INFO] Starting frame @ 2016-06-29 05:04:58.502
2016-06-29 05:04:58.521 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Starting compute batches for new Frame.
2016-06-29 05:04:58.604 com.vertica.solutions.kafka.scheduler.FrameScheduler::Main [INFO] Completed computing batch set for current Frame.

 

and so on...

 

On node2 I see:

 

2016-06-29 05:04:37.782 com.vertica.solutions.kafka.scheduler.LeaderSelector::Main  [ERROR] Caught SQLException during Leadership Lock Procedure. Rolling back txn.
java.sql.SQLTransactionRollbackException: [Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:kafka_config.kafka_lock. S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. Your current transaction isolation level is SERIALIZABLE
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.dataengine.VResultSet.fetchChunk(Unknown Source)
at com.vertica.dataengine.VResultSet.initialize(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.readExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.handleExecuteResponse(Unknown Source)
at com.vertica.dataengine.VQueryExecutor.execute(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.vertica.jdbc.common.SStatement.executeQuery(Unknown Source)
at com.vertica.solutions.kafka.scheduler.LeaderSelector.lock(LeaderSelector.java:99)
at com.vertica.solutions.kafka.scheduler.StreamCoordinator.run(StreamCoordinator.java:206)
at com.vertica.solutions.kafka.Launcher.run(Launcher.java:97)
at com.vertica.solutions.kafka.Launcher.main(Launcher.java:150)
Caused by: com.vertica.support.exceptions.TransactionRollbackException: [Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:kafka_config.kafka_lock. S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. Your current transaction isolation level is SERIALIZABLE

It looks like the two schedulers conflict with each other, but why do I see exceptions? How can I fix this?

Comments

  • Hi 


    The lock timeout is not an error in itself; it is likely expected behavior. You'd have to look at the locks table to see why the other transaction is holding a lock. You can't simply look at the transaction's current statement, since locks are held until commit or rollback (and a transaction can have many statements). That said, from the message below it looks like the S lock was acquired by the dbadmin user on the same table, "kafka_lock", and the other session timed out (after the LockTimeout set in your cluster) while waiting for that lock to be released so it could perform an operation that requires an exclusive (X) lock.

     

    Caused by: com.vertica.support.exceptions.TransactionRollbackException: [Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - Locking failure: Timed out X locking Table:kafka_config.kafka_lock. S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. Your current transaction isolation level is SERIALIZABLE

     

     

     

    Please run select * from locks; when you see this error. It will show which transaction is actually holding the lock.

     

     

    Thanks

    Rahul
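
For anyone triaging these log lines: the message itself already names the requested lock mode, the table, the mode that is held, and the holder's last statement. Below is a minimal Python sketch that pulls those fields out. The regular expression and the function name `parse_lock_timeout` are my own, written only against the message quoted in this thread, so other Vertica versions may word the message differently. As noted above, the statement shown is only the holder's most recent one, not necessarily the one that took the lock.

```python
import re

# Sample message copied from the scheduler log quoted in this thread.
MESSAGE = (
    "[Vertica][VJDBC](5156) ERROR: Unavailable: initiator locks for query - "
    "Locking failure: Timed out X locking Table:kafka_config.kafka_lock. "
    'S held by [user dbadmin (SELECT scheduler_id FROM "kafka_config".kafka_lock)]. '
    "Your current transaction isolation level is SERIALIZABLE"
)

# Assumption: this pattern is derived only from the message above.
LOCK_TIMEOUT = re.compile(
    r"Timed out (?P<requested>\w+) locking (?P<object>Table:\S+?)\. "
    r"(?P<held>\w+) held by \[user (?P<user>\S+) \((?P<statement>.*)\)\]"
)


def parse_lock_timeout(message):
    """Return the lock details as a dict, or None if the text does not match."""
    match = LOCK_TIMEOUT.search(message)
    return match.groupdict() if match else None


info = parse_lock_timeout(MESSAGE)
for key, value in info.items():
    print(f"{key}: {value}")
```

Run over a scheduler log, this makes it easy to count how often each table and holder shows up before going to the locks table.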

  • Hello!

     

    As expected, I see that the Kafka scheduler on node1 acquires the lock in order to write data from Kafka into Vertica. But I wonder why the Kafka scheduler on node2 doesn't go into stand-by mode, ready to take over from node1 in case it fails. Why does node2, instead of staying in stand-by mode, try to run:

    SELECT scheduler_id FROM "kafka_config".kafka_lock

     

    dbadmin=> select * from locks;
    -[ RECORD 1 ]-----------+------------------------------------------------
    node_names | v_test_node0001,v_test_node0002,v_test_node0003,v_test_node0004,v_test_node0005,v_test_node0006
    object_name | Table:kafka_config.kafka_lock
    object_id | 45035996273751294
    transaction_id | 45035996273905125
    transaction_description | Txn: a0000000030de5 'SELECT scheduler_id FROM "kafka_config".kafka_lock'
    lock_mode | S
    lock_scope | TRANSACTION
    request_timestamp | 2016-06-30 03:32:12.753546-05
    grant_timestamp | 2016-06-30 03:32:12.753548-05
    -[ RECORD 2 ]-----------+------------------------------------------------
    node_names | v_test_node0001,v_test_node0002,v_test_node0003,v_test_node0004,v_test_node0005,v_test_node0006
    object_name | Table:public.kafka_target1
    object_id | 45035996273760724
    transaction_id | 45035996273923154
    transaction_description | Txn: a0000000035452 'COPY "public"."kafka_target1" SOURCE KafkaSource(stream='requests|0|7043139765,requests|1|7022313835,requests|2|7022309029', brokers='kaf01xxx:9092,kaf02xxx:9092,1kaf03xxx:9092', duration=interval '9806 milliseconds', stop_on_eof=true, executionparallelism=1 ) PARSER KafkaJSONParser() REJECTED DATA AS TABLE kafka_target1_rej TRICKLE NO COMMIT'
    lock_mode | I
    lock_scope | TRANSACTION
    request_timestamp | 2016-06-30 06:09:51.359489-05
    grant_timestamp | 2016-06-30 06:09:51.359493-05
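
One thing the snapshot above suggests: the S lock on kafka_lock was granted at 03:32:12, while the COPY in the second record was requested at 06:09:51 the same day, so the S lock had been held for a long time when the snapshot was taken. Here is a rough Python sketch for reading that off vsql expanded output. The helper names `parse_expanded` and `parse_ts` are hypothetical, the sample is a shortened copy of the output above, and the latest timestamp in the snapshot is used as a lower bound for when it was taken.

```python
from datetime import datetime

# Shortened copy of the "select * from locks;" output quoted above.
SNAPSHOT = """\
-[ RECORD 1 ]-----------+------
object_name | Table:kafka_config.kafka_lock
transaction_description | Txn: a0000000030de5 'SELECT scheduler_id FROM "kafka_config".kafka_lock'
lock_mode | S
grant_timestamp | 2016-06-30 03:32:12.753548-05
-[ RECORD 2 ]-----------+------
object_name | Table:public.kafka_target1
lock_mode | I
request_timestamp | 2016-06-30 06:09:51.359489-05
"""


def parse_expanded(text):
    """Split vsql expanded output into one dict per record."""
    records = []
    for line in text.splitlines():
        if line.startswith("-[ RECORD"):
            records.append({})
        elif "|" in line and records:
            # Split on the first "|" only; values may contain "|" themselves.
            key, _, value = line.partition("|")
            records[-1][key.strip()] = value.strip()
    return records


def parse_ts(value):
    # The snapshot prints the UTC offset as "-05"; strptime's %z expects "-0500".
    return datetime.strptime(value + "00", "%Y-%m-%d %H:%M:%S.%f%z")


records = parse_expanded(SNAPSHOT)
lock_row = next(r for r in records if r["object_name"].endswith("kafka_lock"))

# Latest timestamp in the snapshot: a lower bound for when it was taken.
latest = max(
    parse_ts(value)
    for record in records
    for key, value in record.items()
    if key.endswith("_timestamp")
)
held_for = latest - parse_ts(lock_row["grant_timestamp"])
print(lock_row["lock_mode"], "lock on kafka_lock held for at least", held_for)
```

If the holder turns out to be a long-idle transaction rather than the active scheduler, that would point at a session that never committed or rolled back, which is worth checking before blaming the hand-off logic.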

     

  • What does your Kafka scheduler config look like?

  • Hi!

     

    node1$ ./vkconfig scheduler --edit --config-schema kafka_config --brokers kafka01.rtty.in:9092,kafka02:9092,kafka03:9092 --username dbadmin --password xxx

    node1$ ./vkconfig topic --add --config-schema kafka_config --target kafka_target1 --rejection-table kafka_target1_rej --topic requests --num-partitions 3 --parser KafkaJSONParser --username dbadmin --password xxx

    node1$ cat configFile.properties
    config-schema=kafka_config
    instance-name=db01
    username=dbadmin
    password=xxx

    node2$ cat configFile.properties
    config-schema=kafka_config
    instance-name=db02
    username=dbadmin
    password=xxx

    node1$ ./vkconfig launch --conf configFile.properties&
    node2$ ./vkconfig launch --conf configFile.properties&
  • Hi 

     

    I have a similar issue. Did you find a solution?

     

  • Hi!

     

    Unfortunately, no! I haven't found any workaround or solution; it looks like "hand-off" doesn't work.

  • Does anyone have an update on this? I'm facing the same issue.
    Thanks!

  • edited May 2018

    Any update? I see locks like this all the time.

    Txn: a000002af88923 'SELECT scheduler_id FROM "stream_scheduler".stream_lock'

  • SruthiA Administrator

    What is your Vertica version, @asaunders?
