
kafka_config.kafka_offsets table has 200,000 rows after a week

Hi,

I have had Vertica connected to Kafka for one week, and the kafka_config.kafka_offsets table already has over 200,000 rows. It adds 3 rows every 10 seconds. Is this normal? I think this will cause me problems after a few months!

Thanks,

Drew

Comments

  •

    Any thoughts on this? I have 1.6 million rows of data in kafka_config.kafka_offsets after just over a month.

    Thanks,

    Drew

  •

    Has anyone experienced this?

  •

    I tried to remove old entries from the kafka_config.kafka_offsets table:

    delete from kafka_config.kafka_offsets where transaction_id < 45035996273705601;

    but ran into an error:

    [Vertica][VJDBC](6460) ERROR: UPDATE/DELETE/MERGE a table with aggregate projections is not supported

  •

    No one else is having this issue?

  • MAW (Employee)

    Hello Drew

    I too am seeing this (albeit in Vertica 8.1, where this table is now called stream_microbatch_history). As I understand it, the table records one row per source/target pair per microbatch. In my setup, that's 11 source/target pairs every 10 seconds (the default microbatch duration).

    These (and other) tables are not cleared down automatically (even in Vertica 8.x).

    I will investigate/research and let you know what I find.

    Regards
    Mark

  • [Deleted User] (Administrator)

    DELETE will not work because this table has a live aggregate projection.

    This table is partitioned by the expression below. Since date_part('day', ...) returns the day of the month, the table cycles through at most 31 partitions. You can try drop_partition on partitions that are old (see the sketch after the query below), but be very careful not to drop a partition that holds the current end_offset for a target schema, table, cluster, topic, and partition.

    date_part('day', kafka_offsets.batch_start)

    The following query can help identify the partitions that are REQUIRED and must not be dropped:

    select distinct partition
    from (select target_schema, target_table, kcluster, ktopic, kpartition,
                 date_part('day', max(kafka_offsets.batch_start)) as partition
          from kafka_config.kafka_offsets
          group by 1,2,3,4,5) as tab;
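
    As a minimal sketch (the partition value is passed as a string here; check DROP_PARTITION's exact signature for your Vertica version), a day-partition that does not appear in that required list could then be dropped with the DROP_PARTITION meta-function:

    -- Sketch: drop the day-of-month partition for day 12, assuming day 12
    -- is NOT in the required-partition list returned by the query above.
    select drop_partition('kafka_config.kafka_offsets', '12');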

  • [Deleted User] (Administrator)

    You may also use the following query to identify the partitions that the Vertica Kafka integration still needs and that must not be dropped:

    select distinct date_part('day', batch_start) from kafka_config.kafka_offsets_topk;
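
    Putting the two together, a sketch along these lines (adjust quoting and casts for your environment) generates one drop_partition call per day-partition that kafka_offsets_topk no longer references; review the output before running any of it:

    -- Sketch: emit a drop_partition statement for each day-of-month partition
    -- of kafka_offsets that no longer appears in kafka_offsets_topk.
    select 'select drop_partition(''kafka_config.kafka_offsets'', '''
           || part::int || ''');'
    from (select distinct date_part('day', batch_start) as part
          from kafka_config.kafka_offsets) as all_parts
    where part not in (select distinct date_part('day', batch_start)
                       from kafka_config.kafka_offsets_topk);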

  •

    Thanks, @MAW -- It's nice to hear that someone else experiences this behavior. It's shocking to think it has been designed this way and continues to function in this manner in 8.x.

  • [Deleted User] (Administrator)

    We have opened an enhancement request for Vertica to handle automatic pruning of these tables.
