kafka_config.kafka_offsets table has 200,000 rows after a week

Hi,

 

I have had Vertica connected to Kafka for one week and already have over 200,000 rows in the kafka_config.kafka_offsets table. It adds 3 rows every 10 seconds. Is this normal? I think this will cause me problems after a few months!

 

Thanks,

Drew

Comments

  • Any thoughts on this? I have 1.6MM rows of data in kafka_config.kafka_offsets after just over a month.

     

    Thanks,

    Drew

  • Has anyone experienced this?

  • I tried to remove old entries from the kafka_config.kafka_offsets table:

     

    delete from kafka_config.kafka_offsets where transaction_id < 45035996273705601

     

    but ran into an error:

     

    [Vertica][VJDBC](6460) ERROR: UPDATE/DELETE/MERGE a table with aggregate projections is not supported

  • No one else is having this issue?

  • MAWMAW Employee

    Hello Drew

    I too am seeing this (albeit in Vertica 8.1 - where this table is now called stream_microbatch_history). As I understand this, it records 1 row per source/target per microbatch. In my setup, that's 11 source/targets every 10 seconds (the default microbatch duration).

    This table (and others) is not being automatically cleared down (even in Vertica 8.x).
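
    If you want to see the growth rate for yourself, something like this should do it (just a sketch; it assumes the 7.x kafka_config.kafka_offsets table and its batch_start column, as used elsewhere in this thread):

    select date_trunc('day', batch_start) as day, count(*) as rows_added
    from kafka_config.kafka_offsets
    group by 1
    order by 1;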

    I will investigate/research and let you know what I find.

    Regards
    Mark

  • [Deleted User][Deleted User] Administrator

    DELETE will not work because this table has a live aggregate projection.

    This table is partitioned by the following expression. You may try running drop_partition on partitions that are old. You need to be very careful that you don't drop a partition that holds the current end_offset for a target schema, table, cluster, topic, and partition.

    date_part('day', kafka_offsets.batch_start)

    Use the following query to help identify the partitions that are REQUIRED and can't be dropped.

    select distinct partition
    from (select target_schema, target_table, kcluster, ktopic, kpartition,
                 date_part('day', max(kafka_offsets.batch_start)) as partition
          from kafka_config.kafka_offsets
          group by 1,2,3,4,5) as tab;
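
    Once you know which day values are required, dropping one of the other, older partitions would look roughly like this (a sketch only; the day value 12 is just a placeholder for a partition key that does not appear in the query above):

    select drop_partition('kafka_config.kafka_offsets', 12);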

  • [Deleted User][Deleted User] Administrator

    You may also use the following query to identify the partitions that the Vertica Kafka integration needs and that should not be dropped.

    select distinct date_part('day', batch_start) from kafka_config.kafka_offsets_topk;
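
    Combining that with the partition expression, a query along these lines (again, just a sketch) lists the day partitions that are candidates for dropping:

    select distinct date_part('day', batch_start) as droppable_day
    from kafka_config.kafka_offsets
    where date_part('day', batch_start) not in
          (select date_part('day', batch_start) from kafka_config.kafka_offsets_topk);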

  • Thanks, @MAW -- It's nice to hear that someone else experiences this behavior. It's shocking to think it has been designed this way and continues to function in this manner in 8.x.

  • [Deleted User][Deleted User] Administrator

    We have opened an enhancement request for Vertica to handle automatic pruning of these tables.
