
kafka_config.kafka_offsets table has 200,000 rows after a week

Hi,

I have had Vertica connected to Kafka for one week, and the kafka_config.kafka_offsets table already has over 200,000 rows. It adds 3 rows every 10 seconds. Is this normal? I think this will cause me problems after a few months!

Thanks,

Drew

Comments

  •

    Any thoughts on this? I have 1.6 million rows of data in kafka_config.kafka_offsets after just over a month.

    Thanks,

    Drew

  •

    Has anyone experienced this?

  •

    I tried to remove old entries from the kafka_config.kafka_offsets table:

    delete from kafka_config.kafka_offsets where transaction_id < 45035996273705601;

    but ran into an error:

    [Vertica][VJDBC](6460) ERROR: UPDATE/DELETE/MERGE a table with aggregate projections is not supported

  •

    No one else is having this issue?

  • MAW (Employee)

    Hello Drew

    I too am seeing this (albeit in Vertica 8.1, where this table is now called stream_microbatch_history). As I understand it, the table records one row per source/target pair per microbatch. In my setup, that's 11 source/target pairs every 10 seconds (the default microbatch duration).

    These (and other) tables are not cleared down automatically (even in Vertica 8.x).

    I will investigate/research and let you know what I find.

    Regards
    Mark

  • [Deleted User] (Administrator)

    DELETE will not work because this table has a live aggregate projection.

    This table is partitioned by the expression below. Since date_part('day', ...) returns the day of the month, the table cycles through at most 31 partitions. You can try drop_partition on partitions that are old (see the sketch after the query below), but be very careful not to drop a partition that holds the current end_offset for a target schema, table, cluster, topic, and partition.

    date_part('day', kafka_offsets.batch_start)

    The following query can help identify the partitions that are REQUIRED and must not be dropped:

    select distinct partition
    from (select target_schema, target_table, kcluster, ktopic, kpartition,
                 date_part('day', max(kafka_offsets.batch_start)) as partition
          from kafka_config.kafka_offsets
          group by 1,2,3,4,5) as tab;
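
    As a minimal sketch (the partition value is passed as a string here; check DROP_PARTITION's exact signature for your Vertica version), a day-partition that does not appear in that required list could then be dropped with the DROP_PARTITION meta-function:

    -- Sketch: drop the day-of-month partition for day 12, assuming day 12
    -- is NOT in the required-partition list returned by the query above.
    select drop_partition('kafka_config.kafka_offsets', '12');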

  • [Deleted User] (Administrator)

    You may also use the following query to identify the partitions that the Vertica Kafka integration still needs and that must not be dropped:

    select distinct date_part('day', batch_start) from kafka_config.kafka_offsets_topk;
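
    Putting the two together, a sketch along these lines (adjust quoting and casts for your environment) generates one drop_partition call per day-partition that kafka_offsets_topk no longer references; review the output before running any of it:

    -- Sketch: emit a drop_partition statement for each day-of-month partition
    -- of kafka_offsets that no longer appears in kafka_offsets_topk.
    select 'select drop_partition(''kafka_config.kafka_offsets'', '''
           || part::int || ''');'
    from (select distinct date_part('day', batch_start) as part
          from kafka_config.kafka_offsets) as all_parts
    where part not in (select distinct date_part('day', batch_start)
                       from kafka_config.kafka_offsets_topk);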

  •

    Thanks, @MAW -- It's nice to hear that someone else experiences this behavior. It's shocking to think it has been designed this way and continues to function in this manner in 8.x.

  • [Deleted User] (Administrator)

    We have opened an enhancement request for Vertica to handle automatic pruning of these tables.
