Object replication and Kafka offset
Hello Support Team!
The new version of the Kafka scheduler in Vertica 10 uses the epoch to sort records in stream_microbatch_history:
SELECT *
FROM (
    SELECT *, extract(milliseconds FROM last_batch_duration::interval second) AS ld
    FROM "KAFKA".stream_microbatch_history
    LIMIT 1 OVER (PARTITION BY microbatch_id, target_schema, target_table,
                  source_name, source_cluster, source_partition
                  ORDER BY epoch DESC)
) AS lastframe
ORDER BY microbatch_id, target_schema, target_table, source_name, source_cluster, source_partition;
But if this table is replicated from another DB, all records have the same epoch:
SELECT epoch, count(*) FROM KAFKA.stream_microbatch_history GROUP BY 1 ORDER BY 1;

 epoch |  count
-------+---------
 48421 | 7552259
The Kafka scheduler then estimates an incorrect (almost random) last offset from stream_microbatch_history.
Is there any workaround besides manually updating the last records in stream_microbatch_history after replication to change their epoch? My current workaround:
UPDATE KAFKA.stream_microbatch_history
SET batch_start = batch_start,
    end_offset  = end_offset
WHERE (microbatch_id, target_schema, target_table, source_name, source_cluster,
       source_partition, batch_start, start_offset, end_offset) IN (
    SELECT microbatch_id, target_schema, target_table, source_name, source_cluster,
           source_partition, batch_start, start_offset, end_offset
    FROM KAFKA.stream_microbatch_history
    LIMIT 1 OVER (PARTITION BY microbatch_id, target_schema, target_table,
                  source_name, source_cluster, source_partition
                  ORDER BY batch_start DESC)
);
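A quick sanity check after the UPDATE (sketch only; it assumes the rewritten rows pick up the current epoch, which is what lets the scheduler's ORDER BY epoch DESC select them again):

-- The newest epoch in the history table should now be close to the database's
-- current epoch (taken from the SYSTEM table).
SELECT max(epoch) AS max_history_epoch,
       (SELECT current_epoch FROM system) AS current_epoch
FROM KAFKA.stream_microbatch_history;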
Used version: Vertica Analytic Database v10.1.0-0
Answers
Hello, could you open a support ticket for this issue so that we can properly track it? Your workaround seems fine if you have only one epoch in the replicated table.
Could you explain your scenario (here or in a support ticket)? Are you replicating the microbatch_history table with the intent of running another scheduler from the replicated DB and stopping the scheduler on the source DB? In that case, it might be simpler to copy only the last records in microbatch_history rather than the full table (see the sketch below).
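For example, something along these lines could copy just the latest row per frame into the other database instead of replicating the whole table (sketch only; the connection parameters and the target DB name "stage_db" are placeholders, and it assumes KAFKA.stream_microbatch_history already exists on the target with the same definition):

CONNECT TO VERTICA stage_db USER dbadmin PASSWORD '***' ON 'target_host', 5433;

-- Export only the last record per (microbatch, target, source, partition) combination.
EXPORT TO VERTICA stage_db.KAFKA.stream_microbatch_history AS
SELECT * FROM (
    SELECT * FROM KAFKA.stream_microbatch_history
    LIMIT 1 OVER (PARTITION BY microbatch_id, target_schema, target_table,
                  source_name, source_cluster, source_partition
                  ORDER BY batch_start DESC)
) AS last_rows;

DISCONNECT stage_db;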
No, I'm using the Community Edition.
I use replication in two scenarios:
1. Recreating a trimmed-down release/stage DB (without some big tables) from production for testing
2. Replicating a trimmed-down recovery DB for a quick switch with the current production DB in case of disaster
It's almost the same workaround.
So this is an intentional choice (sorting by epoch), and you are not planning to change it?
What about sorting by batch_start / batch_end?
Sorting by batch_start/batch_end was the previous method. This was changed in recent versions in order to resolve other issues. I will open an internal ticket about the scenario where microbatch_history is replicated. In the meantime, you'll have to use your workaround right before starting your replicated scheduler.
I think it would be better to sort by epoch + batch_start, if there are no other pitfalls )
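For example (just a sketch of the idea, not the actual scheduler query):

-- Prefer epoch, but fall back to batch_start when epochs collide (e.g. after replication).
SELECT *
FROM KAFKA.stream_microbatch_history
LIMIT 1 OVER (PARTITION BY microbatch_id, target_schema, target_table,
              source_name, source_cluster, source_partition
              ORDER BY epoch DESC, batch_start DESC);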
Thanks for reporting the issue when the table is replicated. This will be addressed in upcoming version(s).
And what about backups? Are the rows' epochs saved in the backup?
The fix will address all circumstances where there isn't one epoch per microbatch run.
Thanks! Glad to hear )