[Kafka Streaming] Recover scheduler and start loading from newest offset first

If the scheduler goes down for 2 hours, for example, is there a way to load the latest data first when it comes back? They would like the newest data to be processed first, and then go back and read the older data. Any recommendations? Alternatively, is it possible to run 2 schedulers that read from different beginning and ending offsets? It seems this is NOT an option, but please confirm.

 

Question credit to Wang, Wei (HPSW Big Data Platform Presales)

Comments

  • At this time, this type of configuration is not possible. It is possible to start from specific offsets, which can be configured manually. However, I would not recommend this approach, because setting the offset back after collecting the recent data would likely cause data duplication.

     

    Answer credit to Mark Fay
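
To make the duplication risk concrete, here is a minimal sketch in plain Python. It is a toy model of one partition, not the scheduler's API: `load_from`, `last_committed`, `recent_start`, and the `end_offset` bound are all hypothetical names used for illustration, and nothing here implies the scheduler supports an ending offset.

```python
from collections import Counter

def load_from(log, start_offset, end_offset=None):
    """Return (offset, payload) pairs from start_offset up to, but not
    including, end_offset. With no end_offset, read to the end of the log."""
    stop = len(log) if end_offset is None else end_offset
    return [(offset, log[offset]) for offset in range(start_offset, stop)]

# Toy partition with offsets 0..9 (hypothetical data).
log = [f"msg-{i}" for i in range(10)]
last_committed = 0  # assumed position where loading stopped before the outage
recent_start = 6    # assumed manually chosen offset for the "newest" data

# Step 1: load the newest data first (offsets 6..9).
table = load_from(log, recent_start)
# Step 2: set the offset back and load again with no upper bound (offsets 0..9).
table += load_from(log, last_committed)

counts = Counter(offset for offset, _ in table)
duplicated = sorted(offset for offset, n in counts.items() if n > 1)
print("offsets loaded twice:", duplicated)

# For contrast only: a backfill that stops where the recent load began
# would not overlap. This bound is hypothetical.
bounded = load_from(log, recent_start) + load_from(log, last_committed, recent_start)
bounded_offsets = [offset for offset, _ in bounded]
print("bounded load order:", bounded_offsets)
```

In this model, every offset collected in the first step is read again in the second, which is the duplication described in the answer above.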
