Tzu-Li (Gordon) Tai created FLINK-6109:
------------------------------------------
Summary: Add "consumer lag" report metric to FlinkKafkaConsumer
Key: FLINK-6109
URL: https://issues.apache.org/jira/browse/FLINK-6109
Project: Flink
Issue Type: New Feature
Components: Kafka Connector, Streaming Connectors
Reporter: Tzu-Li (Gordon) Tai
This feature was discussed in this mailing list thread:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Telling-if-a-job-has-caught-up-with-Kafka-td12261.html
As discussed, we can expose two kinds of "consumer lag" metrics for this:
- "Current consumer lag for partition": the difference between a partition's latest offset and the offset of the last collected record. This metric is calculated and updated at a configurable interval. It indicates how well the consumer is keeping up with the head of each partition. I propose to name this `currentOffsetLag`.
- "Consumer lag of last checkpoint": the difference between a partition's latest offset and the offset stored for that partition in the checkpoint. This metric is only updated when checkpoints are completed. It indicates how much data may need to be replayed in case of a failure. I propose to name this `lastCheckpointedOffsetLag`.
The granularity of these metrics is per FlinkKafkaConsumer and independent of the consumer group.id used: the offsets used to calculate consumer lag come from the internal offset state of the FlinkKafkaConsumer, not from the consumer group's committed offsets in Kafka.
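To make the two definitions concrete, here is a minimal sketch of the bookkeeping in plain Java, with no Flink or Kafka dependencies. It is not the connector's implementation. The class `ConsumerLagSketch`, its method names, and the `UNKNOWN` sentinel are all hypothetical. It also assumes that a negative difference (possible when the latest offset was fetched before the consumer advanced) should be clamped to zero, which the proposal does not specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks per-partition lag from the consumer's own offset state,
// not from the consumer group's committed offsets in Kafka.
class ConsumerLagSketch {
    // Assumption: sentinel returned when a lag value has not been computed yet.
    static final long UNKNOWN = -1L;

    private final Map<Integer, Long> lastCollected = new HashMap<>();
    private final Map<Integer, Long> latest = new HashMap<>();
    private final Map<Integer, Long> currentLag = new HashMap<>();
    private final Map<Integer, Long> checkpointedLag = new HashMap<>();

    // Called for every record emitted by the consumer.
    void recordCollected(int partition, long offset) {
        lastCollected.put(partition, offset);
    }

    // Called at the configurable interval with freshly fetched head offsets;
    // this is the only place currentOffsetLag is recomputed.
    void refreshLatestOffsets(Map<Integer, Long> latestOffsets) {
        latest.putAll(latestOffsets);
        for (Map.Entry<Integer, Long> e : lastCollected.entrySet()) {
            Long head = latest.get(e.getKey());
            if (head != null) {
                currentLag.put(e.getKey(), lag(head, e.getValue()));
            }
        }
    }

    // Called when a checkpoint completes, with the offsets stored in that checkpoint;
    // this is the only place lastCheckpointedOffsetLag is recomputed.
    void onCheckpointComplete(Map<Integer, Long> checkpointedOffsets) {
        for (Map.Entry<Integer, Long> e : checkpointedOffsets.entrySet()) {
            Long head = latest.get(e.getKey());
            if (head != null) {
                checkpointedLag.put(e.getKey(), lag(head, e.getValue()));
            }
        }
    }

    long getCurrentOffsetLag(int partition) {
        return currentLag.getOrDefault(partition, UNKNOWN);
    }

    long getLastCheckpointedOffsetLag(int partition) {
        return checkpointedLag.getOrDefault(partition, UNKNOWN);
    }

    // Assumption: clamp at zero because the head offset may be stale.
    static long lag(long latestOffset, long consumerOffset) {
        return Math.max(0L, latestOffset - consumerOffset);
    }

    public static void main(String[] args) {
        ConsumerLagSketch sketch = new ConsumerLagSketch();
        sketch.recordCollected(0, 90L);
        Map<Integer, Long> heads = new HashMap<>();
        heads.put(0, 100L);
        sketch.refreshLatestOffsets(heads);
        System.out.println("currentOffsetLag = " + sketch.getCurrentOffsetLag(0));
    }
}
```

In this sketch, `currentOffsetLag` only changes when `refreshLatestOffsets` runs and `lastCheckpointedOffsetLag` only changes when `onCheckpointComplete` runs, mirroring the update rules described above. A real implementation would register these values as gauges in the metric group and obtain head offsets from the broker.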
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)