[jira] [Created] (FLINK-6109) Add "consumer lag" report metric to FlinkKafkaConsumer

Shang Yuanchun (Jira)
Tzu-Li (Gordon) Tai created FLINK-6109:
------------------------------------------

             Summary: Add "consumer lag" report metric to FlinkKafkaConsumer
                 Key: FLINK-6109
                 URL: https://issues.apache.org/jira/browse/FLINK-6109
             Project: Flink
          Issue Type: New Feature
          Components: Kafka Connector, Streaming Connectors
            Reporter: Tzu-Li (Gordon) Tai


This feature was discussed on the user mailing list: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Telling-if-a-job-has-caught-up-with-Kafka-td12261.html

As discussed, we can expose two kinds of "consumer lag" metrics for this:

 - "Current consumer lag for partition": the difference between the latest offset of a partition and the offset of the last record collected from it. This metric is calculated and updated at a configurable interval. It serves as an indicator of how well the consumer is keeping up with the head of each partition. I propose to name this `currentOffsetLag`.

 - "Consumer lag of last checkpoint": the difference between the latest offset and the offset stored in the checkpoint of a partition. This metric is only updated when checkpoints are completed. It serves as an indicator of how much data may need to be replayed in case of a failure. I propose to name this `lastCheckpointedOffsetLag`.

Both metrics are reported per FlinkKafkaConsumer and are independent of the consumer `group.id` used: the offset used to calculate consumer lag is the internal offset state of the FlinkKafkaConsumer, not the consumer group's committed offsets in Kafka.
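
To make the two definitions concrete, here is a minimal sketch of how both lags could be derived from per-partition offsets. It is only an illustration of the arithmetic, not the connector's implementation: the class `ConsumerLagSketch`, the `PartitionOffsets` holder, and its field names are hypothetical, and clamping negative values to zero is an assumption of the sketch (for example, for the case where the latest offset was fetched before the most recent records were collected).

```java
// Hypothetical sketch of the proposed lag calculations.
// None of these names are part of the Flink Kafka connector API.
final class ConsumerLagSketch {

    /** Illustrative holder for the offsets the consumer tracks for one partition. */
    static final class PartitionOffsets {
        final long latestOffset;           // latest offset in the partition, refreshed at an interval
        final long lastCollectedOffset;    // offset of the last record collected (internal consumer state)
        final long lastCheckpointedOffset; // offset stored in the last completed checkpoint

        PartitionOffsets(long latestOffset, long lastCollectedOffset, long lastCheckpointedOffset) {
            this.latestOffset = latestOffset;
            this.lastCollectedOffset = lastCollectedOffset;
            this.lastCheckpointedOffset = lastCheckpointedOffset;
        }
    }

    /** "Current consumer lag": latest offset minus the last collected offset. */
    static long currentOffsetLag(long latestOffset, long lastCollectedOffset) {
        // Assumption of this sketch: never report a negative lag.
        return Math.max(0L, latestOffset - lastCollectedOffset);
    }

    /** "Consumer lag of last checkpoint": latest offset minus the checkpointed offset. */
    static long lastCheckpointedOffsetLag(long latestOffset, long lastCheckpointedOffset) {
        // Assumption of this sketch: never report a negative lag.
        return Math.max(0L, latestOffset - lastCheckpointedOffset);
    }

    public static void main(String[] args) {
        PartitionOffsets p = new PartitionOffsets(120L, 100L, 80L);
        System.out.println("currentOffsetLag = "
                + currentOffsetLag(p.latestOffset, p.lastCollectedOffset));
        System.out.println("lastCheckpointedOffsetLag = "
                + lastCheckpointedOffsetLag(p.latestOffset, p.lastCheckpointedOffset));
    }
}
```

In this sketch both values depend only on offsets held by the consumer itself plus the latest offset of the partition, which is why neither would change if a different `group.id` were configured.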



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)