[jira] [Created] (FLINK-20829) flink.jm.downtime metric is inaccurate in flink 1.9.1 and 1.11.1


Shang Yuanchun (Jira)
Yu Yang created FLINK-20829:
-------------------------------

             Summary: flink.jm.downtime metric is inaccurate in flink 1.9.1 and 1.11.1
                 Key: FLINK-20829
                 URL: https://issues.apache.org/jira/browse/FLINK-20829
             Project: Flink
          Issue Type: Bug
          Components: API / Scala, Runtime / Metrics
    Affects Versions: 1.11.1, 1.9.1
            Reporter: Yu Yang
         Attachments: Screen Shot 2021-01-01 at 2.38.39 PM.png

According to the comments in [DownTimeGauge.java|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/metrics/DownTimeGauge.java#L28]:

 

 A gauge that returns (in milliseconds) how long a job has not been not running any more, in case  it is in a failing/recovering situation. Running jobs return naturally a value of zero.

 

We noticed that the Flink runtime reports an inaccurate value for the flink.jm.downtime metric. The value Flink reports is actually the uptime, in milliseconds, that the application had accumulated before it restarted, not the time it has spent not running.
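To make the expected semantics concrete, here is a minimal sketch of a gauge that behaves the way the Javadoc describes. It is not Flink's DownTimeGauge: the class name `DownTimeSketch`, the `State` enum, and the injectable clock are all hypothetical and exist only for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical, simplified stand-in for a downtime gauge; NOT Flink's implementation. */
class DownTimeSketch {

    /** Hypothetical job states, assumed for this sketch only. */
    enum State { RUNNING, RESTARTING, FAILED }

    private final LongSupplier clock;  // injectable clock (milliseconds) so the demo is deterministic
    private State state = State.RUNNING;
    private long leftRunningAt = 0L;   // timestamp at which the job last stopped running

    DownTimeSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Records the moment the job leaves RUNNING; other transitions only update the state. */
    void transitionTo(State next) {
        if (state == State.RUNNING && next != State.RUNNING) {
            leftRunningAt = clock.getAsLong();
        }
        state = next;
    }

    /** Zero while running; otherwise the time elapsed since the job stopped running. */
    long getDownTimeMillis() {
        if (state == State.RUNNING) {
            return 0L;
        }
        return Math.max(0L, clock.getAsLong() - leftRunningAt);
    }

    public static void main(String[] args) {
        long[] now = {1_000L};                      // fake clock value in ms
        DownTimeSketch gauge = new DownTimeSketch(() -> now[0]);

        System.out.println(gauge.getDownTimeMillis()); // prints 0 (job is running)

        now[0] = 61_000L;                           // job has been up for 60 s
        gauge.transitionTo(State.RESTARTING);
        now[0] = 63_500L;                           // 2.5 s into the restart
        System.out.println(gauge.getDownTimeMillis()); // prints 2500

        gauge.transitionTo(State.RUNNING);
        System.out.println(gauge.getDownTimeMillis()); // prints 0 (running again)
    }
}
```

In this sketch the job runs for 60 000 ms and is then down for 2 500 ms, so the expected downtime reading is 2500. The behavior described in this report would correspond to a reading close to the 60 000 ms of prior uptime instead.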

 

!Screen Shot 2021-01-01 at 2.38.39 PM.png|width=720!


