Kaibo Zhou created FLINK-13787:
----------------------------------
Summary: PrometheusPushGatewayReporter does not cleanup TM metrics when run on kubernetes
Key: FLINK-13787
URL: https://issues.apache.org/jira/browse/FLINK-13787
Project: Flink
Issue Type: Bug
Components: Runtime / Metrics
Affects Versions: 1.8.1, 1.7.2, 1.9.0
Reporter: Kaibo Zhou
I ran a Flink job on Kubernetes with PrometheusPushGatewayReporter, and I can see the metrics from the Flink JobManager and TaskManager in the push gateway's UI.
When I cancel the job, the JobManager's metrics disappear, but the TaskManager's metrics still exist, even though I have set _deleteOnShutdown_ to _true_.
The configuration is:
{code:java}
metrics.reporters: "prom"
metrics.reporter.prom.class: "org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter"
metrics.reporter.prom.jobName: "WordCount"
metrics.reporter.prom.host: "localhost"
metrics.reporter.prom.port: "9091"
metrics.reporter.prom.randomJobNameSuffix: "true"
metrics.reporter.prom.filterLabelValueCharacters: "true"
metrics.reporter.prom.deleteOnShutdown: "true"
{code}
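Until this is fixed, one possible workaround is to delete the leftover TaskManager group from the push gateway by hand. The sketch below is not part of Flink. It assumes the Pushgateway accepts an HTTP DELETE on /metrics/job/<name>, that the full job name (including the random suffix) has been read from the push gateway's UI, and that the name contains no slash. The helper names group_url and delete_group are hypothetical.

```python
import urllib.parse
import urllib.request


def group_url(host: str, port: int, job_name: str) -> str:
    """Build the Pushgateway URL for the metric group of one job."""
    # Escape characters that are not valid in a URL path segment.
    # Assumption: job_name contains no "/" (such names need special handling).
    escaped = urllib.parse.quote(job_name, safe="")
    return f"http://{host}:{port}/metrics/job/{escaped}"


def delete_group(host: str, port: int, job_name: str) -> int:
    """Send an HTTP DELETE for the group and return the response status code."""
    request = urllib.request.Request(group_url(host, port, job_name), method="DELETE")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With the configuration above, a call would look like delete_group("localhost", 9091, name), where name is the "WordCount" prefix plus the random suffix shown in the push gateway's UI. This only removes stale groups after the fact and does not change the reporter's shutdown behaviour.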
Other people have also encountered this problem: [link|https://stackoverflow.com/questions/54420498/flink-prometheus-push-gateway-reporter-delete-metrics-on-job-shutdown].
And another similar issue: [FLINK-11457|https://issues.apache.org/jira/browse/FLINK-11457].
As Prometheus is a very important metrics system on Kubernetes, solving this problem would help users monitor their Flink jobs.