[jira] [Created] (FLINK-11457) PrometheusPushGatewayReporter either overwrites its own metrics or creates too many labels


Shang Yuanchun (Jira)
Oscar Westra van Holthe - Kind created FLINK-11457:
------------------------------------------------------

             Summary: PrometheusPushGatewayReporter either overwrites its own metrics or creates too many labels
                 Key: FLINK-11457
                 URL: https://issues.apache.org/jira/browse/FLINK-11457
             Project: Flink
          Issue Type: Bug
            Reporter: Oscar Westra van Holthe - Kind


When using the PrometheusPushGatewayReporter, one has two options:
 * Use a fixed job name, which causes the jobmanager and taskmanagers to overwrite each other's metrics (i.e. last write wins, and you lose a lot of metrics)
 * Use a random suffix for the job name, which creates a lot of labels that have to be cleaned up manually
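
To make the first option concrete, here is a toy model of the "last write wins" behaviour. It is not Flink or Pushgateway code: {{ToyPushGateway}} and the metric names are hypothetical, and it assumes that a push replaces everything previously stored under the same job name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory stand-in for a push gateway (hypothetical, for illustration only). */
final class ToyPushGateway {
    // One metric set per job name; the job name is the only grouping key in this model.
    private final Map<String, Map<String, Double>> groups = new HashMap<>();

    /** Assumption: a push replaces every metric previously stored under jobName. */
    void push(String jobName, Map<String, Double> metrics) {
        groups.put(jobName, new HashMap<>(metrics));
    }

    /** Returns the metrics currently stored under jobName (empty if none). */
    Map<String, Double> metricsFor(String jobName) {
        return groups.getOrDefault(jobName, new HashMap<>());
    }

    /** Number of distinct job names the gateway currently holds. */
    int groupCount() {
        return groups.size();
    }

    public static void main(String[] args) {
        ToyPushGateway gateway = new ToyPushGateway();
        // Jobmanager and taskmanager both report under the same fixed job name.
        gateway.push("myjob", Collections.singletonMap("jobmanager_metric", 1.0));
        gateway.push("myjob", Collections.singletonMap("taskmanager_metric", 2.0));
        // Only the taskmanager's metric is left; the jobmanager's was overwritten.
        System.out.println(gateway.metricsFor("myjob").keySet());
    }
}
```

With a distinct job name per process nothing is overwritten, but {{groupCount()}} grows with every new name, which is the cleanup problem of the second option.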

Manual cleanup should not be necessary, but it is nonetheless required when running on a YARN cluster.

A fix could be to add a suffix to the job name that identifies each node in a non-random manner, for example {{myjob_jm0}}, {{myjob_tm1}}, {{myjob_tm2}}, {{myjob_tm3}}, {{myjob_tm4}}, ..., using a counter (I am not sure whether one is available) or some other stable (!) suffix.
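
A minimal sketch of the naming scheme, to show what "stable" means here. {{StableJobNames}} and its parameters are hypothetical and not part of Flink; in particular it assumes each process can obtain a stable role tag and index, which is exactly the open question above.

```java
/** Hypothetical helper that derives a deterministic job name per process. */
final class StableJobNames {
    private StableJobNames() {}

    /**
     * Builds a name such as "myjob_tm1" from the configured base name,
     * a role tag ("jm" or "tm"), and a stable, non-negative index.
     */
    static String jobName(String base, String role, int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        return base + "_" + role + index;
    }

    public static void main(String[] args) {
        System.out.println(jobName("myjob", "jm", 0)); // prints myjob_jm0
        System.out.println(jobName("myjob", "tm", 1)); // prints myjob_tm1
    }
}
```

Because the same inputs always yield the same name, a restarted process would push to the group it used before instead of creating a new one, and jobmanager and taskmanagers would never share a name.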

Related discussion: FLINK-9187

 

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the best approach would be.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)