Hi all!
Several users have asked in the past about a Kafka based metrics reporter which can serve as a natural connector between arbitrary metric storage systems and a straightforward way to process Flink metrics downstream.

I think this would be an extremely useful addition but I would like to hear what others in the dev community think about it before submitting a proper proposal.

There are at least 3 questions to discuss here:

*1. Do we want the Kafka metrics reporter in the Flink repo?*
As it is much more generic than other metrics reporters already included, I would say yes. Also as almost everyone uses Flink with Kafka it would be a natural reporter choice for a lot of users.

*2. How should we handle the Kafka dependency of the connector?*
I think it would be an overkill to add different Kafka versions here, so I would use Kafka 2.+ which has the best compatibility and is future proof

*3. What message format should we use?*
I would go with JSON for readability and compatibility

There is a relevant JIRA open for this already.
https://issues.apache.org/jira/browse/FLINK-14531

We at Cloudera also promote this as a scalable way of pushing metrics to other systems so we are very happy to contribute an implementation or cooperate with others on building it.

Please let me know what you think!

Cheers,
Gyula
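To make question 3 concrete, here is a minimal sketch of what a JSON-encoded metric message could look like. It is written in Python purely for brevity. The `MetricRecord` type, the field names (`name`, `value`, `scope`, `timestamp_ms`) and the example scope keys are assumptions made for illustration, not a format proposed in this thread or in FLINK-14531.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, Union


@dataclass(frozen=True)
class MetricRecord:
    """One metric sample (hypothetical shape, not an agreed-upon schema)."""
    name: str                      # metric identifier, e.g. "numRecordsIn"
    value: Union[int, float, str]  # sampled value
    scope: Dict[str, str]          # variables describing where the metric lives
    timestamp_ms: int              # sample time in epoch milliseconds


def to_json_bytes(record: MetricRecord) -> bytes:
    """Encode a record as UTF-8 JSON, usable as a Kafka message value."""
    # sort_keys keeps the output stable, which is convenient for tests and diffs.
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


def from_json_bytes(payload: bytes) -> MetricRecord:
    """Decode what to_json_bytes produced, as a downstream consumer would."""
    return MetricRecord(**json.loads(payload.decode("utf-8")))


example = MetricRecord(
    name="numRecordsIn",
    value=1024,
    scope={"job_name": "wordcount", "subtask_index": "0"},
    timestamp_ms=1574000000000,
)
payload = to_json_bytes(example)
```

Any downstream system that can parse JSON can read such a message without sharing code with the producer, which is the readability and compatibility argument made above.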
Hi Gyula,
thank you for proposing this. +1 for adding a KafkaMetricsReporter. In terms of the dependency we could go a similar route as for the "universal" Flink Kafka Connector which to my knowledge always tracks the latest Kafka version as of the Flink release and relies on compatibility of the underlying KafkaClient. JSON sounds good to me.

Cheers,

Konstantin

On Sun, Nov 17, 2019 at 1:46 PM Gyula Fóra <[hidden email]> wrote:
--
Konstantin Knauf | Solutions Architect
+49 160 91394525
Follow us @VervericaData Ververica <https://www.ververica.com/>
--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Tony) Cheng
Hi Gyula,
Thanks for bringing this up. It is a useful addition to have a Kafka metrics reporter. I understand that we already have Prometheus and DataDog reporters in the Flink main repo. However, personally speaking, I would slightly prefer to have the Kafka metrics reporter as an ecosystem project instead of in the main repo, for the following reasons:

1. To keep core Flink more focused. In general, if a component is more relevant to an external system than to Flink, it might be good to keep it as an ecosystem project. A metrics reporter seems a good example of that.
2. This helps encourage more contributions to the Flink ecosystem instead of giving the impression that anything in the Flink ecosystem must be in the Flink main repo.
3. To help our ecosystem project authors, we have launched a website [1] that lets the community keep track of and advertise the ecosystem projects. It looks like a good place to put the Kafka metrics reporter.

Regarding the message format, while I think using JSON by default is fine as it does not introduce much external dependency, I wonder if we should make the message format pluggable. Many companies probably already have their own serde format for all their Kafka messages. For example, maybe they would like to just use an Avro record for their metrics instead of introducing a new JSON format. Also, in many cases the Flink jobs could send a lot of metric messages. The JSON format is less efficient and might have too much overhead in that case.

Thanks,

Jiangjie (Becket) Qin

[1] https://flink-packages.org/

On Mon, Nov 18, 2019 at 3:30 AM Konstantin Knauf <[hidden email]> wrote:
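The pluggable-format idea above can be sketched as follows. This is a language-neutral illustration in Python, not Flink or Kafka API: `MetricSerializer`, `JsonSerializer`, `CompactSerializer`, `FORMATS` and `create_serializer` are hypothetical names, and the fixed binary layout merely stands in for a denser format such as Avro.

```python
import json
import struct
from typing import Callable, Dict, Protocol


class MetricSerializer(Protocol):
    """Hypothetical plug-in point: turns one metric sample into message bytes."""
    def serialize(self, name: str, value: float, timestamp_ms: int) -> bytes: ...


class JsonSerializer:
    """Readable default with no extra dependencies."""
    def serialize(self, name: str, value: float, timestamp_ms: int) -> bytes:
        doc = {"name": name, "value": value, "timestamp_ms": timestamp_ms}
        return json.dumps(doc, sort_keys=True).encode("utf-8")


class CompactSerializer:
    """Denser stand-in: 8-byte timestamp, 8-byte double, 2-byte length, name."""
    def serialize(self, name: str, value: float, timestamp_ms: int) -> bytes:
        raw = name.encode("utf-8")
        return struct.pack(">qdH", timestamp_ms, value, len(raw)) + raw


# Registry keyed by a configuration value, so the format is chosen at setup time.
FORMATS: Dict[str, Callable[[], MetricSerializer]] = {
    "json": JsonSerializer,
    "compact": CompactSerializer,
}


def create_serializer(format_name: str) -> MetricSerializer:
    """Look up a serializer by its configured name."""
    try:
        return FORMATS[format_name]()
    except KeyError:
        raise ValueError(f"unknown metric format: {format_name!r}") from None


sample = ("numRecordsIn", 1024.0, 1574000000000)
json_bytes = create_serializer("json").serialize(*sample)
compact_bytes = create_serializer("compact").serialize(*sample)
```

With this shape, JSON stays the default while a company with an existing serde format would only need to register one more entry. The binary variant is also noticeably shorter per message, which is the overhead concern raised for high metric volumes.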
Hi all
Glad to see this topic in the community.
We at Alibaba also implemented a Kafka metrics reporter and extended it to other message queues like Alibaba Cloud Log Service [1] half a year ago. The reason we did not launch a similar discussion is that we thought we would only be providing a way to report metrics to Kafka. The currently supported metrics reporters, e.g. InfluxDB and Graphite, all have an easy-to-use data source in Grafana to visualize metrics. By contrast, even with a Kafka metrics reporter, we still need another way to consume the data and serve it as a data source for an observability platform, and this would differ between companies.

I think this is the main concern about including this in a popular open-source main repo, and I largely agree with Becket's suggestion to contribute this as a flink-package. We could then offer an end-to-end solution, including how to visualize the metrics data.

[1] https://www.alibabacloud.com/help/doc-detail/29003.htm

Best
Yun Tang

On 11/18/19, 8:19 AM, "Becket Qin" <[hidden email]> wrote:
Hi,
What is still unclear to me so far, as I don't see any yet: what would be the fundamental differences between this Kafka reporter and Flink’s existing Kafka producer?

I’ve been thinking about Flink metrics for a while, and the “metric reporter” feels a bit redundant to me. As you may already know, Flink has been used to process external metrics in various companies. If you think about it, Flink’s own metric system is no different from external ones and is actually just another stream source, and metric reporters are just data sinks writing to external storage, with no guarantee or checkpointing.

So instead of adding Kafka or other MQ reporters and worrying about the message format (which is already solved by Flink’s sinks), we can generalize and expose Flink’s metrics system as a simple built-in stream source, and "metric reporters" are just customized sinks tailored for this source. Users may even be able to access and process it in a stream environment with the DataStream API. That gives users full flexibility in manipulating Flink metrics with Flink, and it’s more of an “eat your own dog food” philosophy.

This seems too good to be true, and I haven't had time to think through the details. Let me know if I am missing anything here.

On Mon, Nov 18, 2019 at 09:51 Yun Tang <[hidden email]> wrote:
@Bowen I can see where you're coming from, but I don't think this would work too well. Your "stream" would have to contain events for added/removed metrics, but metrics are inherently not Serializable. I think this would end up being a weird special case.

(Periodically emitting the values of all metrics goes against the convention, which we established from the very beginning, that metrics should only incur costs if necessary; as such, a reporter that polls on demand should only consume resources when it is called.)

Additionally, there are plans to add more methods to the reporter in the future, at which point the source interface would no longer suffice. At that point you'd need a separate interface again, and wrappers for your sinks.

This would result in what is the trivial solution for this reporter right now anyway: have the reporter use a Kafka connector internally, with all the features that it offers.

Overall I think we'd be unnecessarily coupling reporters to the source interface, and I don't see a true benefit.

On 19/11/2019 19:47, Bowen Li wrote:
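The cost argument above can be illustrated with a small sketch, again in Python and under stated assumptions: `OnDemandReporter`, `register`, `unregister` and `report` are hypothetical names rather than Flink's reporter interface, and `send` stands in for whatever Kafka producer the reporter would use internally. Gauges are registered as callables, so their values are computed only when `report()` is invoked.

```python
import json
from typing import Callable, Dict, List, Tuple

Gauge = Callable[[], float]


class OnDemandReporter:
    """Holds references to live metrics and reads them only inside report()."""

    def __init__(self, send: Callable[[str, bytes], None], topic: str) -> None:
        self._send = send    # stand-in for a Kafka producer's send(topic, value)
        self._topic = topic
        self._gauges: Dict[str, Gauge] = {}

    def register(self, name: str, gauge: Gauge) -> None:
        """Called when a metric is added; stores the live object, not a value."""
        self._gauges[name] = gauge

    def unregister(self, name: str) -> None:
        """Called when a metric is removed."""
        self._gauges.pop(name, None)

    def report(self) -> int:
        """Evaluate every registered gauge once and emit one message per metric."""
        for name, gauge in list(self._gauges.items()):
            message = json.dumps({"name": name, "value": gauge()})
            self._send(self._topic, message.encode("utf-8"))
        return len(self._gauges)


# Usage with an in-memory list in place of a real producer.
sent: List[Tuple[str, bytes]] = []
evaluations = {"count": 0}


def expensive_gauge() -> float:
    evaluations["count"] += 1   # track how often the value is actually computed
    return 42.0


reporter = OnDemandReporter(
    lambda topic, value: sent.append((topic, value)), "flink-metrics"
)
reporter.register("queueLength", expensive_gauge)
evaluations_before_report = evaluations["count"]
reported = reporter.report()
```

In this sketch, registering a metric costs nothing beyond storing a reference, and the gauge is evaluated only when the reporter is asked to report. It also shows that the reporter works with live metric objects added and removed by name, rather than with a stream of serialized values.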
@Becket, Yun:

Regarding the core/ecosystem project:
I don't completely agree with your arguments regarding why this should be an external ecosystem project instead of part of the Flink repo. A metric connector is relevant for the Flink users, not the metric store. Metric storage systems don't care where the logs are coming from, but Flink job authors need a way to get the metrics to whatever systems they have. The same applies to other connectors. If we don't provide canonical ways of communicating with external systems, be it sources, sinks or metrics, that makes everyone's life a bit harder.

Historically most of the connectors went straight to Flink, and over time the maintenance of these has become quite a challenge with Flink core itself growing rapidly. I agree that we have to make these decisions and not include every new external connector in the Flink core. I think this decision should be based on the value it brings to users and how often it will be used. These are not easy questions, and the Flink ecosystem website is a great way of gauging the popularity/value of a specific connector.

Another way of deciding this would be to talk to the Flink community (like we do with this thread) and see if this is a common pattern and if we can come up with a good generic solution regarding Kafka versioning and formats that will work for most. If we see big interest here and have a consensus on the formats, I don't see any reason why we shouldn't include it.

Regarding the message format:
The idea with the JSON format was that it could be an easy-to-use source for downstream metric systems to integrate with. I don't have much experience with different metric storage systems, so maybe, Yun, you are right that you will always end up having another processor for this. But even in that case JSON is a pretty safe format, as it is easy to process no matter what you use. Otherwise I agree that a pluggable format would be much better and more generic.
We just need to find a way to keep it simple :D

@Bowen, Chesnay:

The whole idea of making a metrics reporter source sounds pretty great at first :) If we could do this, that would definitely make this more flexible, but even then you would probably need some sort of serialization schema implementation for the metrics by default, which is basically what the Kafka reporter would do + a minimal client.

Chesnay, I don't completely understand what you mean by: "Periodically emitting the values of all metrics goes against the convention which we established". Isn't this exactly what the Kafka metrics reporter would do anyway? If you could elaborate on this a bit more, that would be very helpful for me, because I don't have a good overview of the key design principles of the current metric system.

Cheers,
Gyula

On Wed, Nov 20, 2019 at 10:22 AM Chesnay Schepler <[hidden email]> wrote:
|
So this probably doesn't belong in this thread, but here goes:
When you think of the metric system as a source and reporters as sinks, one has to consider what the source emits. Either:
a) events for added/removed metrics, or
b) periodically, the values of all metrics, with the plethora of additional scope information the reporters might require.

Approach a) obviously doesn't work in a distributed setting, but is closest to the current approach. Reporters access metrics / meta-info as needed, and we only use as many resources as we actually need.

Approach b) does work in a distributed setting, and conceptually works reasonably well with scheduled reporters (i.e., reporters that periodically write data to the external system), but for reporters that are polled on demand by some external system (Prometheus, JMX) this can cause resources to be wasted if the polling interval is larger than the update interval or metrics aren't being polled at all. Additionally, you'd have to include _a lot_ of metadata for each metric to retain current functionality. Naturally, this approach also consumes additional network resources.

Another concern I have is that we're mixing concerns here; one being about allowing users to process metrics in a datastream fashion (which has quite a few caveats, like not being able to access metrics from the dispatcher / RM since we surely aren't running user code in them), the other being about handling formats. And I have to point out that the format problem only really applies to Kafka. For other reporters we don't have this problem since the backends define the format, and there are just easier ways to handle this without a major rework (wrap the Kafka connector, have a factory for serialization schemes, _done_).

I'd now suggest moving the process-as-a-datastream idea into a different thread because a) it isn't _really_ connected to the reporter itself and b) we already have enough points of contention.
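To make the "wrap the connector, have a factory for serialization schemes" idea a bit more concrete, here is a rough sketch in plain Java so that it runs without Flink or Kafka on the classpath. Every type name in it (MetricSerializer, JsonMetricSerializer, MetricSerializers, SketchReporter) is hypothetical and is not Flink's reporter API; the Consumer<byte[]> is only a stand-in for whatever Kafka producer or connector would actually be wrapped.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical pluggable format: turns one metric reading into a message payload. */
interface MetricSerializer {
    byte[] serialize(String name, Number value);
}

/** Assumed default format; only escapes backslashes and quotes in the metric name. */
final class JsonMetricSerializer implements MetricSerializer {
    @Override
    public byte[] serialize(String name, Number value) {
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        String json = "{\"name\":\"" + escaped + "\",\"value\":" + value + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }
}

/** Hypothetical factory keyed by a configured format name. */
final class MetricSerializers {
    private MetricSerializers() {}

    static MetricSerializer create(String format) {
        if ("json".equalsIgnoreCase(format)) {
            return new JsonMetricSerializer();
        }
        throw new IllegalArgumentException("Unknown metric format: " + format);
    }
}

/** Scheduled-reporter sketch: each report() call reads every registered metric once. */
final class SketchReporter {
    private final Map<String, Supplier<Number>> metrics = new LinkedHashMap<>();
    private final MetricSerializer serializer;
    private final Consumer<byte[]> sender; // stand-in for a Kafka producer's send

    SketchReporter(MetricSerializer serializer, Consumer<byte[]> sender) {
        this.serializer = serializer;
        this.sender = sender;
    }

    void register(String name, Supplier<Number> gauge) {
        metrics.put(name, gauge);
    }

    void unregister(String name) {
        metrics.remove(name);
    }

    /** Would be called periodically by some scheduler; costs nothing between calls. */
    void report() {
        metrics.forEach((name, gauge) ->
                sender.accept(serializer.serialize(name, gauge.get())));
    }
}

final class ReporterSketchDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        SketchReporter reporter = new SketchReporter(
                MetricSerializers.create("json"),
                bytes -> sent.add(new String(bytes, StandardCharsets.UTF_8)));
        reporter.register("numRecordsIn", () -> 42L);
        reporter.report();
        System.out.println(sent);
    }
}
```

Swapping the format would then mean adding another MetricSerializer (for example an Avro-based one) and a branch in the factory, without touching the reporter. The scope metadata mentioned above is left out entirely here, which is a large simplification.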
As for having the connector in Flink vs flink-packages, I'm constantly amazed at how much value people attribute to the source of a reporter being in Flink. After all, there's _nothing_ stopping us from including additional 3rd-party reporters in the distribution. There's also _nothing_ stopping us from linking to 3rd-party reporters in the documentation. In other words, all user-facing parts are agnostic to whether the reporter is maintained by Flink or not, hence I'm not accepting the argument anymore that something must be in Flink for it to be used. It just doesn't make sense.

Having every widely used component within Flink is just not maintainable in the long run, as we all know, hence I'm very much in favor of having it maintained externally via flink-packages. That's the very purpose of that site.

And, on a final note, there's also _nothing_ stopping us from adding something to Flink after X <time_unit> if it becomes an integral part of the way Flink is being used.
|
In reply to this post by Gyula Fóra
Hi Gyula,
I did not mean to say the MetricsReporter code should stay with the metric storage, e.g. Kafka. I am trying to argue that a metric reporter is a plugin less related to Flink and more related to a specific user environment. If we include such plugins in Flink, then as time goes on the popularity of those user environments may change, but Flink will have to keep carrying legacy plugins that are no longer popular, because removing them would hurt some users. So I think it is very important to decouple such user-environment-focused plugin implementations from Flink itself. Ideally Flink should have only one default implementation for such a plugin, with minimal external dependencies. Other implementations would live in the ecosystem, so Flink itself does not need to carry the long-term maintenance burden for various user environments.

I am not sure the potential popularity of a particular implementation is a good argument, because it may be perceived completely differently depending on the region, the industry, who you ask, etc. And popularity may shift over time. Thus I am not sure we want Flink to make an official judgement about that.

Keeping the plugin implementations outside of Flink does not necessarily hurt their availability. On the contrary, in the long run it is easier for people to contribute implementations, as it is like User Generated Content (UGC) that does not have to go through the authorization. And good implementations will gain popularity as a natural result rather than as an official pick.

My two cents about the idea of reusing the source / sink connector: it is actually an orthogonal discussion about the implementation of the metrics reporter. There are two things we are discussing here: metric reporting and metric processing. The metric reporter does the former. Personally speaking, I don't feel the connector is a good fit for this use case. A connector consists of two parts:
1. The part related to Flink: Flink-specific interfaces, record format and such.
2. The part related to the external system: e.g. Kafka clients, etc.

The metrics reporter does not really need the first part in most cases. Usually a raw external system client would be sufficient.

As for the metric processing part, it is completely fine to use a Flink job to process the metrics that have been produced to Kafka, but that would just be a normal data analytics use case, rather than part of metric reporting. I think this makes the architecture clearer.

Thanks,

Jiangjie (Becket) Qin
|