Hello,
I was looking to add metrics to the streaming file sink. Currently, the only details available are the generic ones available for any operator, such as the number of records in, the number of records out, etc. I was looking at adding some metrics and contributing them back, as well as enabling the metrics that are already published by aws-hadoop. Is that something of value to the community?

Another change I am proposing is to make the constructor of StreamingFileSink protected instead of private here: https://tinyurl.com/y5vh4jn6. If we make it protected, it becomes possible to extend this class, and anyone can add custom metrics in the 'open' method.

Thanks
Kailash
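A minimal, Flink-free sketch of the extension pattern proposed above, assuming a simplified sink lifecycle. Every name in it (`BaseFileSink`, `ByteCountingFileSink`, `SimpleMetricGroup`, `forPath`, `open()`, `invoke()`) is hypothetical and is not the real StreamingFileSink API; in an actual Flink job a rich function would obtain counters from `getRuntimeContext().getMetricGroup()` instead. The point is only that a subclass constructor must call `super(...)`, which a private constructor forbids from outside the class.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a Flink metric group: named counters kept in a map.
class SimpleMetricGroup {
    private final Map<String, AtomicLong> counters = new LinkedHashMap<>();

    // Returns the counter registered under `name`, creating it on first use.
    AtomicLong counter(String name) {
        return counters.computeIfAbsent(name, n -> new AtomicLong());
    }

    // Current value of a counter, or 0 if it was never registered.
    long value(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }
}

// Hypothetical sink modelled on the proposal. Users normally get an instance
// from the static factory, but the constructor is protected rather than
// private, so code outside this class can still subclass it.
class BaseFileSink {
    protected final SimpleMetricGroup metrics = new SimpleMetricGroup();
    protected final String basePath;

    protected BaseFileSink(String basePath) {
        this.basePath = basePath;
    }

    static BaseFileSink forPath(String basePath) {
        return new BaseFileSink(basePath);
    }

    // Lifecycle hook, called once before the first record.
    public void open() {}

    // Called per record; a real sink would append to a part file under basePath.
    public void invoke(String value) {}
}

// Custom sink: registers its own metric in open() and updates it per record.
class ByteCountingFileSink extends BaseFileSink {
    private AtomicLong bytesWritten;

    ByteCountingFileSink(String basePath) {
        super(basePath); // would not compile if BaseFileSink's constructor were private
    }

    @Override
    public void open() {
        super.open();
        bytesWritten = metrics.counter("bytesWritten");
    }

    @Override
    public void invoke(String value) {
        super.invoke(value);
        bytesWritten.addAndGet(value.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

With a protected constructor, the static factory can stay the default entry point for users, while subclasses such as `ByteCountingFileSink` can still be declared and register their own metrics in `open()`.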
+1 to both suggestions
It should be possible to extend the connector (we run into the same issues with KinesisConsumer).

Metrics are essential for understanding performance, especially for things like S3 writes, errors, retries, memory buffers, and so on.

Thomas

On 2019/05/15 07:43:39, Kailash Dayanand wrote:
> [...]
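A rough sketch of how the S3-related metrics mentioned above (writes, errors, retries) might be counted around a single upload, assuming a simple fixed-attempt retry loop. `WriteOp` and `InstrumentedWriter` are hypothetical names; this is neither the aws-hadoop nor the Flink S3 implementation, and the plain `AtomicLong` fields stand in for real metric counters.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical write operation, e.g. one S3 part upload; throws on failure.
interface WriteOp {
    void run() throws Exception;
}

// Hypothetical wrapper that retries a write and counts what happened.
class InstrumentedWriter {
    final AtomicLong writes = new AtomicLong();       // writes that eventually succeeded
    final AtomicLong retries = new AtomicLong();      // extra attempts after a failure
    final AtomicLong errors = new AtomicLong();       // individual failed attempts
    final AtomicLong failedWrites = new AtomicLong(); // writes that exhausted all attempts

    private final int maxAttempts;

    InstrumentedWriter(int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
    }

    // Runs op up to maxAttempts times; returns true on success, false otherwise.
    boolean write(WriteOp op) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt > 1) retries.incrementAndGet();
            try {
                op.run();
                writes.incrementAndGet();
                return true;
            } catch (Exception e) {
                errors.incrementAndGet();
            }
        }
        failedWrites.incrementAndGet();
        return false;
    }
}
```

A production version would add backoff between attempts and would probably not swallow `InterruptedException`; both are left out to keep the counting logic visible.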
Hi Kailash,
Have you seen FLIP-33 [1] and the corresponding ML thread [2]? The scope of this improvement proposal is to extend the set of standard metrics that a connector should offer. Maybe this already solves your problem.

Concerning your second proposal for the StreamingFileSink, I think this should be doable, and it would help users build their own custom StreamingFileSink.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
[2] https://www.mail-archive.com/dev@.../msg25296.html

Cheers,
Till

On Thu, May 16, 2019 at 2:38 AM Thomas Weise wrote:
> [...]
Hello Till,
Thanks a lot for the information. It makes a lot of sense to have generic sink-based metrics. One thing that may be useful for file system sinks is the number of files written. I assume that for something like a bulk writer this would be abstracted under the number of records (multiple records constitute a single write to the sink, hence my doubt).

Thanks
Kailash

On Thu, May 16, 2019, 3:13 AM Till Rohrmann wrote:
> [...]
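To make the records-versus-files distinction concrete, here is a small sketch of a bulk-style part writer, assuming records are buffered and each roll (for example on a checkpoint) turns the whole buffer into one file. `BulkPartWriterSketch` and its methods are hypothetical and are not Flink's `BulkWriter` interface. It illustrates why a file count cannot be derived from a record count: five records may end up in one, two, or five files depending on when rolling happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bulk-style part writer: records are buffered, and each roll
// turns the whole buffer into a single finished file.
class BulkPartWriterSketch {
    private final List<String> buffer = new ArrayList<>();
    private long recordsWritten; // records that have landed in a finished file
    private long filesWritten;   // finished part files (the file-sink-specific metric)

    // Buffers one record for the in-progress part file.
    void add(String record) {
        buffer.add(record);
    }

    // Rolls the in-progress part; an empty buffer produces no file.
    void roll() {
        if (buffer.isEmpty()) return;
        // A real writer would encode and upload the buffered records here.
        recordsWritten += buffer.size();
        filesWritten++;
        buffer.clear();
    }

    long recordsWritten() { return recordsWritten; }
    long filesWritten() { return filesWritten; }
    int pendingRecords() { return buffer.size(); }
}
```

For example, adding three records, rolling, adding two more, and rolling again yields five records written but only two files.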
I think you are right that some connectors will still need special metrics due to their peculiarities. I guess this won't be addressed by the FLIP, but it could be a starting point.

Cheers,
Till

On Fri, May 17, 2019 at 8:26 AM Kailash Dayanand wrote:
> [...]