[Discuss]: Adding Metrics to StreamingFileSink

5 messages

[Discuss]: Adding Metrics to StreamingFileSink

Kailash Dayanand
Hello,

I was looking to add metrics to the streaming file sink. Currently, the only
details available are the generic metrics every operator exposes, such as
the number of records in and the number of records out. I would like to
add some sink-specific metrics and contribute them back, as well as enable
the metrics that are already published by aws-hadoop. Would that be of
value to the community?

I am also proposing to make the constructor of StreamingFileSink
protected instead of private (see https://tinyurl.com/y5vh4jn6). If it
were protected, anyone could extend the class and register custom metrics
in the 'open' method.
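A minimal sketch of the extension pattern this proposal would enable. It assumes simplified stand-ins so that it runs without Flink on the classpath: `SimpleFileSink`, `MeteredFileSink`, and `SimpleMetricGroup` are hypothetical and are not Flink's API. The point is that the subclass can only call `super(...)` because the parent constructor is protected; with a private constructor it would not compile. In an actual Flink job, the subclass would register its metrics in `open(Configuration)` through `getRuntimeContext().getMetricGroup().counter(...)` rather than through the stand-in group used here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Entry point listed first so the file can be launched directly with `java File.java`.
class SinkMetricsDemo {
    public static void main(String[] args) {
        MeteredFileSink<String> sink = new MeteredFileSink<>("/tmp/out");
        sink.open();
        sink.invoke("a");
        sink.invoke("b");
        System.out.println(sink.metrics.value("customRecordsWritten")); // prints 2
    }
}

// Hypothetical stand-in for a metric group; NOT Flink's MetricGroup.
class SimpleMetricGroup {
    private final Map<String, AtomicLong> counters = new HashMap<>();

    // Returns the named counter, creating it on first use.
    AtomicLong counter(String name) {
        return counters.computeIfAbsent(name, k -> new AtomicLong());
    }

    // Reads a counter's current value, or 0 if it was never registered.
    long value(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0L : c.get();
    }
}

// Hypothetical simplified sink. The key detail is the *protected* constructor.
class SimpleFileSink<T> {
    protected final SimpleMetricGroup metrics = new SimpleMetricGroup();
    private final String basePath; // where files would be written (unused in this sketch)

    protected SimpleFileSink(String basePath) {
        this.basePath = basePath;
    }

    // Lifecycle hook, analogous to a rich function's open(Configuration).
    public void open() {}

    // Called once per record; actual file writing is omitted here.
    public void invoke(T value) {}
}

// Hypothetical user subclass that adds its own metric.
class MeteredFileSink<T> extends SimpleFileSink<T> {
    private AtomicLong recordsWritten;

    MeteredFileSink(String basePath) {
        super(basePath); // legal only because the parent constructor is protected
    }

    @Override
    public void open() {
        super.open();
        // Register the custom metric once, when the sink is opened.
        recordsWritten = metrics.counter("customRecordsWritten");
    }

    @Override
    public void invoke(T value) {
        super.invoke(value);
        recordsWritten.incrementAndGet();
    }
}
```

Registering in `open()` rather than in the constructor mirrors how Flink functions obtain their runtime context: it is only available once the operator is running on a task.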

Thanks
Kailash

Re: [Discuss]: Adding Metrics to StreamingFileSink

Thomas Weise
+1 to both suggestions

It should be possible to extend the connector (we run into the same issues with KinesisConsumer).

Metrics are essential for understanding performance, especially for things like S3 writes, errors, retries, memory buffers, and so on.

Thomas

On 2019/05/15 07:43:39, Kailash Dayanand <[hidden email]> wrote:

Re: [Discuss]: Adding Metrics to StreamingFileSink

Till Rohrmann
Hi Kailash,

have you seen FLIP-33 [1] and the corresponding ML thread [2]? The scope of
this improvement proposal is to extend the set of standard metrics a
connector should offer. Maybe this already solves your problem.

Concerning your second proposal for the StreamingFileSink, I think this
should be doable and would help users build their own custom StreamingFileSink.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
[2] https://www.mail-archive.com/dev@.../msg25296.html

Cheers,
Till

On Thu, May 16, 2019 at 2:38 AM Thomas Weise <[hidden email]> wrote:

Re: [Discuss]: Adding Metrics to StreamingFileSink

Kailash Dayanand
Hello Till,

Thanks a lot for the information. It makes a lot of sense to have generic
sink-based metrics. One thing that may be useful for file system sinks is
the number of files written. I assume that would be abstracted under the
number of records, but for something like a bulk writer, multiple records
constitute a single write to the sink, hence my doubt.
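To make this doubt concrete, here is a minimal sketch assuming a hypothetical bulk-style writer (`BulkPartWriter`, which is illustrative only and not Flink's `BulkWriter` interface) that buffers records and emits one part file per roll. Counting records and files separately shows why a records-out metric alone does not reveal how many files, and therefore how many uploads, the sink produced.

```java
import java.util.ArrayList;
import java.util.List;

// Entry point listed first so the file can be launched directly with `java File.java`.
class BulkFileMetricsDemo {
    public static void main(String[] args) {
        BulkPartWriter writer = new BulkPartWriter();
        for (int i = 0; i < 1000; i++) {
            writer.write("record-" + i);
            if (i % 400 == 399) {
                writer.rollPart(); // roll after every 400 records
            }
        }
        writer.rollPart(); // final roll for the remaining 200 records (e.g. on checkpoint)
        System.out.println(writer.recordsWritten() + " records, " + writer.filesWritten() + " files");
    }
}

// Hypothetical bulk-style writer: records are buffered and only become a file on roll.
class BulkPartWriter {
    private final List<String> buffer = new ArrayList<>();
    private long recordsWritten;
    private long filesWritten;

    // Each record increments the record count but does not produce a file by itself.
    void write(String record) {
        buffer.add(record);
        recordsWritten++;
    }

    // Turns the buffered records into one part file; empty buffers produce no file.
    void rollPart() {
        if (buffer.isEmpty()) {
            return;
        }
        // A real writer would encode the buffer (e.g. into a columnar format) and upload it here.
        buffer.clear();
        filesWritten++;
    }

    long recordsWritten() { return recordsWritten; }

    long filesWritten() { return filesWritten; }
}
```

Running it prints `1000 records, 3 files`: a thousand records surface as only three part files, which is the gap a dedicated files-written metric would cover.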

Thanks
Kailash

On Thu, May 16, 2019, 3:13 AM Till Rohrmann <[hidden email]> wrote:

Re: [Discuss]: Adding Metrics to StreamingFileSink

Till Rohrmann
I think you are right that some connectors will still need special
metrics due to their peculiarities. I guess this won't be addressed
by the FLIP, but it could be a starting point.

Cheers,
Till


On Fri, May 17, 2019 at 8:26 AM Kailash Dayanand <[hidden email]> wrote: