[DISCUSS] AssignerWithPeriodicWatermarks with max delay

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] AssignerWithPeriodicWatermarks with max delay

Eduardo Winpenny Tejedor
Hi all,

I've been using Apache Flink for the last few months but I'm new to
the dev community. I'd like to contribute code (and possibly more) to
the community and I've been advised a good starting point would be
suggesting improvements for those areas that I found lacking. I'll
create a separate [DISCUSS] thread for each of those (if this is
indeed the process!).

-- Problem statement --

In my use cases I've had to output data at regular (event time)
intervals, regardless of whether there's been any events flowing
through the app. For those occasions when no events flow I've been
happy to delay the emission of data for some time. This amount of time
is reasonable and still several times larger than the worse delays of
my event bus. It also meets the business requirements :)

Flink's documentation suggests marking a source as temporarily idle
for such occasions but to the best of my knowledge it will not advance
the watermark if there's no events at all flowing through the system.


-- Proposed solution --

Provide all implementations of AssignerWithPeriodicWatermarks in the
Flink project a mechanism to specify a max time delay after which the
watermark will advance if no events have been processed. The watermark
will always stay as far as the specified delay when advanced in this
way.

To achieve backward compatibility I suggest providing the
implementations of AssignerWithPeriodicWatermarks with a builder
method that'll allow to specify said max delay. Other options to
introducing this change in a non-invasive way are welcome.

I'm hoping for your suggestions/comments/questions.

Thanks,
Eduardo
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] AssignerWithPeriodicWatermarks with max delay

Till Rohrmann
Hi Eduardo,

great to hear that you want to contribute to Flink.

I'm not too deeply involved in idle watermarks but I know that there is
FLIP-27 [1] which has the aim of reworking Flink's source interface. Part
of it is to solve the problem of idle watermarks. The idea would be to
solve the problem for all sources instead of having to implement it for
every watermark assigner. Please also take a look at the corresponding JIRA
issue [2].

Maybe you can check with the involved people how far we are with the
implementation and whether it will cover the idle watermarks.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] https://issues.apache.org/jira/browse/FLINK-10740

Cheers,
Till

On Mon, Feb 17, 2020 at 10:22 PM Eduardo Winpenny Tejedor <
[hidden email]> wrote:

> Hi all,
>
> I've been using Apache Flink for the last few months but I'm new to
> the dev community. I'd like to contribute code (and possibly more) to
> the community and I've been advised a good starting point would be
> suggesting improvements for those areas that I found lacking. I'll
> create a separate [DISCUSS] thread for each of those (if this is
> indeed the process!).
>
> -- Problem statement --
>
> In my use cases I've had to output data at regular (event time)
> intervals, regardless of whether there's been any events flowing
> through the app. For those occasions when no events flow I've been
> happy to delay the emission of data for some time. This amount of time
> is reasonable and still several times larger than the worse delays of
> my event bus. It also meets the business requirements :)
>
> Flink's documentation suggests marking a source as temporarily idle
> for such occasions but to the best of my knowledge it will not advance
> the watermark if there's no events at all flowing through the system.
>
>
> -- Proposed solution --
>
> Provide all implementations of AssignerWithPeriodicWatermarks in the
> Flink project a mechanism to specify a max time delay after which the
> watermark will advance if no events have been processed. The watermark
> will always stay as far as the specified delay when advanced in this
> way.
>
> To achieve backward compatibility I suggest providing the
> implementations of AssignerWithPeriodicWatermarks with a builder
> method that'll allow to specify said max delay. Other options to
> introducing this change in a non-invasive way are welcome.
>
> I'm hoping for your suggestions/comments/questions.
>
> Thanks,
> Eduardo
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] AssignerWithPeriodicWatermarks with max delay

Eduardo Winpenny Tejedor
Hi Till,

Thanks for setting me on the right path! I'll finish reading the FLIP and
Jira item and I'll contact Jiangjie Qin on how to best help.

Regards,
Eduardo


On Wed, 19 Feb 2020, 14:44 Till Rohrmann, <[hidden email]> wrote:

> Hi Eduardo,
>
> great to hear that you want to contribute to Flink.
>
> I'm not too deeply involved in idle watermarks but I know that there is
> FLIP-27 [1] which has the aim of reworking Flink's source interface. Part
> of it is to solve the problem of idle watermarks. The idea would be to
> solve the problem for all sources instead of having to implement it for
> every watermark assigner. Please also take a look at the corresponding JIRA
> issue [2].
>
> Maybe you can check with the involved people how far we are with the
> implementation and whether it will cover the idle watermarks.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> [2] https://issues.apache.org/jira/browse/FLINK-10740
>
> Cheers,
> Till
>
> On Mon, Feb 17, 2020 at 10:22 PM Eduardo Winpenny Tejedor <
> [hidden email]> wrote:
>
> > Hi all,
> >
> > I've been using Apache Flink for the last few months but I'm new to
> > the dev community. I'd like to contribute code (and possibly more) to
> > the community and I've been advised a good starting point would be
> > suggesting improvements for those areas that I found lacking. I'll
> > create a separate [DISCUSS] thread for each of those (if this is
> > indeed the process!).
> >
> > -- Problem statement --
> >
> > In my use cases I've had to output data at regular (event time)
> > intervals, regardless of whether there's been any events flowing
> > through the app. For those occasions when no events flow I've been
> > happy to delay the emission of data for some time. This amount of time
> > is reasonable and still several times larger than the worse delays of
> > my event bus. It also meets the business requirements :)
> >
> > Flink's documentation suggests marking a source as temporarily idle
> > for such occasions but to the best of my knowledge it will not advance
> > the watermark if there's no events at all flowing through the system.
> >
> >
> > -- Proposed solution --
> >
> > Provide all implementations of AssignerWithPeriodicWatermarks in the
> > Flink project a mechanism to specify a max time delay after which the
> > watermark will advance if no events have been processed. The watermark
> > will always stay as far as the specified delay when advanced in this
> > way.
> >
> > To achieve backward compatibility I suggest providing the
> > implementations of AssignerWithPeriodicWatermarks with a builder
> > method that'll allow to specify said max delay. Other options to
> > introducing this change in a non-invasive way are welcome.
> >
> > I'm hoping for your suggestions/comments/questions.
> >
> > Thanks,
> > Eduardo
> >
>