Hi all,
I've been using Apache Flink for the last few months but I'm new to the dev community. I'd like to contribute code (and possibly more) to the community and I've been advised a good starting point would be suggesting improvements for those areas that I found lacking. I'll create a separate [DISCUSS] thread for each of those (if this is indeed the process!). -- Problem statement -- In my use cases I've had to output data at regular (event time) intervals, regardless of whether there's been any events flowing through the app. For those occasions when no events flow I've been happy to delay the emission of data for some time. This amount of time is reasonable and still several times larger than the worse delays of my event bus. It also meets the business requirements :) Flink's documentation suggests marking a source as temporarily idle for such occasions but to the best of my knowledge it will not advance the watermark if there's no events at all flowing through the system. -- Proposed solution -- Provide all implementations of AssignerWithPeriodicWatermarks in the Flink project a mechanism to specify a max time delay after which the watermark will advance if no events have been processed. The watermark will always stay as far as the specified delay when advanced in this way. To achieve backward compatibility I suggest providing the implementations of AssignerWithPeriodicWatermarks with a builder method that'll allow to specify said max delay. Other options to introducing this change in a non-invasive way are welcome. I'm hoping for your suggestions/comments/questions. Thanks, Eduardo |
Hi Eduardo,
great to hear that you want to contribute to Flink. I'm not too deeply involved in idle watermarks but I know that there is FLIP-27 [1] which has the aim of reworking Flink's source interface. Part of it is to solve the problem of idle watermarks. The idea would be to solve the problem for all sources instead of having to implement it for every watermark assigner. Please also take a look at the corresponding JIRA issue [2]. Maybe you can check with the involved people how far we are with the implementation and whether it will cover the idle watermarks. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface [2] https://issues.apache.org/jira/browse/FLINK-10740 Cheers, Till On Mon, Feb 17, 2020 at 10:22 PM Eduardo Winpenny Tejedor < [hidden email]> wrote: > Hi all, > > I've been using Apache Flink for the last few months but I'm new to > the dev community. I'd like to contribute code (and possibly more) to > the community and I've been advised a good starting point would be > suggesting improvements for those areas that I found lacking. I'll > create a separate [DISCUSS] thread for each of those (if this is > indeed the process!). > > -- Problem statement -- > > In my use cases I've had to output data at regular (event time) > intervals, regardless of whether there's been any events flowing > through the app. For those occasions when no events flow I've been > happy to delay the emission of data for some time. This amount of time > is reasonable and still several times larger than the worse delays of > my event bus. It also meets the business requirements :) > > Flink's documentation suggests marking a source as temporarily idle > for such occasions but to the best of my knowledge it will not advance > the watermark if there's no events at all flowing through the system. > > > -- Proposed solution -- > > Provide all implementations of AssignerWithPeriodicWatermarks in the > Flink project a mechanism to specify a max time delay after which the > watermark will advance if no events have been processed. The watermark > will always stay as far as the specified delay when advanced in this > way. > > To achieve backward compatibility I suggest providing the > implementations of AssignerWithPeriodicWatermarks with a builder > method that'll allow to specify said max delay. Other options to > introducing this change in a non-invasive way are welcome. > > I'm hoping for your suggestions/comments/questions. > > Thanks, > Eduardo > |
Hi Till,
Thanks for setting me on the right path! I'll finish reading the FLIP and Jira item and I'll contact Jiangjie Qin on how to best help. Regards, Eduardo On Wed, 19 Feb 2020, 14:44 Till Rohrmann, <[hidden email]> wrote: > Hi Eduardo, > > great to hear that you want to contribute to Flink. > > I'm not too deeply involved in idle watermarks but I know that there is > FLIP-27 [1] which has the aim of reworking Flink's source interface. Part > of it is to solve the problem of idle watermarks. The idea would be to > solve the problem for all sources instead of having to implement it for > every watermark assigner. Please also take a look at the corresponding JIRA > issue [2]. > > Maybe you can check with the involved people how far we are with the > implementation and whether it will cover the idle watermarks. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface > [2] https://issues.apache.org/jira/browse/FLINK-10740 > > Cheers, > Till > > On Mon, Feb 17, 2020 at 10:22 PM Eduardo Winpenny Tejedor < > [hidden email]> wrote: > > > Hi all, > > > > I've been using Apache Flink for the last few months but I'm new to > > the dev community. I'd like to contribute code (and possibly more) to > > the community and I've been advised a good starting point would be > > suggesting improvements for those areas that I found lacking. I'll > > create a separate [DISCUSS] thread for each of those (if this is > > indeed the process!). > > > > -- Problem statement -- > > > > In my use cases I've had to output data at regular (event time) > > intervals, regardless of whether there's been any events flowing > > through the app. For those occasions when no events flow I've been > > happy to delay the emission of data for some time. This amount of time > > is reasonable and still several times larger than the worse delays of > > my event bus. It also meets the business requirements :) > > > > Flink's documentation suggests marking a source as temporarily idle > > for such occasions but to the best of my knowledge it will not advance > > the watermark if there's no events at all flowing through the system. > > > > > > -- Proposed solution -- > > > > Provide all implementations of AssignerWithPeriodicWatermarks in the > > Flink project a mechanism to specify a max time delay after which the > > watermark will advance if no events have been processed. The watermark > > will always stay as far as the specified delay when advanced in this > > way. > > > > To achieve backward compatibility I suggest providing the > > implementations of AssignerWithPeriodicWatermarks with a builder > > method that'll allow to specify said max delay. Other options to > > introducing this change in a non-invasive way are welcome. > > > > I'm hoping for your suggestions/comments/questions. > > > > Thanks, > > Eduardo > > > |
Free forum by Nabble | Edit this page |