[Discuss] Retraction for Flink Streaming

Shaoxuan Wang
Hello everyone,

Flink is widely used in Alibaba Group, especially in our Search and
Recommendation Infra. Retraction is one of the most important features that
we need. We have spent a lot of effort on this problem, and we have
developed an approach that addresses most of the retraction problems in our
production scenarios. As usual, we (the Alibaba search-data infra team)
would like to share our retraction solution with the entire Flink
community. If you like this proposal, I would also like to make it one of
the FLIPs. I am attaching the design doc of "Retraction for Flink
Streaming" as well as its introduction section below. I have also created a
master Jira (FLINK-6047) to track the discussion and design of Flink
retraction. All suggestions and comments are welcome.


*Design doc:*
https://docs.google.com/document/d/18XlGPcfsGbnPSApRipJDLPg5IFNGTQjnz7emkVpZlkw

*Introduction:*

"Retraction" is an important building block for data streaming: it refines
results that were fired early. “Early firing” is very common and widely
used in many streaming scenarios, for instance “window-less” (unbounded)
aggregates and stream-stream inner joins, as well as windowed aggregates
and stream-stream inner joins with early firing. As described in Streaming
102, there are mainly two cases that require retractions: 1) an update on a
keyed table (the key is either a primary key (PK) on the source table, or a
groupKey/partitionKey in an aggregate); 2) when dynamic windows (e.g.,
session windows) are in use, a new value may replace more than one
previous window due to window merging.
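
To make case 1 concrete, here is a minimal sketch of a two-level unbounded
aggregate: an upstream count per key feeding a downstream "number of keys
per count". It is only an illustration under stated assumptions, not the
design from the doc: the class names, the ("+" / "-", key, value) message
shape, and the `retract` switch are all hypothetical.

```python
from collections import defaultdict

ADD, RETRACT = "+", "-"  # hypothetical message signs, not a Flink API


class CountPerKey:
    """Upstream: unbounded count per key; fires a result on every input."""

    def __init__(self, retract):
        self.retract = retract
        self.counts = defaultdict(int)

    def process(self, key):
        out = []
        old = self.counts[key]
        # Before emitting the refined result, retract the early-fired one.
        if self.retract and old > 0:
            out.append((RETRACT, key, old))
        self.counts[key] = old + 1
        out.append((ADD, key, old + 1))
        return out


class KeysPerCount:
    """Downstream: how many keys currently have each count value."""

    def __init__(self):
        self.freq = defaultdict(int)

    def process(self, msg):
        sign, _key, cnt = msg
        self.freq[cnt] += 1 if sign == ADD else -1
        if self.freq[cnt] == 0:
            del self.freq[cnt]


def run(words, retract):
    up, down = CountPerKey(retract), KeysPerCount()
    for w in words:
        for msg in up.process(w):
            down.process(msg)
    return dict(down.freq)


words = ["a", "b", "a"]
with_retraction = run(words, retract=True)      # {1: 1, 2: 1}
without_retraction = run(words, retract=False)  # {1: 2, 2: 1}
```

Without the retract message, the downstream operator still counts the stale
result ("a" has count 1) after "a" has moved on to count 2, so it reports
two keys with count 1 instead of one.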

To the best of our knowledge, retraction of early-fired streaming results
has never been practically solved before. In this proposal, we develop a
retraction solution and explain how it works for the problem of
“update on the keyed table”. The same solution can be easily extended to
dynamic window merging, as the key component of retraction - how to
refine an early-fired result - is the same across the different problems.
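
As a rough illustration of the window-merging case, the sketch below counts
events per session window and, when a new event bridges existing sessions,
retracts every merged window before emitting the merged result. This is an
assumption-laden toy, not the proposed implementation: `SessionCounter`,
`materialize`, the gap of 5, the inclusive overlap test, and the message
shape are all hypothetical.

```python
GAP = 5  # hypothetical session gap, in arbitrary time units


class SessionCounter:
    """Counts events per session; a session is stored as (start, end, count)."""

    def __init__(self, gap=GAP):
        self.gap = gap
        self.sessions = []

    def process(self, ts):
        ev_start, ev_end = ts, ts + self.gap
        start, end, count = ev_start, ev_end, 1
        out, kept = [], []
        for s, e, c in self.sessions:
            if s <= ev_end and ev_start <= e:
                # The existing session overlaps the new event's window:
                # retract its early-fired result and fold it into the merge.
                out.append(("-", (s, e), c))
                start, end, count = min(start, s), max(end, e), count + c
            else:
                kept.append((s, e, c))
        kept.append((start, end, count))
        self.sessions = sorted(kept)
        out.append(("+", (start, end), count))
        return out


def materialize(messages):
    """Applies add/retract messages to a downstream view keyed by window."""
    view = {}
    for sign, window, count in messages:
        if sign == "+":
            view[window] = count
        else:
            view.pop(window, None)
    return view


counter = SessionCounter()
emitted = [m for ts in (1, 9, 5) for m in counter.process(ts)]
view = materialize(emitted)
```

With events at times 1, 9 and 5, the third event bridges the sessions
(1, 6) and (9, 14), so two retractions precede the single merged result
(1, 14) with count 3; this is the "replacing more than one previous window"
situation described above.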

*Master Jira:*
https://issues.apache.org/jira/browse/FLINK-6047


Regards,
Shaoxuan

Re: [Discuss] Retraction for Flink Streaming

Fabian Hueske-2
Hi Shaoxuan,

thanks a lot for this proposal!
Support for retractions is a super nice and important feature and will
enable many more use cases for the Table API / SQL.
I'm really excited to see this happening. I made a first pass over your
proposal and added a few comments. I'll do another pass soon.

Since there are only 6 weeks left until the feature freeze for Flink 1.3, I
propose to develop the retraction support in a feature branch.
IMO, we must make sure that either all operators support retraction or
none do. Otherwise, the behavior of the Table API / SQL will not be
predictable.

I also think that we should define which operators we want to support in
Flink 1.3 in order to coordinate the development of retraction support.

What do others think?

Cheers, Fabian


(In reply to Shaoxuan Wang's message of 2017-03-14 16:53 GMT+01:00.)

Re: [Discuss] Retraction for Flink Streaming

Shaoxuan Wang
The doc has received a lot of attention. I thank everyone for the valuable
comments.

Hi Fabian,
Thanks for your comments.

I agree with you that we should ensure that all operators run in the same
mode (retraction is turned on either globally or not at all). From my point
of view, I do not see any problem in supporting retraction for all the
operators we have so far in Flink master (assuming we keep the window
aggregate in a "without early firing" mode as it is now, and do not handle
late arrivals after a window is materialized, as we agreed in the comment
discussions in the Google doc).

Can you please create a feature branch? We have a complete design on top of
Flink master. We would like to submit the design to the feature branch
asap, so that everyone can get a deeper insight into the design with more
details.

Regards,
Shaoxuan



(In reply to Fabian Hueske's message of Thu, Mar 16, 2017 at 12:54 AM.)