Stream iteration head as ConnectedDataStream


Gyula Fóra-2
Hey!

Now that we are implementing more and more streaming applications that
use iterations, we have realized a major shortcoming of the current iteration
API: it only allows data of the same type to be fed back to the iteration
head.

This makes sense because the operators are typed, but it becomes awkward if
we want the feedback to have a different type (such as model syncing for
machine learning applications). To do this, developers need to use wrapper
types and flags to distinguish the inputs.
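
To make the awkwardness concrete, here is a minimal plain-Java sketch of the wrapper-and-flag workaround. It does not use the Flink API at all; `Tagged`, `Predictor` and the toy linear model are hypothetical names chosen only for illustration.

```java
// Hypothetical illustration of the wrapper-type workaround; no Flink classes are used.
public class WrapperFeedbackSketch {

    // One wrapper type has to carry both kinds of records, plus a flag.
    static final class Tagged {
        final boolean isFeedback; // must be checked on every record
        final double dataPoint;   // meaningful only when isFeedback is false
        final double[] model;     // meaningful only when isFeedback is true

        private Tagged(boolean isFeedback, double dataPoint, double[] model) {
            this.isFeedback = isFeedback;
            this.dataPoint = dataPoint;
            this.model = model;
        }

        static Tagged data(double x) {
            return new Tagged(false, x, null);
        }

        static Tagged feedback(double[] model) {
            return new Tagged(true, 0.0, model);
        }
    }

    // Single-input operator standing in for the iteration head's consumer.
    static final class Predictor {
        private double[] model = {0.0, 1.0}; // {intercept, slope}

        // Returns a prediction for data records, or null for feedback records.
        Double process(Tagged in) {
            if (in.isFeedback) {
                model = in.model;
                return null;
            }
            return model[0] + model[1] * in.dataPoint;
        }
    }

    public static void main(String[] args) {
        Predictor p = new Predictor();
        System.out.println(p.process(Tagged.data(2.0)));        // 2.0
        p.process(Tagged.feedback(new double[] {1.0, 3.0}));    // model update
        System.out.println(p.process(Tagged.data(2.0)));        // 7.0
    }
}
```

Every record pays for the unused field, and the operator has to branch on the flag itself, which is exactly the boilerplate in question.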

I propose adding the possibility to treat the original input of the
iteration head operator and the feedback stream as a ConnectedDataStream, so
that we can apply operators such as CoMap, CoFlatMap, etc. This helps
distinguish the inputs and also allows different feedback types to be
used. I believe this change is inevitable if we want to write elegant
applications without unnecessary wrapper types.
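
As a rough sketch of the idea, again in plain Java rather than against the Flink API: the local `CoMap` interface below is a stand-in modeled on the two-method shape of a co-map operator (one method per input), and `Predictor` is a hypothetical example. How the PR actually wires the feedback edge into the iteration head is not shown here.

```java
// Hypothetical illustration; CoMap here is a local stand-in, not the Flink class.
public class CoMapFeedbackSketch {

    // One method per input, so each input keeps its own type.
    interface CoMap<IN1, IN2, OUT> {
        OUT map1(IN1 value); // called for records of the original input
        OUT map2(IN2 value); // called for records of the feedback stream
    }

    // Data points arrive as Double, model updates as double[]; no wrapper, no flag.
    static final class Predictor implements CoMap<Double, double[], String> {
        private double[] model = {0.0, 1.0}; // {intercept, slope}

        @Override
        public String map1(Double x) {
            return "prediction=" + (model[0] + model[1] * x);
        }

        @Override
        public String map2(double[] newModel) {
            model = newModel;
            return "model updated";
        }
    }

    public static void main(String[] args) {
        Predictor p = new Predictor();
        System.out.println(p.map1(2.0));                       // prediction=2.0
        System.out.println(p.map2(new double[] {1.0, 3.0}));   // model updated
        System.out.println(p.map1(2.0));                       // prediction=7.0
    }
}
```

Because the two inputs are dispatched to separate methods, the feedback type no longer has to match the input type, and it is always clear which side a record came from.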

I made a PR <https://github.com/apache/flink/pull/870> already that
introduces this functionality (it is a very small change in fact).

Cheers,
Gyula

Re: Stream iteration head as ConnectedDataStream

Paris Carbone
That’s convenient, at least for incremental ML, where feedback streams are the norm.
In this case we don’t force the user to create wrappers, and we also know what comes from where.

I went through the PR and it looks like it doesn’t break anything, so you have my +1.

Paris


> On 26 Jun 2015, at 13:35, Gyula Fóra <[hidden email]> wrote:
>
> Hey!
>
> Now that we are implementing more and more applications for streaming that
> use iterations we realized a huge shortcoming of the current iteration api.
> Currently it only allows to feedback data of the same type to the iteration
> head.
>
> This makes sense because the operators are typed but makes it awkward if we
> indeed want to use a different feedback (such as a model syncing for
> machine learning applications). To do this developers need to use wrapper
> types and flags to distinguish the inputs.
>
> I propose to add the possibility to treat the original input of the
> iteration head operator and the feedback stream as a ConnectedDataStream so
> we can apply operators such as CoMap, CoFlatMap etc. This helps
> distinguishing the inputs and also allows different feedback types to be
> used. I believe this change is inevitable if we want to write elegant
> applications without unnecessary wrapper types.
>
> I made a PR <https://github.com/apache/flink/pull/870> already that
> introduces this functionality (it is a very small change in fact).
>
> Cheers,
> Gyula