[flink-streaming] Regarding loops in the Job Graph


[flink-streaming] Regarding loops in the Job Graph

Paris Carbone
Hello,

While implementing the SAMOA adapter for Flink-Streaming, we stumbled upon the need to allow loops (circular dependencies) in the job graph. Many incremental machine learning tasks already define loops, and there is no trivial way around them. The streaming job graph builder only has a check that prevents the user from submitting graphs with loops; however, from what Gyula told me, the streaming job runs as expected if that check is removed. Is there (still) a major reason for having this check, at least in the streaming component?

Paris

Re: [flink-streaming] Regarding loops in the Job Graph

Stephan Ewen
Hi Paris!

The Streaming API allows you to define iterations, where parts of the
stream are fed back. Do those work for you?
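
To make the idea of feeding part of a stream back more concrete, here is a minimal plain-Python sketch. It does not use the Flink Streaming API; `run_feedback_loop`, `step`, `should_feed_back` and `max_rounds` are hypothetical names chosen for illustration, and a finite list stands in for the stream.

```python
from collections import deque


def run_feedback_loop(source, step, should_feed_back, max_rounds=1000):
    """Apply `step` to each element; feed some results back, emit the rest.

    Illustrative only: a bounded in-memory loop, not a streaming runtime.
    """
    pending = deque(source)
    emitted = []
    rounds = 0
    while pending:
        if rounds >= max_rounds:
            # Guard against a feedback condition that never becomes false.
            raise RuntimeError("feedback loop did not terminate")
        rounds += 1
        value = step(pending.popleft())
        if should_feed_back(value):
            pending.append(value)   # goes around the loop again
        else:
            emitted.append(value)   # leaves the loop
    return emitted


# Subtract 2 until the value is no longer positive.
result = run_feedback_loop(
    [10, 3, 7],
    step=lambda x: x - 2,
    should_feed_back=lambda x: x > 0,
)
print(result)
```

The loop body and the feedback condition are kept as separate functions, so the cycle is declared in one place rather than as an arbitrary back edge in the graph.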

In general, cyclic flows are a tricky thing, as the topological order of
operators is needed for scheduling (may not be important for continuous
streams) but also for a clear producer/consumer relationship, which is
important for fault tolerance techniques.

Currently, the JobManager topologically sorts the job graph and starts
scheduling operators. I am surprised to hear that a graph with cyclic
dependencies works...
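
To illustrate why a topological sort and cyclic dependencies do not mix, here is a minimal sketch of Kahn's algorithm in plain Python. It is not the JobManager's actual code; the function name and the vertex names are hypothetical.

```python
from collections import deque


def topological_order(vertices, edges):
    """Return a topological order of the graph, or None if it has a cycle."""
    successors = {v: [] for v in vertices}
    in_degree = {v: 0 for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Start with the vertices that have no incoming edges.
    ready = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for nxt in successors[v]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # Vertices on a cycle never reach in-degree 0, so they are never ordered.
    return order if len(order) == len(vertices) else None


acyclic = topological_order(
    ["source", "learner", "sink"],
    [("source", "learner"), ("learner", "sink")],
)
cyclic = topological_order(
    ["source", "learner", "evaluator"],
    [("source", "learner"), ("learner", "evaluator"), ("evaluator", "learner")],
)
print(acyclic, cyclic)
```

In the cyclic case, `learner` and `evaluator` each wait for the other, so no complete order exists and the function returns `None`.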


Stephan


On Wed, Jan 21, 2015 at 2:57 AM, Paris Carbone <[hidden email]> wrote:

> Hello,
>
> While implementing the SAMOA adapter for Flink-Streaming we stumbled upon
> the need to allow loops (or circular dependencies) in the job graph. Many
> incremental machine learning tasks define loops already and there is no
> trivial way of getting around it. In the streaming job graph builder there
> is only a check that does not allow the user to submit graphs with loops,
> however, from what Gyula told me, if the check is removed the streaming job
> runs as expected. Is there (still) a major reason for having this check, at
> least in the streaming component?
>
> Paris

Re: [flink-streaming] Regarding loops in the Job Graph

Paris Carbone
Thanks for the quick answers!
It is possible to use iterations: we could detect cycles while building the SAMOA topology and convert them into iterations. That is perhaps the proper way to go. I was just wondering whether we could hack around it, but we had better avoid messing with cyclic dependencies.
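
One way to detect such cycles while building a topology can be sketched as a depth-first search that records back edges, that is, edges pointing to a vertex still being visited. This is only a sketch under assumptions: `find_feedback_edges` and the vertex names are hypothetical and not part of SAMOA or Flink, and the recursion suits small graphs only.

```python
def find_feedback_edges(vertices, edges):
    """Return the edges that close a cycle (DFS back edges)."""
    successors = {v: [] for v in vertices}
    for src, dst in edges:
        successors[src].append(dst)

    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {v: UNVISITED for v in vertices}
    feedback = []

    def visit(v):
        state[v] = IN_PROGRESS
        for nxt in successors[v]:
            if state[nxt] == IN_PROGRESS:
                # nxt is an ancestor on the current DFS path: a cycle closes here.
                feedback.append((v, nxt))
            elif state[nxt] == UNVISITED:
                visit(nxt)
        state[v] = DONE

    for v in vertices:
        if state[v] == UNVISITED:
            visit(v)
    return feedback


feedback_edges = find_feedback_edges(
    ["source", "learner", "evaluator", "sink"],
    [
        ("source", "learner"),
        ("learner", "evaluator"),
        ("evaluator", "learner"),  # loop back to the learner
        ("evaluator", "sink"),
    ],
)
print(feedback_edges)
```

Removing the reported edges leaves an acyclic graph, and each removed edge marks a place where an iteration with a feedback stream could be declared instead.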

Paris

> On 21 Jan 2015, at 19:36, Stephan Ewen <[hidden email]> wrote:
>  
> Hi Paris!
>
> The Streaming API allows you to define iterations, where parts of the
> stream are fed back. Do those work for you?
>
> In general, cyclic flows are a tricky thing, as the topological order of
> operators is needed for scheduling (may not be important for continuous
> streams) but also for a clear producer/consumer relationship, which is
> important for fault tolerance techniques.
>
> Currently, the JobManager topologically sorts the job graph and starts
> scheduling operators. I am surprised to hear that a graph with cyclic
> dependencies works...
>
>
> Stephan


Re: [flink-streaming] Regarding loops in the Job Graph

Stephan Ewen
If this becomes a strong requirement, then we can look into relaxing the
constraints (and then have some features not supported on cyclic flows).

I just wanted to get a discussion started about the different angles of
approach, and what may be the simplest way to do this...

On Thu, Jan 22, 2015 at 4:47 AM, Paris Carbone <[hidden email]> wrote:

> Thanks for the quick answers!
> It is possible to use iterations: we could detect cycles while building
> the SAMOA topology and convert them into iterations. That is perhaps the
> proper way to go. I was just wondering whether we could hack around it,
> but we had better avoid messing with cyclic dependencies.
>
> Paris