Hello,
While implementing the SAMOA adapter for Flink-Streaming, we stumbled upon the need to allow loops (circular dependencies) in the job graph. Many incremental machine learning tasks already define loops, and there is no trivial way around them. The streaming job graph builder only has a check that prevents the user from submitting graphs with loops; however, from what Gyula told me, the streaming job runs as expected if the check is removed. Is there (still) a major reason for having this check, at least in the streaming component?

Paris
Hi Paris!
The Streaming API allows you to define iterations, where parts of the stream are fed back. Do those work for you?

In general, cyclic flows are a tricky thing: the topological order of operators is needed for scheduling (which may not be important for continuous streams), but also for a clear producer/consumer relationship, which is important for fault tolerance techniques.

Currently, the JobManager topologically sorts the job graph and starts scheduling operators. I am surprised to hear that a graph with cyclic dependencies works...

Stephan

On Wed, Jan 21, 2015 at 2:57 AM, Paris Carbone <[hidden email]> wrote:
> Is there (still) a major reason for having this check, at least in the
> streaming component?
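To illustrate the point about topological ordering, here is a minimal sketch of a topological sort (Kahn's algorithm) in plain Java. It is not Flink's JobManager code; the class name `TopoSortSketch`, the string vertex names, and the adjacency-map representation are all assumptions made for the example. It only shows why a graph with a cycle has no topological order to schedule from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not how Flink represents job graphs.
public class TopoSortSketch {

    /**
     * Returns the vertices in a topological order.
     * The graph maps each vertex to the vertices it sends data to.
     * Throws IllegalStateException if the graph contains a cycle.
     */
    public static List<String> sort(Map<String, List<String>> graph) {
        // Count incoming edges for every vertex.
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String vertex : graph.keySet()) {
            inDegree.put(vertex, 0);
        }
        for (List<String> targets : graph.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }

        // Vertices without inputs can be placed first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.remove();
            order.add(vertex);
            // Placing a vertex releases its outgoing edges.
            for (String target : graph.getOrDefault(vertex, Collections.emptyList())) {
                if (inDegree.merge(target, -1, Integer::sum) == 0) {
                    ready.add(target);
                }
            }
        }

        // Vertices on a cycle never reach in-degree zero, so they are never placed.
        if (order.size() != inDegree.size()) {
            throw new IllegalStateException("graph contains a cycle; no topological order exists");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new LinkedHashMap<>();
        dag.put("source", Arrays.asList("map"));
        dag.put("map", Arrays.asList("sink"));
        dag.put("sink", Collections.<String>emptyList());
        System.out.println(sort(dag)); // prints [source, map, sink]
    }
}
```

Adding an edge that points back to an earlier vertex (for example from an evaluator back to a learner) makes `sort` throw, which is the situation a loop check in a graph builder guards against.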
Thanks for the quick answers!
It is possible to use iterations: we could detect cycles while building the SAMOA topology and convert them into iterations. That is perhaps the proper way to go. I just wondered whether we could hack around it, but we had better avoid messing with cyclic dependencies.

Paris

On 21 Jan 2015, at 19:36, Stephan Ewen <[hidden email]> wrote:
> The Streaming API allows you to define iterations, where parts of the
> stream are fed back. Do those work for you?
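The detection step mentioned above could look roughly like the following sketch, which uses a depth-first search to find the edges that close a loop. Everything here is an assumption for illustration: the class `FeedbackEdgeSketch`, the vertex names, and the adjacency-map input are hypothetical and are not taken from SAMOA or Flink. How each reported edge would then be mapped onto a streaming iteration is not covered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not SAMOA or Flink code.
public class FeedbackEdgeSketch {

    private enum State { IN_PROGRESS, DONE }

    /**
     * Returns the edges, formatted as "from -> to", that point back to a vertex
     * still on the current depth-first search path. Each such edge closes a loop.
     */
    public static List<String> findFeedbackEdges(Map<String, List<String>> graph) {
        Map<String, State> state = new HashMap<>();
        List<String> feedback = new ArrayList<>();
        for (String vertex : graph.keySet()) {
            if (!state.containsKey(vertex)) {
                visit(vertex, graph, state, feedback);
            }
        }
        return feedback;
    }

    private static void visit(String vertex,
                              Map<String, List<String>> graph,
                              Map<String, State> state,
                              List<String> feedback) {
        state.put(vertex, State.IN_PROGRESS);
        for (String target : graph.getOrDefault(vertex, Collections.emptyList())) {
            State targetState = state.get(target);
            if (targetState == State.IN_PROGRESS) {
                // The target is an ancestor on the current path: this edge closes a loop.
                feedback.add(vertex + " -> " + target);
            } else if (targetState == null) {
                visit(target, graph, state, feedback);
            }
            // Targets that are DONE were fully explored and cannot close a loop here.
        }
        state.put(vertex, State.DONE);
    }
}
```

For a topology where an evaluator sends results back to a learner, the method reports that single edge, which is the candidate for the fed-back part of an iteration; the remaining edges form an acyclic graph.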
If this becomes a strong requirement, then we can look into relaxing the constraints (and then have some features not supported on cyclic flows). I just wanted to get a discussion started about the different angles of approach, and what may be the simplest way to do this...

On Thu, Jan 22, 2015 at 4:47 AM, Paris Carbone <[hidden email]> wrote:
> It is possible to use iterations, we could detect circles while building
> the samoa topology and convert them into iterations.