This proposes to refactor the RPC service and the coordination between
Client, JobManager, and TaskManager to use the Akka actor library. Even though Akka is written in Scala, it offers a Java interface and we can use Akka completely from Java. Below are a list of arguments why this would help the system: Problems with the current RPC service: -------------------------------------------------------- - No asynchronous calls with callbacks. This is the reason why several parts of the runtime poll the status, introducing unnecessary latency. - No exception forwarding (many exceptions are simply swallowed), making debugging and operation in flaky environments very hard - Limited number of handler threads. The RPC can only handle a fix number of concurrent requests, forcing you to maintain separate thread pools to delegate actions to - No support for primitive data types (or boxed primitives) as arguments, everything has to be a specially serializable type - Problematic threading model. The RPC continuously spawns and terminates threads Benefits of switching to the Akka actor model: ------------------------------------------------------------------------------- - Akka solves all of the above issues out of the box - The supervisor model allows you to do failure detection of actors. That provides a unified way of detecting and handling failures (missing heartbeats, failed calls, ...) - Akka has tools to make stateful actors persistent and restart them on other machines in cases of failure. That would greatly help in implementing "master fail-over", which will become important - You can define many "call targets" (actors). Tasks (on taskmanagers) can directly call their ExecutionVertex on the JobManager, rather than calling the JobManager, creating a Runnable that looks up the execution vertex, and so on... - The actor model's approach to queue actions on an actor and run the one after another makes the concurrency model of the state machine very simple and robust - We "outsource" our own concerns about maintaining and improving that part of the system Greetings, Stephan |
I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have a
big community [1] and users [2] that depend on the system. The YARN client is also using the old RPC service. I would like to rewrite it with Akka once we have added it into the other parts of the system, to learn it. [1] https://github.com/akka/akka/pulls [2] http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> wrote: > This proposes to refactor the RPC service and the coordination between > Client, JobManager, and TaskManager to use the Akka actor library. > > Even though Akka is written in Scala, it offers a Java interface and we can > use Akka completely from Java. > > Below are a list of arguments why this would help the system: > > > Problems with the current RPC service: > -------------------------------------------------------- > > - No asynchronous calls with callbacks. This is the reason why several > parts of the runtime poll the status, introducing unnecessary latency. > > - No exception forwarding (many exceptions are simply swallowed), making > debugging and operation in flaky environments very hard > > - Limited number of handler threads. The RPC can only handle a fix number > of concurrent requests, forcing you to maintain separate thread pools to > delegate actions to > > - No support for primitive data types (or boxed primitives) as arguments, > everything has to be a specially serializable type > > - Problematic threading model. The RPC continuously spawns and terminates > threads > > > > Benefits of switching to the Akka actor model: > > ------------------------------------------------------------------------------- > > - Akka solves all of the above issues out of the box > > - The supervisor model allows you to do failure detection of actors. That > provides a unified way of detecting and handling failures (missing > heartbeats, failed calls, ...) > > - Akka has tools to make stateful actors persistent and restart them on > other machines in cases of failure. That would greatly help in implementing > "master fail-over", which will become important > > - You can define many "call targets" (actors). Tasks (on taskmanagers) > can directly call their ExecutionVertex on the JobManager, rather than > calling the JobManager, creating a Runnable that looks up the execution > vertex, and so on... > > - The actor model's approach to queue actions on an actor and run the one > after another makes the concurrency model of the state machine very simple > and robust > > - We "outsource" our own concerns about maintaining and improving that > part of the system > > Greetings, > Stephan > |
+1 for refactoring using Akka, the arguments are overwhelming.
On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[hidden email]> wrote: > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have a > big community [1] and users [2] that depend on the system. > > The YARN client is also using the old RPC service. I would like to rewrite > it with Akka once we have added it into the other parts of the system, to > learn it. > > > [1] https://github.com/akka/akka/pulls > [2] > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> wrote: > > > This proposes to refactor the RPC service and the coordination between > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > Even though Akka is written in Scala, it offers a Java interface and we > can > > use Akka completely from Java. > > > > Below are a list of arguments why this would help the system: > > > > > > Problems with the current RPC service: > > -------------------------------------------------------- > > > > - No asynchronous calls with callbacks. This is the reason why several > > parts of the runtime poll the status, introducing unnecessary latency. > > > > - No exception forwarding (many exceptions are simply swallowed), > making > > debugging and operation in flaky environments very hard > > > > - Limited number of handler threads. The RPC can only handle a fix > number > > of concurrent requests, forcing you to maintain separate thread pools to > > delegate actions to > > > > - No support for primitive data types (or boxed primitives) as > arguments, > > everything has to be a specially serializable type > > > > - Problematic threading model. The RPC continuously spawns and > terminates > > threads > > > > > > > > Benefits of switching to the Akka actor model: > > > > > ------------------------------------------------------------------------------- > > > > - Akka solves all of the above issues out of the box > > > > - The supervisor model allows you to do failure detection of actors. > That > > provides a unified way of detecting and handling failures (missing > > heartbeats, failed calls, ...) > > > > - Akka has tools to make stateful actors persistent and restart them on > > other machines in cases of failure. That would greatly help in > implementing > > "master fail-over", which will become important > > > > - You can define many "call targets" (actors). Tasks (on taskmanagers) > > can directly call their ExecutionVertex on the JobManager, rather than > > calling the JobManager, creating a Runnable that looks up the execution > > vertex, and so on... > > > > - The actor model's approach to queue actions on an actor and run the > one > > after another makes the concurrency model of the state machine very > simple > > and robust > > > > - We "outsource" our own concerns about maintaining and improving that > > part of the system > > > > Greetings, > > Stephan > > > |
+1
On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <[hidden email]> wrote: > +1 for refactoring using Akka, the arguments are overwhelming. > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[hidden email]> > wrote: > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have a > > big community [1] and users [2] that depend on the system. > > > > The YARN client is also using the old RPC service. I would like to > rewrite > > it with Akka once we have added it into the other parts of the system, to > > learn it. > > > > > > [1] https://github.com/akka/akka/pulls > > [2] > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> wrote: > > > > > This proposes to refactor the RPC service and the coordination between > > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > > > Even though Akka is written in Scala, it offers a Java interface and we > > can > > > use Akka completely from Java. > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > Problems with the current RPC service: > > > -------------------------------------------------------- > > > > > > - No asynchronous calls with callbacks. This is the reason why > several > > > parts of the runtime poll the status, introducing unnecessary latency. > > > > > > - No exception forwarding (many exceptions are simply swallowed), > > making > > > debugging and operation in flaky environments very hard > > > > > > - Limited number of handler threads. The RPC can only handle a fix > > number > > > of concurrent requests, forcing you to maintain separate thread pools > to > > > delegate actions to > > > > > > - No support for primitive data types (or boxed primitives) as > > arguments, > > > everything has to be a specially serializable type > > > > > > - Problematic threading model. The RPC continuously spawns and > > terminates > > > threads > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > - Akka solves all of the above issues out of the box > > > > > > - The supervisor model allows you to do failure detection of actors. > > That > > > provides a unified way of detecting and handling failures (missing > > > heartbeats, failed calls, ...) > > > > > > - Akka has tools to make stateful actors persistent and restart them > on > > > other machines in cases of failure. That would greatly help in > > implementing > > > "master fail-over", which will become important > > > > > > - You can define many "call targets" (actors). Tasks (on > taskmanagers) > > > can directly call their ExecutionVertex on the JobManager, rather than > > > calling the JobManager, creating a Runnable that looks up the execution > > > vertex, and so on... > > > > > > - The actor model's approach to queue actions on an actor and run the > > one > > > after another makes the concurrency model of the state machine very > > simple > > > and robust > > > > > > - We "outsource" our own concerns about maintaining and improving > that > > > part of the system > > > > > > Greetings, > > > Stephan > > > > > > |
+1
On Fri, Sep 5, 2014 at 2:53 PM, Ufuk Celebi <[hidden email]> wrote: > +1 > > > On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <[hidden email]> > wrote: > > > +1 for refactoring using Akka, the arguments are overwhelming. > > > > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[hidden email]> > > wrote: > > > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to have > a > > > big community [1] and users [2] that depend on the system. > > > > > > The YARN client is also using the old RPC service. I would like to > > rewrite > > > it with Akka once we have added it into the other parts of the system, > to > > > learn it. > > > > > > > > > [1] https://github.com/akka/akka/pulls > > > [2] > > > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> wrote: > > > > > > > This proposes to refactor the RPC service and the coordination > between > > > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > > > > > Even though Akka is written in Scala, it offers a Java interface and > we > > > can > > > > use Akka completely from Java. > > > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > > > > Problems with the current RPC service: > > > > -------------------------------------------------------- > > > > > > > > - No asynchronous calls with callbacks. This is the reason why > > several > > > > parts of the runtime poll the status, introducing unnecessary > latency. > > > > > > > > - No exception forwarding (many exceptions are simply swallowed), > > > making > > > > debugging and operation in flaky environments very hard > > > > > > > > - Limited number of handler threads. The RPC can only handle a fix > > > number > > > > of concurrent requests, forcing you to maintain separate thread pools > > to > > > > delegate actions to > > > > > > > > - No support for primitive data types (or boxed primitives) as > > > arguments, > > > > everything has to be a specially serializable type > > > > > > > > - Problematic threading model. The RPC continuously spawns and > > > terminates > > > > threads > > > > > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > > > - Akka solves all of the above issues out of the box > > > > > > > > - The supervisor model allows you to do failure detection of > actors. > > > That > > > > provides a unified way of detecting and handling failures (missing > > > > heartbeats, failed calls, ...) > > > > > > > > - Akka has tools to make stateful actors persistent and restart > them > > on > > > > other machines in cases of failure. That would greatly help in > > > implementing > > > > "master fail-over", which will become important > > > > > > > > - You can define many "call targets" (actors). Tasks (on > > taskmanagers) > > > > can directly call their ExecutionVertex on the JobManager, rather > than > > > > calling the JobManager, creating a Runnable that looks up the > execution > > > > vertex, and so on... > > > > > > > > - The actor model's approach to queue actions on an actor and run > the > > > one > > > > after another makes the concurrency model of the state machine very > > > simple > > > > and robust > > > > > > > > - We "outsource" our own concerns about maintaining and improving > > that > > > > part of the system > > > > > > > > Greetings, > > > > Stephan > > > > > > > > > > |
+1
On Fri, Sep 5, 2014 at 3:04 PM, Stephan Ewen <[hidden email]> wrote: > +1 > > > On Fri, Sep 5, 2014 at 2:53 PM, Ufuk Celebi <[hidden email]> wrote: > > > +1 > > > > > > On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <[hidden email]> > > wrote: > > > > > +1 for refactoring using Akka, the arguments are overwhelming. > > > > > > > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[hidden email]> > > > wrote: > > > > > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to > have > > a > > > > big community [1] and users [2] that depend on the system. > > > > > > > > The YARN client is also using the old RPC service. I would like to > > > rewrite > > > > it with Akka once we have added it into the other parts of the > system, > > to > > > > learn it. > > > > > > > > > > > > [1] https://github.com/akka/akka/pulls > > > > [2] > > > > > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> > wrote: > > > > > > > > > This proposes to refactor the RPC service and the coordination > > between > > > > > Client, JobManager, and TaskManager to use the Akka actor library. > > > > > > > > > > Even though Akka is written in Scala, it offers a Java interface > and > > we > > > > can > > > > > use Akka completely from Java. > > > > > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > > > > > > > Problems with the current RPC service: > > > > > -------------------------------------------------------- > > > > > > > > > > - No asynchronous calls with callbacks. This is the reason why > > > several > > > > > parts of the runtime poll the status, introducing unnecessary > > latency. > > > > > > > > > > - No exception forwarding (many exceptions are simply swallowed), > > > > making > > > > > debugging and operation in flaky environments very hard > > > > > > > > > > - Limited number of handler threads. The RPC can only handle a > fix > > > > number > > > > > of concurrent requests, forcing you to maintain separate thread > pools > > > to > > > > > delegate actions to > > > > > > > > > > - No support for primitive data types (or boxed primitives) as > > > > arguments, > > > > > everything has to be a specially serializable type > > > > > > > > > > - Problematic threading model. The RPC continuously spawns and > > > > terminates > > > > > threads > > > > > > > > > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > > > > > - Akka solves all of the above issues out of the box > > > > > > > > > > - The supervisor model allows you to do failure detection of > > actors. > > > > That > > > > > provides a unified way of detecting and handling failures (missing > > > > > heartbeats, failed calls, ...) > > > > > > > > > > - Akka has tools to make stateful actors persistent and restart > > them > > > on > > > > > other machines in cases of failure. That would greatly help in > > > > implementing > > > > > "master fail-over", which will become important > > > > > > > > > > - You can define many "call targets" (actors). Tasks (on > > > taskmanagers) > > > > > can directly call their ExecutionVertex on the JobManager, rather > > than > > > > > calling the JobManager, creating a Runnable that looks up the > > execution > > > > > vertex, and so on... > > > > > > > > > > - The actor model's approach to queue actions on an actor and run > > the > > > > one > > > > > after another makes the concurrency model of the state machine very > > > > simple > > > > > and robust > > > > > > > > > > - We "outsource" our own concerns about maintaining and improving > > > that > > > > > part of the system > > > > > > > > > > Greetings, > > > > > Stephan > > > > > > > > > > > > > > > |
+1
2014-09-05 6:46 GMT-07:00 Till Rohrmann <[hidden email]>: > +1 > > > On Fri, Sep 5, 2014 at 3:04 PM, Stephan Ewen <[hidden email]> wrote: > > > +1 > > > > > > On Fri, Sep 5, 2014 at 2:53 PM, Ufuk Celebi <[hidden email]> wrote: > > > > > +1 > > > > > > > > > On Fri, Sep 5, 2014 at 2:25 PM, Kostas Tzoumas <[hidden email]> > > > wrote: > > > > > > > +1 for refactoring using Akka, the arguments are overwhelming. > > > > > > > > > > > > On Fri, Sep 5, 2014 at 2:04 PM, Robert Metzger <[hidden email]> > > > > wrote: > > > > > > > > > I agree with using Akka for RPC. It is ASF 2.0 licensed, seems to > > have > > > a > > > > > big community [1] and users [2] that depend on the system. > > > > > > > > > > The YARN client is also using the old RPC service. I would like to > > > > rewrite > > > > > it with Akka once we have added it into the other parts of the > > system, > > > to > > > > > learn it. > > > > > > > > > > > > > > > [1] https://github.com/akka/akka/pulls > > > > > [2] > > > > > > > > > http://doc.akka.io/docs/akka/2.0.4/additional/companies-using-akka.html > > > > > > > > > > > > > > > > > > > > On Fri, Sep 5, 2014 at 1:34 PM, Stephan Ewen <[hidden email]> > > wrote: > > > > > > > > > > > This proposes to refactor the RPC service and the coordination > > > between > > > > > > Client, JobManager, and TaskManager to use the Akka actor > library. > > > > > > > > > > > > Even though Akka is written in Scala, it offers a Java interface > > and > > > we > > > > > can > > > > > > use Akka completely from Java. > > > > > > > > > > > > Below are a list of arguments why this would help the system: > > > > > > > > > > > > > > > > > > Problems with the current RPC service: > > > > > > -------------------------------------------------------- > > > > > > > > > > > > - No asynchronous calls with callbacks. This is the reason why > > > > several > > > > > > parts of the runtime poll the status, introducing unnecessary > > > latency. > > > > > > > > > > > > - No exception forwarding (many exceptions are simply > swallowed), > > > > > making > > > > > > debugging and operation in flaky environments very hard > > > > > > > > > > > > - Limited number of handler threads. The RPC can only handle a > > fix > > > > > number > > > > > > of concurrent requests, forcing you to maintain separate thread > > pools > > > > to > > > > > > delegate actions to > > > > > > > > > > > > - No support for primitive data types (or boxed primitives) as > > > > > arguments, > > > > > > everything has to be a specially serializable type > > > > > > > > > > > > - Problematic threading model. The RPC continuously spawns and > > > > > terminates > > > > > > threads > > > > > > > > > > > > > > > > > > > > > > > > Benefits of switching to the Akka actor model: > > > > > > > > > > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------------- > > > > > > > > > > > > - Akka solves all of the above issues out of the box > > > > > > > > > > > > - The supervisor model allows you to do failure detection of > > > actors. > > > > > That > > > > > > provides a unified way of detecting and handling failures > (missing > > > > > > heartbeats, failed calls, ...) > > > > > > > > > > > > - Akka has tools to make stateful actors persistent and restart > > > them > > > > on > > > > > > other machines in cases of failure. That would greatly help in > > > > > implementing > > > > > > "master fail-over", which will become important > > > > > > > > > > > > - You can define many "call targets" (actors). Tasks (on > > > > taskmanagers) > > > > > > can directly call their ExecutionVertex on the JobManager, rather > > > than > > > > > > calling the JobManager, creating a Runnable that looks up the > > > execution > > > > > > vertex, and so on... > > > > > > > > > > > > - The actor model's approach to queue actions on an actor and > run > > > the > > > > > one > > > > > > after another makes the concurrency model of the state machine > very > > > > > simple > > > > > > and robust > > > > > > > > > > > > - We "outsource" our own concerns about maintaining and > improving > > > > that > > > > > > part of the system > > > > > > > > > > > > Greetings, > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > > |
Free forum by Nabble | Edit this page |