Replacing JobManager with Scala implementation

Till Rohrmann
Hi guys,

I am currently working on replacing the old RPC infrastructure with an
Akka-based actor system. As part of this change I will reimplement the
JobManager and TaskManager, which will then be actors. Akka offers a Java
API, but the implementation turns out to be very verbose and laborious
because Java 6 and 7 support neither lambdas nor pattern matching. Using
Scala instead would allow a far more succinct and clear implementation of
the JobManager and TaskManager. Instead of a lot of if statements using
instanceof to figure out the message type, we could simply use pattern
matching. Furthermore, the callback functions could simply be Scala's
anonymous functions. Therefore I propose to use Scala for these two
systems.
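
To make the verbosity argument concrete, here is a minimal sketch of the
instanceof-based dispatch and the anonymous-inner-class callback that
Java 6/7 force on us. It is plain Java without any Akka dependency, and the
message types (SubmitJob, CancelJob, Heartbeat) and the Callback interface
are made up for illustration; they are not the actual JobManager protocol.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these message types are hypothetical and are
// not the actual JobManager protocol.
final class DispatchSketch {

    static final class SubmitJob {
        final String jobName;
        SubmitJob(String jobName) { this.jobName = jobName; }
    }

    static final class CancelJob {
        final String jobName;
        CancelJob(String jobName) { this.jobName = jobName; }
    }

    static final class Heartbeat {
        final int taskManagerId;
        Heartbeat(int taskManagerId) { this.taskManagerId = taskManagerId; }
    }

    // Without lambdas, every callback needs its own interface and an
    // anonymous inner class at the call site.
    interface Callback {
        void onResult(String result);
    }

    // Java 6/7 style dispatch: test the runtime type, then cast.
    static void handle(Object message, Callback callback) {
        if (message instanceof SubmitJob) {
            SubmitJob submit = (SubmitJob) message;
            callback.onResult("submitted " + submit.jobName);
        } else if (message instanceof CancelJob) {
            CancelJob cancel = (CancelJob) message;
            callback.onResult("cancelled " + cancel.jobName);
        } else if (message instanceof Heartbeat) {
            Heartbeat heartbeat = (Heartbeat) message;
            callback.onResult("heartbeat from " + heartbeat.taskManagerId);
        } else {
            callback.onResult("unhandled " + message);
        }
    }

    // Feeds each message through handle() and collects the callback results.
    static List<String> run(List<Object> messages) {
        final List<String> results = new ArrayList<String>();
        for (Object message : messages) {
            handle(message, new Callback() {
                @Override
                public void onResult(String result) {
                    results.add(result);
                }
            });
        }
        return results;
    }

    public static void main(String[] args) {
        List<Object> messages = Arrays.<Object>asList(
                new SubmitJob("wordcount"), new Heartbeat(3), "ping");
        for (String line : run(messages)) {
            System.out.println(line);
        }
    }
}
```

In Scala, the body of handle would collapse into a single match expression
with one case per message type, and the anonymous Callback class into a
function literal.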

Akka uses the slf4j library as its logging interface. Therefore I
would also propose to replace the JCL logging system with slf4j.
Since we want to use Akka in many parts of the runtime system and
it recommends logback as the logging backend, I would also like to
replace log4j with logback. This should require only a few changes
once we have established the slf4j logging interface everywhere.

What do you guys think of that idea?

Best regards,

Till

Re: Replacing JobManager with Scala implementation

Robert Metzger
Hi Till,

I guess changing the logging framework is not going to be a huge problem.
If you think that it will be better to switch the logging framework, I
trust you.

Changing the programming language of a very important system component is
something we should carefully discuss.
I understand that Akka is written in Scala and that it will be much more
natural to implement the actor-based system using Scala.
I see the following issues that we should consider:
Until now, Flink has clearly been a project implemented only in Java. The
Scala API basically sits on top of the Java-based runtime. We do not really
depend on Scala (we could easily remove the Scala API if we wanted to).
Having code written in Scala in the main system will add a hard dependency
on a Scala version.
Being a pure Java project has some advantages: I think it's a fact that
there are more Java programmers than Scala programmers, so our chances of
attracting new contributors are higher as a Java project.
On the other hand, we could maybe attract Scala developers to our project.
But that has not happened so far for our Scala API (for contributors, not
users!), so I don't see any reason to expect it to happen.

Another issue is tooling. There are a lot of problems with Scala and
Eclipse: I've recently switched to Eclipse Luna, and it seems to be
impossible to compile Scala code with Luna because ScalaIDE does not
properly cope with it.
Even with Eclipse versions that are supported by ScalaIDE, you have to
manually install three plugins, some of which are not available in the
Eclipse Marketplace. So with a JobManager written in Scala, users cannot
just import our project as a Maven project into Eclipse and start developing.
The support for Maven is probably also limited. For example, I don't know
if there is a checkstyle plugin for Scala.

I'm looking forward to hearing other opinions on this issue. As I said in
the beginning, we should exchange arguments on this and think about it for
some time before we decide on this.

Best,
Robert




Re: Replacing JobManager with Scala implementation

Kostas Tzoumas-2
On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <[hidden email]>
wrote:

>
> Changing the programming language of a very important system component is
> something we should carefully discuss.
>

Definitely agree, I think the community should discuss this very carefully.


> I understand that Akka is written in Scala and that it will be much more
> natural to implement the actor based system using Scala.
> I see the following issues that we should consider:
> Until now, Flink is clearly a project implemented only in Java. The Scala
> API basically sits on top of the Java-based runtime. We do not really
> depend on Scala (we could easily remove the Scala API if we want to).
> Having code written in Scala in the main system will add a hard dependency
> on a scala version.
> Being a pure Java project has some advantages: I think its a fact that
> there are more Java programmers than Scala programmers. So our chances of
> attracting new contributors are higher when being a Java project.
> On the other hand, we could maybe attract Scala developers to our project.
> But that has not happened (for contributors, not users!) so far for our
> Scala API, so I don't see any reason for that to happen.
>
>
This is definitely an issue to consider. We need to carefully weigh how
important this issue is. If we want to break things, incubation is the
right time to do it. Below are some arguments in favor of breaking things,
but do keep in mind that I am undecided, and I would really like to see the
community weighing in.

First, I would dare say that the primary reason for someone to contribute
to Flink so far has not been that the code is written in Java, but more the
content and nature of the project. Most contributors are Big Data
enthusiasts in some way or another.

Second, Scala projects have attracted contributors in the past.

Third, it should not be too hard for someone that does not know Scala to
contribute to a different component if the interfaces are clear.



Re: Replacing JobManager with Scala implementation

till.rohrmann
I also agree with Robert and Kostas that it has to be a community decision.
I understand the problems with Eclipse and the Scala IDE, which are a pain
in the ass. But eventually these things will be fixed. Maybe we could also
talk to the Typesafe guy and tell him that this problem bothers us a lot.

I also believe that the project is not about a specific programming
language but about a problem we want to tackle with Flink. From time to
time it might be necessary to adapt the tools in order to reach the goal.
In fact, I don't believe that Scala parts would drive people away from the
project. Instead, Scala enthusiasts would be motivated to join us.

Actually, I stumbled across a quote from Leibniz which puts my point of
view quite accurately in a nutshell:

In symbols one observes an advantage in discovery which is greatest when
they express the exact nature of a thing briefly and, as it were, picture
it; then indeed the labor of thought is wonderfully diminished -- Gottfried
Wilhelm Leibniz



Re: Replacing JobManager with Scala implementation

Asterios Katsifodimos
In reply to this post by Kostas Tzoumas-2
I agree that using Akka's actors from Java results in very ugly code.
Hiding the internals of Akka behind Java reflection looks better but breaks
the principles of actors. For me it is kind of a deal breaker for using
Akka from Java. I think that Till has more reasons to believe that Scala
would be more appropriate for building a new Job/Task Manager.

I think that this discussion should focus on 4 main aspects:
1. Performance
2. Implementability
3. Maintainability
4. Available Tools

1. Performance: The job of the JobManager and the TaskManager is to
1) exchange messages in order to maintain a distributed state machine,
2) set up connections between task managers, 3) detect failures, etc. In
these basic operations, performance should not be an issue. Akka has been
shown to scale quite well with very low latency. I guess that the low-level
"plumbing" (serialization, connections, etc.) will continue in Java in
order to guarantee high performance. I have no clue what's happening
with memory management, whether this will be implemented in Java or
Scala, and what the respective consequences are. Please comment.

2. Implementability: Since the Job/Task Manager is going to be essentially
implemented from scratch, given the power of Akka, it seems to me that the
implementation will be easier, shorter and less verbose in Scala, given
that Till is comfortable enough with Scala.

3. Maintainability: Given #2, maintaining the code and trying out new ideas
in Scala would take less time and effort. But maintaining low-level
plumbing in Java and high-level logic in Scala scares me. Could anyone who
has done this before comment on this?

4. Tools: Robert has raised some issues already but I think that tools will
get better with time.

Given the above, I would focus on #3 to be honest. Apart from this, going
the Scala way sounds like a great idea. I really second Kostas' opinion
that if large changes are going to happen, this is the best moment.

Cheers,
Asterios




Re: Replacing JobManager with Scala implementation

Daniel Warneke
Hi,

Will Akka just be used for RPC, or are there any plans to expand the
actor-based model to further parts of the runtime system? If so, could
you please point me to the discussion thread?

Spontaneously, I would say that adding a hard dependency on Scala just
for the sake of having a hip RPC service sounds like a pretty dodgy
deal. Therefore, I would like to understand how much value Akka could
bring to Flink in the long run. The discussion whether to reimplement core
components of the system in Scala should be the second step in my opinion.

Bests,

     Daniel



Re: Replacing JobManager with Scala implementation

till.rohrmann
Hi Daniel,

The RPC rework is discussed in
https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently down
for maintenance.

The ideas to use akka are the following. Akka allows us to reduce the code
base which has to be maintained. Especially, we get rid of all the
multi-threading programming of the rpc service which is always hard to work
with. With Akka we would get the heartbeat signal for free, because Akka
can detect dead actors. Akka uses supervision to handle fault tolerance as
well as recovery and it allows an easy forwarding of remote exceptions. At
the same time it offers a nice rpc abstraction which easily allows to
implement asynchronous services. Furthermore, it scales rather well to
large numbers of nodes and hopefully we get the latencies of Flink a little
bit down.
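
To make the heartbeat point concrete, here is a minimal sketch of the kind of bookkeeping a hand-written RPC service has to do to notice dead TaskManagers, which is the part Akka's failure detection would take over. It is not Flink's actual RPC code and it does not use any Akka API. The class name, the method names and the string IDs are all hypothetical, and timestamps are passed in explicitly so the logic stays deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hand-rolled heartbeat tracker (illustration only). */
final class HeartbeatMonitorSketch {

    private final long timeoutMillis;

    // Last heartbeat timestamp per TaskManager ID (IDs are made up here).
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat; synchronized because RPC threads call it concurrently. */
    synchronized void heartbeat(String taskManagerId, long nowMillis) {
        lastSeen.put(taskManagerId, nowMillis);
    }

    /** Returns the sorted IDs whose last heartbeat is older than the timeout. */
    synchronized List<String> findDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }
}
```

With a timeout of 1000 ms, a TaskManager that last reported at time 0 is listed as dead when `findDead(1500)` is called, while one that reported at time 900 is not. A real service would also need a timer thread, removal of dead entries and a reaction to each failure, which is the code the proposal hopes to stop maintaining.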

Bests,

Till


On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <[hidden email]> wrote:

> Hi,
>
> will akka just be used for RPC or are there any plans to expand the
> actor-based model to further parts of the runtime system? If so, could you
> please point me to the discussion thread?
>
> Spontaneously, I would say that adding a hard dependency on Scala just for
> the sake of having a hip RPC service sounds like a pretty dodgy deal.
> Therefore, I would like understand how much value akka could bring to Flink
> in the long run. The discussion whether to reimplement core components of
> the system in Scala should be the second step in my opinion.
>
> Bests,
>
>     Daniel

Re: Replacing JobManager with Scala implementation

Stephan Ewen
Hi!

The Java vs. Scala discussion is orthogonal to the actors discussion. We can
use Akka actors in Java. And I think that makes a lot of sense, for the
reasons that Till mentioned, plus the following:

 - Akka has put a lot of effort into combining high message throughput
(multiple actor calls in one message) with low message latency. I don't
think we could do much better with something else.

 - I am currently working on the ExecutionGraph and Scheduler to unify lazy
computation / recovery / dynamic resource assignment.
   The actor paradigm (order of calls, queuing invocations in the actor
mailboxes) makes it much simpler to get concurrent situations right (such
as certain calls overtaking each other, for example deploy/cancel).

 - Actors work with thread pools by themselves, so we can get rid of all
the inner runnables sent to executor services. That makes the code much
more readable.
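
The mailbox argument in the second point can be sketched in plain Java without Akka. This is only an illustration under stated assumptions: the class `TaskActorSketch`, its `Message` enum and the state names are invented for this example and are not Flink or Akka types. A single-threaded executor stands in for the actor mailbox, so messages are handled one at a time in arrival order and the state needs no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-task "actor": every message goes through one mailbox. */
final class TaskActorSketch {

    /** Message types invented for this sketch. */
    enum Message { DEPLOY, CANCEL }

    // The single-threaded executor plays the role of the mailbox: messages
    // are queued in submission order and processed one at a time.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();

    // Only the mailbox thread touches these fields, so no locks are needed.
    private String state = "CREATED";
    private final List<String> history = new ArrayList<>();

    /** Asynchronous send: enqueue the message and return immediately. */
    void tell(Message msg) {
        mailbox.execute(() -> handle(msg));
    }

    private void handle(Message msg) {
        switch (msg) {
            case DEPLOY:
                // A deploy that arrives after a cancel must not revive the task.
                if (state.equals("CREATED")) {
                    state = "RUNNING";
                }
                break;
            case CANCEL:
                state = "CANCELED";
                break;
        }
        history.add(msg + "->" + state);
    }

    /** Drains the mailbox, stops it, and returns the observed transitions. */
    List<String> shutdownAndGetHistory() {
        mailbox.shutdown();
        try {
            mailbox.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining mailbox", e);
        }
        return history;
    }
}
```

If a caller sends `CANCEL` and then `DEPLOY`, the history reads `CANCEL->CANCELED`, `DEPLOY->CANCELED`: the late deploy is handled after the cancel and leaves the task canceled. With explicit runnables and shared state, the same guarantee would need careful locking around every transition.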

Stephan




Re: Replacing JobManager with Scala implementation

Stephan Ewen
Here is one more: Akka has facilities that help create hot-standby
actors, which helps eliminate the JobManager as a single point of
failure.


>> >
>>
>
>

Re: Replacing JobManager with Scala implementation

till.rohrmann
If we decide on Akka, we can still choose which language to use. But with
either binding (Java or Scala), we would add a Scala dependency to the
project, because Akka is implemented in Scala.


On Sun, Aug 31, 2014 at 5:44 PM, Stephan Ewen <[hidden email]> wrote:

> Here is one more: Akka has facilities that help with creating hot-standby
> actors, which would help eliminate the JobManager as a single point of
> failure.
>
>
> On Sun, Aug 31, 2014 at 5:43 PM, Stephan Ewen <[hidden email]> wrote:
>
> > Hi!
> >
> > The Java vs Scala discussion is orthogonal to the actors discussion. We
> > can use Akka actors in Java. And I think that makes a lot of sense, for the
> > reasons that Till mentioned, plus the following reasons:
> >
> >  - Akka has made a lot of effort to combine message throughput (multiple
> >    actor calls in one message) with low message latency. I don't think we
> >    could do much better with something else.
> >
> >  - I am currently working on the ExecutionGraph and Scheduler to unify
> >    lazy computation / recovery / dynamic resource assignment.
> >    The actor paradigm (order of calls, queuing invocations in the actor
> >    mailboxes) makes it much simpler to get concurrent situations right (such
> >    as certain calls overtaking each other, for example deploy/cancel).
> >
> >  - Actors work with thread pools by themselves, so we can get rid of all
> >    the inner runnables sent to executor services. That makes the code much
> >    more readable.
> >
> > Stephan
> >
> >
> >
> > On Sun, Aug 31, 2014 at 1:31 PM, Till Rohrmann <[hidden email]>
> > wrote:
> >
> >> Hi Daniel,
> >>
> >> the RPC rework is discussed in
> >> https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently down
> >> for maintenance.
> >>
> >> The reasons for using Akka are the following. Akka allows us to reduce the
> >> code base that has to be maintained. In particular, we get rid of all the
> >> multi-threaded programming in the RPC service, which is always hard to work
> >> with. With Akka we would get the heartbeat signal for free, because Akka
> >> can detect dead actors. Akka uses supervision to handle fault tolerance as
> >> well as recovery, and it allows easy forwarding of remote exceptions. At
> >> the same time it offers a nice RPC abstraction that makes it easy to
> >> implement asynchronous services. Furthermore, it scales rather well to
> >> large numbers of nodes, and hopefully we can bring Flink's latencies down a
> >> little bit.
> >>
> >> Bests,
> >>
> >> Till
> >>
> >>
> >> On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <[hidden email]>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > will Akka just be used for RPC or are there any plans to expand the
> >> > actor-based model to further parts of the runtime system? If so, could you
> >> > please point me to the discussion thread?
> >> >
> >> > Spontaneously, I would say that adding a hard dependency on Scala just for
> >> > the sake of having a hip RPC service sounds like a pretty dodgy deal.
> >> > Therefore, I would like to understand how much value Akka could bring to
> >> > Flink in the long run. The discussion whether to reimplement core components
> >> > of the system in Scala should be the second step in my opinion.
> >> >
> >> > Bests,
> >> >
> >> >     Daniel
> >> >
> >> >
> >> > On 29.08.2014 11:33, Asterios Katsifodimos wrote:
> >> >
> >> >> I agree that using Akka's actors from Java results in very ugly code.
> >> >> Hiding the internals of Akka behind Java reflection looks better but
> >> >> breaks the principles of actors. For me it is kind of a deal breaker for
> >> >> using Akka from Java. I think that Till has more reasons to believe that
> >> >> Scala would be more appropriate for building a new Job/Task Manager.
> >> >>
> >> >> I think that this discussion should focus on 4 main aspects:
> >> >> 1. Performance
> >> >> 2. Implementability
> >> >> 3. Maintainability
> >> >> 4. Available Tools
> >> >>
> >> >> 1. Performance: The job of the JobManager and the TaskManager is to
> >> >> 1) exchange messages in order to maintain a distributed state machine,
> >> >> 2) set up connections between task managers, 3) detect failures, etc. In
> >> >> these basic operations, performance should not be an issue. Akka has proven
> >> >> to scale quite well with very low latency. I guess that the low-level
> >> >> "plumbing" (serialization, connections, etc.) will continue in Java in
> >> >> order to guarantee high performance. I have no clue what's happening
> >> >> with memory management, whether this will be implemented in Java or
> >> >> Scala, and what the respective consequences are. Please comment.
> >> >>
> >> >> 2. Implementability: Since the Job/Task Manager is going to be essentially
> >> >> implemented from scratch, given the power of Akka, it seems to me that the
> >> >> implementation will be easier, shorter and less verbose in Scala, given
> >> >> that Till is comfortable enough with Scala.
> >> >>
> >> >> 3. Maintainability: Given #2, maintaining the code and trying out new ideas
> >> >> in Scala would take less time and effort. But maintaining low-level plumbing
> >> >> in Java and high-level logic in Scala scares me. Could anyone who has done
> >> >> this before comment on this?
> >> >>
> >> >> 4. Tools: Robert has raised some issues already but I think that tools
> >> >> will get better with time.
> >> >>
> >> >> Given the above, I would focus on #3 to be honest. Apart from this, going
> >> >> the Scala way sounds like a great idea. I really second Kostas' opinion
> >> >> that if large changes are going to happen, this is the best moment.
> >> >>
> >> >> Cheers,
> >> >> Asterios
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <
> >> [hidden email]>
> >> >> wrote:
> >> >>
> >> >>> I also agree with Robert and Kostas that it has to be a community
> >> >>> decision. I understand the problems with Eclipse and the Scala IDE,
> >> >>> which is a pain in the ass. But eventually these things will be fixed.
> >> >>> Maybe we could also talk to the Typesafe guy and tell him that this
> >> >>> problem bothers us a lot.
> >> >>>
> >> >>> I also believe that the project is not about a specific programming
> >> >>> language but about a problem we want to tackle with Flink. From time to
> >> >>> time it might be necessary to adapt the tools in order to reach the goal.
> >> >>> In fact, I don't believe that Scala parts would drive people away from the
> >> >>> project. Instead, Scala enthusiasts would be motivated to join us.
> >> >>>
> >> >>> Actually I stumbled across a quote by Leibniz which puts my point of
> >> >>> view quite accurately in a nutshell:
> >> >>>
> >> >>> In symbols one observes an advantage in discovery which is greatest when
> >> >>> they express the exact nature of a thing briefly and, as it were, picture
> >> >>> it; then indeed the labor of thought is wonderfully diminished --
> >> >>> Gottfried Wilhelm Leibniz
> >> >>>
> >> >>>
> >> >>> On Thu, Aug 28, 2014 at 12:57 PM, Kostas Tzoumas <[hidden email]> wrote:
> >> >>>
> >> >>>> On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <[hidden email]> wrote:
> >> >>>>
> >> >>>>> Changing the programming language of a very important system component
> >> >>>>> is something we should carefully discuss.
> >> >>>>
> >> >>>> Definitely agree, I think the community should discuss this very
> >> >>>> carefully.
> >> >>>>
> >> >>>>> I understand that Akka is written in Scala and that it will be much more
> >> >>>>> natural to implement the actor based system using Scala.
> >> >>>>> I see the following issues that we should consider:
> >> >>>>> Until now, Flink is clearly a project implemented only in Java. The Scala
> >> >>>>> API basically sits on top of the Java-based runtime. We do not really
> >> >>>>> depend on Scala (we could easily remove the Scala API if we want to).
> >> >>>>> Having code written in Scala in the main system will add a hard dependency
> >> >>>>> on a scala version.
> >> >>>>> Being a pure Java project has some advantages: I think its a fact that
> >> >>>>> there are more Java programmers than Scala programmers. So our chances of
> >> >>>>> attracting new contributors are higher when being a Java project.
> >> >>>>> On the other hand, we could maybe attract Scala developers to our project.
> >> >>>>> But that has not happened (for contributors, not users!) so far for our
> >> >>>>> Scala API, so I don't see any reason for that to happen.
> >> >>>>
> >> >>>> This is definitely an issue to consider. We need to carefully weight how
> >> >>>> important this issue is. If we want to break things, incubation is the
> >> >>>> right time to do it. Below are some arguments in favor of breaking things,
> >> >>>> but do keep in mind that I am undecided, and I would really like to see the
> >> >>>> community weighing in.
> >> >>>>
> >> >>>> First, I would dare say that the primary reason for someone to contribute
> >> >>>> to Flink so far has not been that the code is written in Java, but more the
> >> >>>> content and nature of the project. Most contributors are Big Data
> >> >>>> enthusiasts in some way or another.
> >> >>>>
> >> >>>> Second, Scala projects have attracted contributors in the past.
> >> >>>>
> >> >>>> Third, it should not be too hard for someone that does not know Scala to
> >> >>>> contribute to a different component if the interfaces are clear.
> >> >>>>
> >> >>>>> Another issue is tooling: There are a lot of problems with Scala and
> >> >>>>> Eclipse: I've recently switched to Eclipse Luna. It seems to be impossible
> >> >>>>> to compile Scala code with Luna because ScalaIDE does not properly cope
> >> >>>>> with it.
> >> >>>>> Even with Eclipse versions that are supported by ScalaIDE, you have to
> >> >>>>> manually install 3 plugins, some of them are not available in the Eclipse
> >> >>>>> Marketplace. So with a JobManager written in Scala, users can not just
> >> >>>>> import our project as a Maven project into Eclipse and start developing.
> >> >>>>> The support for Maven is probably also limited. For example, I don't know
> >> >>>>> if there is a checkstyle plugin for Scala.
> >> >>>>>
> >> >>>>> I'm looking forward to hearing other opinions on this issue. As I said in
> >> >>>>> the beginning, we should exchange arguments on this and think about it for
> >> >>>>> some time before we decide on this.
> >> >>>>>
> >> >>>>> Best,
> >> >>>>> Robert
> >> >>>>>
> >> >>>>> On Thu, Aug 28, 2014 at 1:08 AM, Till Rohrmann <[hidden email]> wrote:
> >> >>>>>
> >> >>>>>> Hi guys,
> >> >>>>>>
> >> >>>>>> I currently working on replacing the old rpc infrastructure with an akka
> >> >>>>>> based actor system. In the wake of this change I will reimplement the
> >> >>>>>> JobManager and TaskManager which will then be actors. Akka offers a Java
> >> >>>>>> API but the implementation turns out to be very verbose and laborious,
> >> >>>>>> because Java 6 and 7 do not support lambdas and pattern matching. Using
> >> >>>>>> Scala instead, would allow a far more succinct and clear implementation of
> >> >>>>>> the JobManager and TaskManager. Instead of a lot of if statements using
> >> >>>>>> instanceof to figure out the message type, we could simply use pattern
> >> >>>>>> matching. Furthermore, the callback functions could simply be Scala's
> >> >>>>>> anonymous functions. Therefore I would propose to use Scala for these two
> >> >>>>>> systems.
> >> >>>>>>
> >> >>>>>> The Akka system uses the slf4j library as logging interface. Therefore I
> >> >>>>>> would also propose to replace the jcl logging system with the slf4j logging
> >> >>>>>> system. Since we want to use Akka in many parts of the runtime system and
> >> >>>>>> it recommends using logback as logging backend, I would also like to
> >> >>>>>> replace log4j with logback. But this change should inflict only few changes
> >> >>>>>> once we established the slf4j logging interface everywhere.
> >> >>>>>>
> >> >>>>>> What do you guys think of that idea?
> >> >>>>>>
> >> >>>>>> Best regards,
> >> >>>>>>
> >> >>>>>> Till
> >> >>>>>>
> >> >>>>>>
> >> >
> >>
> >
> >
>

Re: Replacing JobManager with Scala implementation

Kostas Tzoumas-2
It seems that the discussion has wound down a bit.

My opinion is that we should not be religious about languages. If Scala is a
better match for this implementation (and it seems to be), then we should
use it and make sure that the JobManager and TaskManager expose clean APIs
that can be used by a Java programmer without Scala knowledge.
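
A minimal sketch of what such a Java-facing API could look like, assuming a hypothetical `JobManagerGateway` interface. All names below (`JobManagerGateway`, `JobStatus`, `InMemoryJobManager`) are illustrative and not taken from the Flink code base; the in-memory class only stands in for whatever Scala/Akka implementation would sit behind the interface:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java-facing contract. The names are illustrative only and are
// not taken from the Flink code base.
interface JobManagerGateway {
    String submitJob(String jobName);
    JobStatus jobStatus(String jobId);
}

enum JobStatus { RUNNING, UNKNOWN }

// Stand-in implementation so the sketch runs on its own. In the proposal this
// role would be played by an Akka actor written in Scala.
final class InMemoryJobManager implements JobManagerGateway {
    private final Map<String, JobStatus> jobs = new HashMap<String, JobStatus>();
    private int nextId = 0;

    @Override
    public String submitJob(String jobName) {
        // Derive a simple unique id from the job name and a counter.
        String jobId = jobName + "-" + nextId++;
        jobs.put(jobId, JobStatus.RUNNING);
        return jobId;
    }

    @Override
    public JobStatus jobStatus(String jobId) {
        JobStatus status = jobs.get(jobId);
        return status == null ? JobStatus.UNKNOWN : status;
    }
}

// A Java caller programs against the interface only.
class GatewayExample {
    public static void main(String[] args) {
        JobManagerGateway jobManager = new InMemoryJobManager();
        String jobId = jobManager.submitJob("wordcount");
        System.out.println(jobId + " is " + jobManager.jobStatus(jobId));
    }
}
```

The point of the sketch is only that the caller never sees the implementation language: as long as the interface uses plain Java types, the class behind it could be swapped for a Scala one without touching the calling code.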

Kostas


On Sun, Aug 31, 2014 at 6:12 PM, Till Rohrmann <[hidden email]>
wrote:

> If we decide on Akka, we can still choose which language to use. But with
> either binding (Java or Scala), we would add a Scala dependency to the
> project, because Akka is implemented in Scala.
>
>

Re: Replacing JobManager with Scala implementation

till.rohrmann
I agree with Kostas. Maybe we should start a vote on this feature to get
things started.


On Tue, Sep 2, 2014 at 11:53 AM, Kostas Tzoumas <[hidden email]> wrote:

> It seems that the discussion has wound down a bit.
>
> My opinion is that we should not be religious about languages. If Scala is a
> better match for this implementation (and it seems to be), then we should
> use it and make sure that the JobManager and TaskManager expose clean APIs
> that can be used by a Java programmer without Scala knowledge.
>
> Kostas
>
>
> On Sun, Aug 31, 2014 at 6:12 PM, Till Rohrmann <[hidden email]> wrote:
>
> > If we decide on Akka, we can still choose which language to use. But with
> > either binding (Java or Scala), we would add a Scala dependency to the
> > project, because Akka is implemented in Scala.
> >
> >
> > On Sun, Aug 31, 2014 at 5:44 PM, Stephan Ewen <[hidden email]> wrote:
> >
> > > Here is one more: Akka has facilities that help with creating hot-standby
> > > actors, which would help eliminate the JobManager as a single point of
> > > failure.
> > >
> > >
> > > On Sun, Aug 31, 2014 at 5:43 PM, Stephan Ewen <[hidden email]> wrote:
> > >
> > > > Hi!
> > > >
> > > > The Java vs Scala discussion is orthogonal to the actors discussion. We
> > > > can use Akka actors in Java. And I think that makes a lot of sense, for
> > > > the reasons that Till mentioned, plus the following reasons:
> > > >
> > > >  - Akka has made a lot of effort to combine message throughput (multiple
> > > >    actor calls in one message) with low message latency. I don't think we
> > > >    could do much better with something else.
> > > >
> > > >  - I am currently working on the ExecutionGraph and Scheduler to unify
> > > >    lazy computation / recovery / dynamic resource assignment.
> > > >    The actor paradigm (order of calls, queuing invocations in the actor
> > > >    mailboxes) makes it much simpler to get concurrent situations right
> > > >    (such as certain calls overtaking each other, for example
> > > >    deploy/cancel).
> > > >
> > > >  - Actors work with thread pools by themselves, so we can get rid of all
> > > >    the inner runnables sent to executor services. That makes the code much
> > > >    more readable.
> > > >
> > > > Stephan
> > > >
> > > >
> > > >
> > > > On Sun, Aug 31, 2014 at 1:31 PM, Till Rohrmann <
> > [hidden email]>
> > > > wrote:
> > > >
> > > >> Hi Daniel,
> > > >>
> > > >> the RPC rework is discussed in
> > > >> https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently
> > > down
> > > >> due to maintenance reasons.
> > > >>
> > > >> The ideas to use akka are the following. Akka allows us to reduce
> the
> > > code
> > > >> base which has to be maintained. Especially, we get rid of all the
> > > >> multi-threading programming of the rpc service which is always hard
> to
> > > >> work
> > > >> with. With Akka we would get the heartbeat signal for free, because
> > Akka
> > > >> can detect dead actors. Akka uses supervision to handle fault
> > tolerance
> > > as
> > > >> well as recovery and it allows an easy forwarding of remote
> > exceptions.
> > > At
> > > >> the same time it offers a nice rpc abstraction which easily allows
> to
> > > >> implement asynchronous services. Furthermore, it scales rather well
> to
> > > >> large numbers of nodes and hopefully we get the latencies of Flink a
> > > >> little
> > > >> bit down.
> > > >>
> > > >> Bests,
> > > >>
> > > >> Till
> > > >>
> > > >>
> > > >> On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <
> [hidden email]>
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > will akka just be used for RPC or are there any plans to expand
> the
> > > >> > actor-based model to further parts of the runtime system? If so,
> > could
> > > >> you
> > > >> > please point me to the discussion thread?
> > > >> >
> > > >> > Spontaneously, I would say that adding a hard dependency on Scala
> > just
> > > >> for
> > > >> > the sake of having a hip RPC service sounds like a pretty dodgy
> > deal.
> > > >> > Therefore, I would like understand how much value akka could bring
> > to
> > > >> Flink
> > > >> > in the long run. The discussion whether to reimplement core
> > components
> > > >> of
> > > >> > the system in Scala should be the second step in my opinion.
> > > >> >
> > > >> > Bests,
> > > >> >
> > > >> >     Daniel
> > > >> >
> > > >> >
> > > >> > Am 29.08.2014 11:33, schrieb Asterios Katsifodimos:
> > > >> >
> > > >> >  I agree that using Akka's actors from Java results in very ugly
> > code.
> > > >> >> Hiding the internals of Akka behind Java reflection looks better
> > but
> > > >> >> breaks
> > > >> >> the principles of actors. For me it is kind of a deal breaker for
> > > using
> > > >> >> Akka from Java.  I think that Till has more reasons to believe
> that
> > > >> Scala
> > > >> >> would be a more appropriate for building a new Job/Task Manager.
> > > >> >>
> > > >> >> I think that this discussion should focus on 4 main aspects:
> > > >> >> 1. Performance
> > > >> >> 2. Implementability
> > > >> >> 3. Maintainability
> > > >> >> 4. Available Tools
> > > >> >>
> > > >> >> 1. Performance: Since that the job of the JobManager and the
> > > >> TaskManager
> > > >> >> is
> > > >> >> to 1) exchange messages in order to maintain a distributed state
> > > >> machine
> > > >> >> and 2) setup connections between task managers, 3) detect
> failures
> > > >> etc..
> > > >> >> In
> > > >> >> these basic operations, performance should not be an issue. Akka
> > was
> > > >> >> proven
> > > >> >> to scale quite well with very low latency. I guess that the low
> > level
> > > >> >> "plumbing" (serialization, connections, etc.) will continue in
> Java
> > > in
> > > >> >> order to guarantee high performance. I have no clue on what's
> > > happening
> > > >> >> with memory management and whether this will be implemented in
> Java
> > > or
> > > >> >> Scala and the respective consequences. Please comment.
> > > >> >>
> > > >> >> 2. Since the Job/Task Manager is going to be essentially
> > implemented
> > > >> from
> > > >> >> scratch, given the power of Akka, it seems to me that the
> > > >> implementation
> > > >> >> will be   easier, shorter and less verbose in Scala, given that
> > Till
> > > is
> > > >> >> comfortable enough with Scala.
> > > >> >>
> > > >> >> 3. Given #2, maintaining the code and trying out new ideas in
> Scala
> > > >> would
> > > >> >> take less time and effort. But maintaining low level plumbing in
> > Java
> > > >> and
> > > >> >> high level logic in Scala scares me. Anyone that has done this
> > before
> > > >> >> could
> > > >> >> comment on this?
> > > >> >>
> > > >> >> 4. Tools: Robert has raised some issues already but I think that
> > > tools
> > > >> >> will
> > > >> >> get better with time.
> > > >> >>
> > > >> >> Given the above, I would focus on #3 to be honest. Apart from
> this,
> > > >> going
> > > >> >> the Scala way sounds like a great idea. I really second Kostas'
> > > opinion
> > > >> >> that if large changes are going to happen, this is the best
> moment.
> > > >> >>
> > > >> >> Cheers,
> > > >> >> Asterios
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <
> > > >> [hidden email]>
> > > >> >> wrote:
> > > >> >>
> > > >> >>  I also agree with Robert and Kostas that it has to be a
> community
> > > >> >>> decision.
> > > >> >>> I understand the problems with Eclipse and the Scala IDE which
> is
> > a
> > > >> pain
> > > >> >>> in
> > > >> >>> the ass. But eventually these things will be fixed. Maybe we
> could
> > > >> also
> > > >> >>> talk to the typesafe guy and tell him that this problem bothers
> > us a
> > > >> lot.
> > > >> >>>
> > > >> >>> I also believe that the project is not about a specific
> > programming
> > > >> >>> language but a problem we want to tackle with Flink. From time
> to
> > > >> time it
> > > >> >>> might be necessary to adapt the tools in order to reach the
> goal.
> > In
> > > >> >>> fact,
> > > >> >>> I don't believe that Scala parts would drive people away from
> the
> > > >> >>> project.
> > > >> >>> Instead, Scala enthusiasts would be motivated to join us.
> > > >> >>>
> > > >> >>> Actually I stumbled across a quote of Leibniz which put's my
> point
> > > of
> > > >> >>> view
> > > >> >>> quite accurately in a nutshell:
> > > >> >>>
> > > >> >>> In symbols one observes an advantage in discovery which is
> > greatest
> > > >> when
> > > >> >>> they express the exact nature of a thing briefly and, as it
> were,
> > > >> picture
> > > >> >>> it; then indeed the labor of thought is wonderfully diminished
> --
> > > >> >>> Gottfried
> > > >> >>> Wilhelm Leibniz
> > > >> >>>
> > > >> >>>
> > > >> >>> On Thu, Aug 28, 2014 at 12:57 PM, Kostas Tzoumas <
> > > [hidden email]
> > > >> >
> > > >> >>> wrote:
> > > >> >>>
> > > >> >>>  On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <
> > > >> [hidden email]>
> > > >> >>>> wrote:
> > > >> >>>>
> > > >> >>>>  Changing the programming language of a very important system
> > > >> component
> > > >> >>>>>
> > > >> >>>> is
> > > >> >>>
> > > >> >>>> something we should carefully discuss.
> > > >> >>>>>
> > > >> >>>>>  Definitely agree, I think the community should discuss this
> > very
> > > >> >>>>
> > > >> >>> carefully.
> > > >> >>>
> > > >> >>>>
> > > >> >>>>  I understand that Akka is written in Scala and that it will be
> > > much
> > > >> >>>>>
> > > >> >>>> more
> > > >> >>>
> > > >> >>>> natural to implement the actor based system using Scala.
> > > >> >>>>> I see the following issues that we should consider:
> > > >> >>>>> Until now, Flink is clearly a project implemented only in
> Java.
> > > The
> > > >> >>>>>
> > > >> >>>> Scala
> > > >> >>>
> > > >> >>>> API basically sits on top of the Java-based runtime. We do not
> > > really
> > > >> >>>>> depend on Scala (we could easily remove the Scala API if we
> want
> > > >> to).
> > > >> >>>>> Having code written in Scala in the main system will add a
> hard
> > > >> >>>>>
> > > >> >>>> dependency
> > > >> >>>>
> > > >> >>>>> on a scala version.
> > > >> >>>>> Being a pure Java project has some advantages: I think its a
> > fact
> > > >> that
> > > >> >>>>> there are more Java programmers than Scala programmers. So our
> > > >> chances
> > > >> >>>>>
> > > >> >>>> of
> > > >> >>>
> > > >> >>>> attracting new contributors are higher when being a Java
> project.
> > > >> >>>>> On the other hand, we could maybe attract Scala developers to
> > our
> > > >> >>>>>
> > > >> >>>> project.
> > > >> >>>>
> > > >> >>>>> But that has not happened (for contributors, not users!) so
> far
> > > for
> > > >> our
> > > >> >>>>> Scala API, so I don't see any reason for that to happen.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>  This is definitely an issue to consider. We need to carefully
> > > >> weight
> > > >> >>>> how
> > > >> >>>> important this issue is. If we want to break things, incubation
> > is
> > > >> the
> > > >> >>>> right time to do it. Below are some arguments in favor of
> > breaking
> > > >> >>>>
> > > >> >>> things,
> > > >> >>>
> > > >> >>>> but do keep in mind that I am undecided, and I would really
> like
> > to
> > > >> see
> > > >> >>>>
> > > >> >>> the
> > > >> >>>
> > > >> >>>> community weighing in.
> > > >> >>>>
> > > >> >>>> First, I would dare say that the primary reason for someone to
> > > >> >>>> contribute
> > > >> >>>> to Flink so far has not been that the code is written in Java,
> > but
> > > >> more
> > > >> >>>>
> > > >> >>> the
> > > >> >>>
> > > >> >>>> content and nature of the project. Most contributors are Big
> Data
> > > >> >>>> enthusiasts in some way or another.
> > > >> >>>>
> > > >> >>>> Second, Scala projects have attracted contributors in the past.
> > > >> >>>>
> > > >> >>>> Third, it should not be too hard for someone that does not know
> > > >> Scala to
> > > >> >>>> contribute to a different component if the interfaces are
> clear.
> > > >> >>>>
> > > >> >>>>
> > > >> >>>>  Another issue is tooling: There are a lot of problems with
> Scala
> > > and
> > > >> >>>>> Eclipse: I've recently switched to Eclipse Luna. It seems to
> be
> > > >> >>>>>
> > > >> >>>> impossible
> > > >> >>>>
> > > >> >>>>> to compile Scala code with Luna because ScalaIDE does not
> > properly
> > > >> cope
> > > >> >>>>> with it.
> > > >> >>>>> Even with Eclipse versions that are supported by ScalaIDE, you
> > > have
> > > >> to
> > > >> >>>>> manually install 3 plugins, some of them are not available in
> > the
> > > >> >>>>>
> > > >> >>>> Eclipse
> > > >> >>>
> > > >> >>>> Marketplace. So with a JobManager written in Scala, users can
> not
> > > >> just
> > > >> >>>>> import our project as a Maven project into Eclipse and start
> > > >> >>>>>
> > > >> >>>> developing.
> > > >> >>>
> > > >> >>>> The support for Maven is probably also limited. For example, I
> > > don't
> > > >> >>>>>
> > > >> >>>> know
> > > >> >>>
> > > >> >>>> if there is a checkstyle plugin for Scala.
> > > >> >>>>>
> > > >> >>>>> I'm looking forward to hearing other opinions on this issue.
> As
> > I
> > > >> said
> > > >> >>>>>
> > > >> >>>> in
> > > >> >>>
> > > >> >>>> the beginning, we should exchange arguments on this and think
> > about
> > > >> it
> > > >> >>>>>
> > > >> >>>> for
> > > >> >>>>
> > > >> >>>>> some time before we decide on this.
> > > >> >>>>>
> > > >> >>>>>  Best,
> > > >> >>>>
> > > >> >>>>> Robert
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> On Thu, Aug 28, 2014 at 1:08 AM, Till Rohrmann <
> > > >> [hidden email]>
> > > >> >>>>> wrote:
> > > >> >>>>>
> > > >> >>>>>  Hi guys,
> > > >> >>>>>>
> > > >> >>>>>> I currently working on replacing the old rpc infrastructure
> > with
> > > an
> > > >> >>>>>>
> > > >> >>>>> akka
> > > >> >>>>
> > > >> >>>>> based actor system. In the wake of this change I will
> > reimplement
> > > >> the
> > > >> >>>>>> JobManager and TaskManager which will then be actors. Akka
> > > offers a
> > > >> >>>>>>
> > > >> >>>>> Java
> > > >> >>>>
> > > >> >>>>> API but the implementation turns out to be very verbose and
> > > >> >>>>>>
> > > >> >>>>> laborious,
> > > >> >>>
> > > >> >>>> because Java 6 and 7 do not support lambdas and pattern
> matching.
> > > >> >>>>>>
> > > >> >>>>> Using
> > > >> >>>
> > > >> >>>> Scala instead, would allow a far more succinct and clear
> > > >> >>>>>>
> > > >> >>>>> implementation
> > > >> >>>
> > > >> >>>> of
> > > >> >>>>>
> > > >> >>>>>> the JobManager and TaskManager. Instead of a lot of if
> > statements
> > > >> >>>>>>
> > > >> >>>>> using
> > > >> >>>
> > > >> >>>> instanceof to figure out the message type, we could simply use
> > > >> >>>>>>
> > > >> >>>>> pattern
> > > >> >>>
> > > >> >>>> matching. Furthermore, the callback functions could simply be
> > > Scala's
> > > >> >>>>>> anonymous functions. Therefore I would propose to use Scala
> for
> > > >> these
> > > >> >>>>>>
> > > >> >>>>> two
> > > >> >>>>
> > > >> >>>>> systems.
> > > >> >>>>>>
> > > >> >>>>>> The Akka system uses the slf4j library as logging interface.
> > > >> >>>>>>
> > > >> >>>>> Therefore
> > > >> >>>
> > > >> >>>> I
> > > >> >>>>
> > > >> >>>>> would also propose to replace the jcl logging system with the
> > > slf4j
> > > >> >>>>>>
> > > >> >>>>> logging
> > > >> >>>>>
> > > >> >>>>>> system. Since we want to use Akka in many parts of the
> runtime
> > > >> system
> > > >> >>>>>>
> > > >> >>>>> and
> > > >> >>>>
> > > >> >>>>> it recommends using logback as logging backend, I would also
> > like
> > > to
> > > >> >>>>>> replace log4j with logback. But this change should inflict
> only
> > > few
> > > >> >>>>>>
> > > >> >>>>> changes
> > > >> >>>>>
> > > >> >>>>>> once we established the slf4j logging interface everywhere.
> > > >> >>>>>>
> > > >> >>>>>> What do you guys think of that idea?
> > > >> >>>>>>
> > > >> >>>>>> Best regards,
> > > >> >>>>>>
> > > >> >>>>>> Till
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Replacing JobManager with Scala implementation

Henry Saputra
In reply to this post by till.rohrmann
Hi Till,

Thanks for opening the discussion and leading the effort, and apologies
for the late response.

From what I have gathered so far, there are 2 issues:
1. Introducing Akka as RPC
2. Moving to Scala to enable easy access to Akka Scala APIs.
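
For reference on issue 2, the verbosity in question is the instanceof
dispatch Till described at the start of the thread. A minimal plain-Java
sketch of that style is below; it does not use Akka, and the message classes
(SubmitJob, CancelJob, Heartbeat) and the handle method are hypothetical
names chosen only for illustration:

```java
// Hypothetical message types; none of these names come from Flink or Akka.
final class SubmitJob {
    final String jobName;
    SubmitJob(String jobName) { this.jobName = jobName; }
}

final class CancelJob {
    final String jobName;
    CancelJob(String jobName) { this.jobName = jobName; }
}

final class Heartbeat {
    final String taskManagerId;
    Heartbeat(String taskManagerId) { this.taskManagerId = taskManagerId; }
}

class InstanceofDispatch {
    // One branch and one explicit cast per message type.
    static String handle(Object message) {
        if (message instanceof SubmitJob) {
            SubmitJob submit = (SubmitJob) message;
            return "submitting " + submit.jobName;
        } else if (message instanceof CancelJob) {
            CancelJob cancel = (CancelJob) message;
            return "cancelling " + cancel.jobName;
        } else if (message instanceof Heartbeat) {
            Heartbeat beat = (Heartbeat) message;
            return "heartbeat from " + beat.taskManagerId;
        } else {
            // Anything the handler does not know about ends up here.
            return "unhandled " + message.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new SubmitJob("wordcount")));
        System.out.println(handle(new CancelJob("wordcount")));
        System.out.println(handle(new Heartbeat("tm-1")));
        System.out.println(handle("something else"));
    }
}
```

Each new message type costs another branch plus a cast; in Scala the same
dispatch would be a single match expression over the message, which is the
succinctness argument made earlier in the thread.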

For no. 1, if the RPC is used for lower-level communication, then we
could probably consider Netty as the transport and serialization
protocol (I have also added a comment to the JIRA).
Internally, to reduce thread management, we could use Akka via a Scala
bridge service to make sure we use the Scala Akka APIs.

So, addressing no. 2, we could mix both Scala and Java in the JobManager
and TaskManager. The code that handles async RPC communication between JM
and TM would use Java via Netty, and for internal multi-threading or
higher-level code such as the heartbeat we could use Akka.

It does introduce a bit of a mix between Java and Scala code, but we already
have a mix of Scala and Java to support the APIs, so I think we could move
some of the internal code to Scala too, as a "learning" step towards using
Scala for better concurrency and functional programming.

- Henry



On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <[hidden email]> wrote:

> Hi Daniel,
>
> the RPC rework is discussed in
> https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently down
> due to maintenance reasons.
>
> The ideas to use akka are the following. Akka allows us to reduce the code
> base which has to be maintained. Especially, we get rid of all the
> multi-threading programming of the rpc service which is always hard to work
> with. With Akka we would get the heartbeat signal for free, because Akka
> can detect dead actors. Akka uses supervision to handle fault tolerance as
> well as recovery and it allows an easy forwarding of remote exceptions. At
> the same time it offers a nice rpc abstraction which easily allows to
> implement asynchronous services. Furthermore, it scales rather well to
> large numbers of nodes and hopefully we get the latencies of Flink a little
> bit down.
>
> Bests,
>
> Till

Re: Replacing JobManager with Scala implementation

Ufuk Celebi-2
Hey Till,

I'm not sure what the "right" ASF process is, but I wouldn't mind a vote on
this in order to make sure that you don't do unnecessary work by replacing
the code with Scala.

I for one would be certainly open to it. The only thing that bothers me is
the current state of out-of-the-box IDE support. But since there are other
successful Scala projects around ;-), which manage to do it, why shouldn't
we?

@Henry, regarding Akka: I think the main motivation for moving to Akka
(besides the points raised by Stephan and others) is that we actually don't
want to bother with low-level thread management, protocols, etc.
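
As a rough illustration of that point (and of the deploy/cancel ordering
raised earlier in the thread), here is a plain-Java sketch of the mailbox
idea. It is not Akka's API: it only imitates an actor with a single-threaded
executor from java.util.concurrent, and MailboxSketch, tell and snapshot are
made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an actor. This is NOT Akka; it only mimics a mailbox.
class MailboxSketch {
    // One worker thread drains a FIFO queue, so messages are handled one at a time.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    // Only ever touched from the mailbox thread, so it needs no lock.
    private final List<String> handled = new ArrayList<>();

    // Fire-and-forget send: the caller does not wait for the handler to run.
    void tell(String message) {
        mailbox.execute(() -> handled.add(message));
    }

    // The read is itself queued behind earlier messages, so it sees all of them.
    List<String> snapshot() {
        Callable<List<String>> read = () -> new ArrayList<>(handled);
        try {
            return mailbox.submit(read).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() {
        mailbox.shutdown();
    }

    public static void main(String[] args) {
        MailboxSketch jobManager = new MailboxSketch();
        jobManager.tell("deploy task-1");
        jobManager.tell("cancel task-1");
        System.out.println(jobManager.snapshot()); // [deploy task-1, cancel task-1]
        jobManager.shutdown();
    }
}
```

The caller never creates a thread or takes a lock, and a cancel sent after a
deploy from the same thread cannot overtake it. An actor library would add
the parts this sketch leaves out, such as remoting, supervision and failure
detection.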



On Tue, Sep 2, 2014 at 8:32 PM, Henry Saputra <[hidden email]>
wrote:

> Hi Till,
>
> Thanks for opening the discussion and leading the effort, and apologies
> for the late response.
>
> From what I have gathered so far, there are 2 issues:
> 1. Introducing Akka as RPC
> 2. Moving to Scala to enable easy access to Akka Scala APIs.
>
> For no. 1, if the RPC is used for lower-level communication, then we
> could probably consider Netty as the transport and serialization
> protocol (I have also added a comment to the JIRA).
> Internally, to reduce thread management, we could use Akka via a Scala
> bridge service to make sure we use the Scala Akka APIs.
>
> So, addressing no. 2, we could mix both Scala and Java in the JobManager
> and TaskManager. The code that handles async RPC communication between JM
> and TM would use Java via Netty, and for internal multi-threading or
> higher-level code such as the heartbeat we could use Akka.
>
> It does introduce a bit of a mix between Java and Scala code, but we already
> have a mix of Scala and Java to support the APIs, so I think we could move
> some of the internal code to Scala too, as a "learning" step towards using
> Scala for better concurrency and functional programming.
>
> - Henry
>
>
>
> On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <[hidden email]>
> wrote:
> > Hi Daniel,
> >
> > the RPC rework is discussed in
> > https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently down
> > due to maintenance reasons.
> >
> > The ideas to use akka are the following. Akka allows us to reduce the
> code
> > base which has to be maintained. Especially, we get rid of all the
> > multi-threading programming of the rpc service which is always hard to
> work
> > with. With Akka we would get the heartbeat signal for free, because Akka
> > can detect dead actors. Akka uses supervision to handle fault tolerance
> as
> > well as recovery and it allows an easy forwarding of remote exceptions.
> At
> > the same time it offers a nice rpc abstraction which easily allows to
> > implement asynchronous services. Furthermore, it scales rather well to
> > large numbers of nodes and hopefully we get the latencies of Flink a
> little
> > bit down.
> >
> > Bests,
> >
> > Till
> >
> >
> > On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <[hidden email]>
> wrote:
> >
> >> Hi,
> >>
> >> will akka just be used for RPC or are there any plans to expand the
> >> actor-based model to further parts of the runtime system? If so, could
> you
> >> please point me to the discussion thread?
> >>
> >> Spontaneously, I would say that adding a hard dependency on Scala just
> for
> >> the sake of having a hip RPC service sounds like a pretty dodgy deal.
> >> Therefore, I would like understand how much value akka could bring to
> Flink
> >> in the long run. The discussion whether to reimplement core components
> of
> >> the system in Scala should be the second step in my opinion.
> >>
> >> Bests,
> >>
> >>     Daniel
> >>
> >>
> >> Am 29.08.2014 11:33, schrieb Asterios Katsifodimos:
> >>
> >>> I agree that using Akka's actors from Java results in very ugly code.
> >>> Hiding the internals of Akka behind Java reflection looks better but
> >>> breaks the principles of actors. For me it is kind of a deal breaker
> >>> for using Akka from Java. I think that Till has more reasons to
> >>> believe that Scala would be more appropriate for building a new
> >>> Job/Task Manager.
> >>>
> >>> I think that this discussion should focus on 4 main aspects:
> >>> 1. Performance
> >>> 2. Implementability
> >>> 3. Maintainability
> >>> 4. Available Tools
> >>>
> >>> 1. Performance: The job of the JobManager and the TaskManager is to
> >>> 1) exchange messages in order to maintain a distributed state machine,
> >>> 2) set up connections between task managers, and 3) detect failures,
> >>> etc. In these basic operations, performance should not be an issue.
> >>> Akka has been proven to scale quite well with very low latency. I
> >>> guess that the low-level "plumbing" (serialization, connections, etc.)
> >>> will continue in Java in order to guarantee high performance. I have
> >>> no clue what is happening with memory management, whether it will be
> >>> implemented in Java or Scala, or what the respective consequences
> >>> are. Please comment.
> >>>
> >>> 2. Since the Job/Task Manager is going to be essentially implemented
> >>> from scratch, given the power of Akka, it seems to me that the
> >>> implementation will be easier, shorter and less verbose in Scala,
> >>> given that Till is comfortable enough with Scala.
> >>>
> >>> 3. Given #2, maintaining the code and trying out new ideas in Scala
> >>> would take less time and effort. But maintaining low-level plumbing in
> >>> Java and high-level logic in Scala scares me. Could anyone who has
> >>> done this before comment on this?
> >>>
> >>> 4. Tools: Robert has raised some issues already but I think that tools
> >>> will get better with time.
> >>>
> >>> Given the above, I would focus on #3 to be honest. Apart from this,
> >>> going the Scala way sounds like a great idea. I really second Kostas'
> >>> opinion that if large changes are going to happen, this is the best
> >>> moment.
> >>>
> >>> Cheers,
> >>> Asterios
> >>>
> >>>
> >>>
> >>> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <[hidden email]>
> >>> wrote:
> >>>
> >>>> I also agree with Robert and Kostas that it has to be a community
> >>>> decision. I understand the problems with Eclipse and the Scala IDE,
> >>>> which is a pain in the ass. But eventually these things will be
> >>>> fixed. Maybe we could also talk to the Typesafe guy and tell him that
> >>>> this problem bothers us a lot.
> >>>>
> >>>> I also believe that the project is not about a specific programming
> >>>> language but about a problem we want to tackle with Flink. From time
> >>>> to time it might be necessary to adapt the tools in order to reach
> >>>> the goal. In fact, I don't believe that Scala parts would drive
> >>>> people away from the project. Instead, Scala enthusiasts would be
> >>>> motivated to join us.
> >>>>
> >>>> Actually I stumbled across a quote of Leibniz which puts my point of
> >>>> view quite accurately in a nutshell:
> >>>>
> >>>> In symbols one observes an advantage in discovery which is greatest
> >>>> when they express the exact nature of a thing briefly and, as it
> >>>> were, picture it; then indeed the labor of thought is wonderfully
> >>>> diminished -- Gottfried Wilhelm Leibniz
> >>>>
> >>>>
> >>>> On Thu, Aug 28, 2014 at 12:57 PM, Kostas Tzoumas <[hidden email]>
> >>>> wrote:
> >>>>
> >>>>> On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <[hidden email]>
> >>>>> wrote:
> >>>>>
> >>>>>> Changing the programming language of a very important system
> >>>>>> component is something we should carefully discuss.
> >>>>>
> >>>>> Definitely agree, I think the community should discuss this
> >>>>> very carefully.
> >>>>>
> >>>>>> I understand that Akka is written in Scala and that it will be
> >>>>>> much more natural to implement the actor based system using
> >>>>>> Scala.
> >>>>>> I see the following issues that we should consider:
> >>>>>> Until now, Flink is clearly a project implemented only in Java.
> >>>>>> The Scala API basically sits on top of the Java-based runtime.
> >>>>>> We do not really depend on Scala (we could easily remove the
> >>>>>> Scala API if we want to). Having code written in Scala in the
> >>>>>> main system will add a hard dependency on a Scala version.
> >>>>>> Being a pure Java project has some advantages: I think it's a
> >>>>>> fact that there are more Java programmers than Scala
> >>>>>> programmers. So our chances of attracting new contributors are
> >>>>>> higher when being a Java project.
> >>>>>> On the other hand, we could maybe attract Scala developers to
> >>>>>> our project. But that has not happened (for contributors, not
> >>>>>> users!) so far for our Scala API, so I don't see any reason for
> >>>>>> that to happen.
> >>>>>
> >>>>> This is definitely an issue to consider. We need to carefully
> >>>>> weigh how important this issue is. If we want to break things,
> >>>>> incubation is the right time to do it. Below are some arguments
> >>>>> in favor of breaking things, but do keep in mind that I am
> >>>>> undecided, and I would really like to see the community
> >>>>> weighing in.
> >>>>>
> >>>>> First, I would dare say that the primary reason for someone to
> >>>>> contribute to Flink so far has not been that the code is
> >>>>> written in Java, but more the content and nature of the
> >>>>> project. Most contributors are Big Data enthusiasts in some way
> >>>>> or another.
> >>>>>
> >>>>> Second, Scala projects have attracted contributors in the past.
> >>>>>
> >>>>> Third, it should not be too hard for someone that does not know
> >>>>> Scala to contribute to a different component if the interfaces
> >>>>> are clear.
> >>>>>
> >>>>>> Another issue is tooling: There are a lot of problems with
> >>>>>> Scala and Eclipse: I've recently switched to Eclipse Luna. It
> >>>>>> seems to be impossible to compile Scala code with Luna because
> >>>>>> ScalaIDE does not properly cope with it.
> >>>>>> Even with Eclipse versions that are supported by ScalaIDE, you
> >>>>>> have to manually install 3 plugins, some of which are not
> >>>>>> available in the Eclipse Marketplace. So with a JobManager
> >>>>>> written in Scala, users cannot just import our project as a
> >>>>>> Maven project into Eclipse and start developing.
> >>>>>> The support for Maven is probably also limited. For example, I
> >>>>>> don't know if there is a checkstyle plugin for Scala.
> >>>>>>
> >>>>>> I'm looking forward to hearing other opinions on this issue. As
> >>>>>> I said in the beginning, we should exchange arguments on this
> >>>>>> and think about it for some time before we decide on this.
> >>>>>>
> >>>>>> Best,
> >>>>>> Robert
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Aug 28, 2014 at 1:08 AM, Till Rohrmann <[hidden email]>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi guys,
> >>>>>>>
> >>>>>>> I am currently working on replacing the old RPC infrastructure
> >>>>>>> with an Akka-based actor system. In the wake of this change I
> >>>>>>> will reimplement the JobManager and TaskManager, which will then
> >>>>>>> be actors. Akka offers a Java API but the implementation turns
> >>>>>>> out to be very verbose and laborious, because Java 6 and 7 do
> >>>>>>> not support lambdas and pattern matching. Using Scala instead
> >>>>>>> would allow a far more succinct and clear implementation of the
> >>>>>>> JobManager and TaskManager. Instead of a lot of if statements
> >>>>>>> using instanceof to figure out the message type, we could simply
> >>>>>>> use pattern matching. Furthermore, the callback functions could
> >>>>>>> simply be Scala's anonymous functions. Therefore I would propose
> >>>>>>> to use Scala for these two systems.
> >>>>>>>
> >>>>>>> The Akka system uses the slf4j library as logging interface.
> >>>>>>> Therefore I would also propose to replace the jcl logging system
> >>>>>>> with the slf4j logging system. Since we want to use Akka in many
> >>>>>>> parts of the runtime system and it recommends using logback as
> >>>>>>> logging backend, I would also like to replace log4j with
> >>>>>>> logback. But this change should require only a few changes once
> >>>>>>> we have established the slf4j logging interface everywhere.
> >>>>>>>
> >>>>>>> What do you guys think of that idea?
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Till
> >>>>>>>
> >>>>>>>
> >>
>

Re: Replacing JobManager with Scala implementation

Henry Saputra
Thanks @Ufuk for the response.

Yeah, Akka hides all the low-level nuts and bolts of the RPC flow,
but that also makes it a bit harder to debug issues when communication
fails.
It makes sense to use one RPC framework if we can, and since there
are other plans for Akka in the code to help manage concurrent
programming, it is a good idea to use Akka for RPC.

- Henry


On Wed, Sep 3, 2014 at 5:06 AM, Ufuk Celebi <[hidden email]> wrote:

> Hey Till,
>
> I'm not sure what the "right" ASF process is, but I wouldn't mind a vote on
> this in order to make sure that you don't do unnecessary work by replacing
> the code with Scala.
>
> I for one would be certainly open to it. The only thing that bothers me is
> the current state of out-of-the-box IDE support. But since there are other
> successful Scala projects around ;-), which manage to do it, why shouldn't
> we?
>
> @Henry, regarding Akka: I think the main motivation for moving to Akka
> (besides the points raised by Stephan and others) is that we actually don't
> want to bother with low-level thread management, protocols, etc.
>
>
>
> On Tue, Sep 2, 2014 at 8:32 PM, Henry Saputra <[hidden email]>
> wrote:
>
>> Hi Till,
>>
>> Thanks for opening the discussion and leading the effort, and apologies
>> for the late response.
>>
>> From what I have gathered so far, there are 2 issues:
>> 1. Introducing Akka as RPC
>> 2. Moving to Scala to enable easy access to the Akka Scala APIs.
>>
>> For no. 1, if the RPC is used for lower-level communication then we
>> could probably consider Netty as the transport and serialization
>> protocol (I have also added a comment to the JIRA).
>> Internally, to reduce thread management we could use Akka via a Scala
>> bridge service to make sure we use the Scala Akka APIs.
>>
>> So, addressing no. 2, we could mix both Scala and Java in the JobManager
>> and TaskManager. The code that handles async RPC communication between
>> JM and TM would use Java via Netty, and for internal multi-threading or
>> higher-level plane code such as heartbeats we could use Akka.
>>
>> It does introduce a bit of a mix between Java and Scala code, but we
>> already have a mix of Scala and Java to support the APIs, so I think we
>> could move some of the internal code to Scala too, as "learning" steps
>> to utilize Scala for better concurrency / functional programming.
>>
>> - Henry
>>
>>
>>
>> On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <[hidden email]>
>> wrote:
>> > Hi Daniel,
>> >
>> > the RPC rework is discussed in
>> > https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently
>> > down for maintenance.
>> >
>> > The reasons for using Akka are the following. Akka allows us to reduce
>> > the code base which has to be maintained. In particular, we get rid of
>> > all the multi-threaded programming in the RPC service, which is always
>> > hard to work with. With Akka we would get the heartbeat signal for
>> > free, because Akka can detect dead actors. Akka uses supervision to
>> > handle fault tolerance as well as recovery, and it allows easy
>> > forwarding of remote exceptions. At the same time it offers a nice RPC
>> > abstraction which makes it easy to implement asynchronous services.
>> > Furthermore, it scales rather well to large numbers of nodes, and
>> > hopefully we can bring the latencies of Flink down a little bit.
>> >
>> > Bests,
>> >
>> > Till
>> >
>> >
>> > On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <[hidden email]>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> will Akka just be used for RPC or are there any plans to expand the
>> >> actor-based model to further parts of the runtime system? If so,
>> >> could you please point me to the discussion thread?
>> >>
>> >> Spontaneously, I would say that adding a hard dependency on Scala
>> >> just for the sake of having a hip RPC service sounds like a pretty
>> >> dodgy deal. Therefore, I would like to understand how much value Akka
>> >> could bring to Flink in the long run. The discussion whether to
>> >> reimplement core components of the system in Scala should be the
>> >> second step in my opinion.
>> >>
>> >> Bests,
>> >>
>> >>     Daniel
>> >>
>>

Re: Replacing JobManager with Scala implementation

till.rohrmann
How do we start the vote on whether to implement the JobManager in
Scala? Can we just do it in this thread, or should it happen in a
separate thread?


On Wed, Sep 3, 2014 at 6:27 PM, Henry Saputra <[hidden email]>
wrote:

> Thanks @Ufuk for the response.
>
> Yeah, Akka hides all the low-level nuts and bolts of the RPC flow,
> but that also makes it a bit harder to debug issues when communication
> fails.
> It makes sense to use one RPC framework if we can, and since there
> are other plans for Akka in the code to help manage concurrent
> programming, it is a good idea to use Akka for RPC.
>
> - Henry
>
>
> On Wed, Sep 3, 2014 at 5:06 AM, Ufuk Celebi <[hidden email]> wrote:
> > Hey Till,
> >
> > I'm not sure what the "right" ASF process is, but I wouldn't mind a
> > vote on this in order to make sure that you don't do unnecessary work
> > by replacing the code with Scala.
> >
> > I for one would be certainly open to it. The only thing that bothers
> > me is the current state of out-of-the-box IDE support. But since there
> > are other successful Scala projects around ;-), which manage to do it,
> > why shouldn't we?
> >
> > @Henry, regarding Akka: I think the main motivation for moving to Akka
> > (besides the points raised by Stephan and others) is that we actually
> > don't want to bother with low-level thread management, protocols, etc.
> >
> >
> >
> > On Tue, Sep 2, 2014 at 8:32 PM, Henry Saputra <[hidden email]>
> > wrote:
> >
> >> Hi Till,
> >>
> >> Thanks for opening the discussion and leading the effort, and
> >> apologies for the late response.
> >>
> >> From what I have gathered so far, there are 2 issues:
> >> 1. Introducing Akka as RPC
> >> 2. Moving to Scala to enable easy access to the Akka Scala APIs.
> >>
> >> For no. 1, if the RPC is used for lower-level communication then we
> >> could probably consider Netty as the transport and serialization
> >> protocol (I have also added a comment to the JIRA).
> >> Internally, to reduce thread management we could use Akka via a Scala
> >> bridge service to make sure we use the Scala Akka APIs.
> >>
> >> So, addressing no. 2, we could mix both Scala and Java in the
> >> JobManager and TaskManager. The code that handles async RPC
> >> communication between JM and TM would use Java via Netty, and for
> >> internal multi-threading or higher-level plane code such as
> >> heartbeats we could use Akka.
> >>
> >> It does introduce a bit of a mix between Java and Scala code, but we
> >> already have a mix of Scala and Java to support the APIs, so I think
> >> we could move some of the internal code to Scala too, as "learning"
> >> steps to utilize Scala for better concurrency / functional
> >> programming.
> >>
> >> - Henry
> >>
> >>
> >>
> >> On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <[hidden email]>
> >> wrote:
> >> > Hi Daniel,
> >> >
> >> > the RPC rework is discussed in
> >> > https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently
> >> > down for maintenance.
> >> >
> >> > The reasons for using Akka are the following. Akka allows us to
> >> > reduce the code base which has to be maintained. In particular, we
> >> > get rid of all the multi-threaded programming in the RPC service,
> >> > which is always hard to work with. With Akka we would get the
> >> > heartbeat signal for free, because Akka can detect dead actors. Akka
> >> > uses supervision to handle fault tolerance as well as recovery, and
> >> > it allows easy forwarding of remote exceptions. At the same time it
> >> > offers a nice RPC abstraction which makes it easy to implement
> >> > asynchronous services. Furthermore, it scales rather well to large
> >> > numbers of nodes, and hopefully we can bring the latencies of Flink
> >> > down a little bit.
> >> >
> >> > Bests,
> >> >
> >> > Till
> >> >
> >> >
> >> > On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <[hidden email]>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> will Akka just be used for RPC or are there any plans to expand the
> >> >> actor-based model to further parts of the runtime system? If so,
> >> >> could you please point me to the discussion thread?
> >> >>
> >> >> Spontaneously, I would say that adding a hard dependency on Scala
> >> >> just for the sake of having a hip RPC service sounds like a pretty
> >> >> dodgy deal. Therefore, I would like to understand how much value
> >> >> Akka could bring to Flink in the long run. The discussion whether
> >> >> to reimplement core components of the system in Scala should be the
> >> >> second step in my opinion.
> >> >>
> >> >> Bests,
> >> >>
> >> >>     Daniel
> >> >>
> >> >>
> >> >> On 29.08.2014 11:33, Asterios Katsifodimos wrote:
> >> >>
> >> >>> I agree that using Akka's actors from Java results in very ugly
> >> >>> code. Hiding the internals of Akka behind Java reflection looks
> >> >>> better but breaks the principles of actors. For me it is kind of
> >> >>> a deal breaker for using Akka from Java. I think that Till has
> >> >>> more reasons to believe that Scala would be more appropriate for
> >> >>> building a new Job/Task Manager.
> >> >>>
> >> >>> I think that this discussion should focus on 4 main aspects:
> >> >>> 1. Performance
> >> >>> 2. Implementability
> >> >>> 3. Maintainability
> >> >>> 4. Available Tools
> >> >>>
> >> >>> 1. Performance: The job of the JobManager and the TaskManager is
> >> >>> to 1) exchange messages in order to maintain a distributed state
> >> >>> machine, 2) set up connections between task managers, and
> >> >>> 3) detect failures, etc. In these basic operations, performance
> >> >>> should not be an issue. Akka has been proven to scale quite well
> >> >>> with very low latency. I guess that the low-level "plumbing"
> >> >>> (serialization, connections, etc.) will continue in Java in order
> >> >>> to guarantee high performance. I have no clue what is happening
> >> >>> with memory management, whether it will be implemented in Java or
> >> >>> Scala, or what the respective consequences are. Please comment.
> >> >>>
> >> >>> 2. Since the Job/Task Manager is going to be essentially
> >> >>> implemented from scratch, given the power of Akka, it seems to me
> >> >>> that the implementation will be easier, shorter and less verbose
> >> >>> in Scala, given that Till is comfortable enough with Scala.
> >> >>>
> >> >>> 3. Given #2, maintaining the code and trying out new ideas in
> >> >>> Scala would take less time and effort. But maintaining low-level
> >> >>> plumbing in Java and high-level logic in Scala scares me. Could
> >> >>> anyone who has done this before comment on this?
> >> >>>
> >> >>> 4. Tools: Robert has raised some issues already but I think that
> >> >>> tools will get better with time.
> >> >>>
> >> >>> Given the above, I would focus on #3 to be honest. Apart from
> >> >>> this, going the Scala way sounds like a great idea. I really
> >> >>> second Kostas' opinion that if large changes are going to happen,
> >> >>> this is the best moment.
> >> >>>
> >> >>> Cheers,
> >> >>> Asterios
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <[hidden email]>
> >> >>> wrote:
> >> >>>
> >> >>>> I also agree with Robert and Kostas that it has to be a
> >> >>>> community decision. I understand the problems with Eclipse and
> >> >>>> the Scala IDE, which is a pain in the ass. But eventually these
> >> >>>> things will be fixed. Maybe we could also talk to the Typesafe
> >> >>>> guy and tell him that this problem bothers us a lot.
> >> >>>>
> >> >>>> I also believe that the project is not about a specific
> >> >>>> programming language but about a problem we want to tackle with
> >> >>>> Flink. From time to time it might be necessary to adapt the
> >> >>>> tools in order to reach the goal. In fact, I don't believe that
> >> >>>> Scala parts would drive people away from the project. Instead,
> >> >>>> Scala enthusiasts would be motivated to join us.
> >> >>>>
> >> >>>> Actually I stumbled across a quote of Leibniz which puts my
> >> >>>> point of view quite accurately in a nutshell:
> >> >>>>
> >> >>>> In symbols one observes an advantage in discovery which is
> >> >>>> greatest when they express the exact nature of a thing briefly
> >> >>>> and, as it were, picture it; then indeed the labor of thought is
> >> >>>> wonderfully diminished -- Gottfried Wilhelm Leibniz
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Aug 28, 2014 at 12:57 PM, Kostas Tzoumas <[hidden email]>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> On Thu, Aug 28, 2014 at 11:49 AM, Robert Metzger <[hidden email]>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> Changing the programming language of a very important system
> >> >>>>>> component is something we should carefully discuss.
> >> >>>>>
> >> >>>>> Definitely agree, I think the community should discuss this
> >> >>>>> very carefully.
> >> >>>>>
> >> >>>>>> I understand that Akka is written in Scala and that it will be
> >> >>>>>> much more natural to implement the actor based system using
> >> >>>>>> Scala.
> >> >>>>>> I see the following issues that we should consider:
> >> >>>>>> Until now, Flink is clearly a project implemented only in Java.
> >> >>>>>> The Scala API basically sits on top of the Java-based runtime.
> >> >>>>>> We do not really depend on Scala (we could easily remove the
> >> >>>>>> Scala API if we want to). Having code written in Scala in the
> >> >>>>>> main system will add a hard dependency on a Scala version.
> >> >>>>>> Being a pure Java project has some advantages: I think it's a
> >> >>>>>> fact that there are more Java programmers than Scala
> >> >>>>>> programmers. So our chances of attracting new contributors are
> >> >>>>>> higher when being a Java project.
> >> >>>>>> On the other hand, we could maybe attract Scala developers to
> >> >>>>>> our project. But that has not happened (for contributors, not
> >> >>>>>> users!) so far for our Scala API, so I don't see any reason for
> >> >>>>>> that to happen.
> >> >>>>>
> >> >>>>> This is definitely an issue to consider. We need to carefully
> >> >>>>> weigh how important this issue is. If we want to break things,
> >> >>>>> incubation is the right time to do it. Below are some arguments
> >> >>>>> in favor of breaking things, but do keep in mind that I am
> >> >>>>> undecided, and I would really like to see the community
> >> >>>>> weighing in.
> >> >>>>>
> >> >>>>> First, I would dare say that the primary reason for someone to
> >> >>>>> contribute to Flink so far has not been that the code is
> >> >>>>> written in Java, but more the content and nature of the
> >> >>>>> project. Most contributors are Big Data enthusiasts in some way
> >> >>>>> or another.
> >> >>>>>
> >> >>>>> Second, Scala projects have attracted contributors in the past.
> >> >>>>>
> >> >>>>> Third, it should not be too hard for someone that does not know
> >> >>>>> Scala to contribute to a different component if the interfaces
> >> >>>>> are clear.
> >> >>>>>
> >> >>>>>> Another issue is tooling: There are a lot of problems with
> >> >>>>>> Scala and Eclipse: I've recently switched to Eclipse Luna. It
> >> >>>>>> seems to be impossible to compile Scala code with Luna because
> >> >>>>>> ScalaIDE does not properly cope with it.
> >> >>>>>> Even with Eclipse versions that are supported by ScalaIDE, you
> >> >>>>>> have to manually install 3 plugins, some of which are not
> >> >>>>>> available in the Eclipse Marketplace. So with a JobManager
> >> >>>>>> written in Scala, users cannot just import our project as a
> >> >>>>>> Maven project into Eclipse and start developing.
> >> >>>>>> The support for Maven is probably also limited. For example, I
> >> >>>>>> don't know if there is a checkstyle plugin for Scala.
> >> >>>>>>
> >> >>>>>> I'm looking forward to hearing other opinions on this issue. As I
> >> said
> >> >>>>>>
> >> >>>>> in
> >> >>>>
> >> >>>>> the beginning, we should exchange arguments on this and think
> about
> >> it
> >> >>>>>>
> >> >>>>> for
> >> >>>>>
> >> >>>>>> some time before we decide on this.
> >> >>>>>>
> >> >>>>>>  Best,
> >> >>>>>
> >> >>>>>> Robert
> >> >>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Thu, Aug 28, 2014 at 1:08 AM, Till Rohrmann <
> >> [hidden email]>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>>  Hi guys,
> >> >>>>>>>
> >> >>>>>>> I currently working on replacing the old rpc infrastructure
> with an
> >> >>>>>>>
> >> >>>>>> akka
> >> >>>>>
> >> >>>>>> based actor system. In the wake of this change I will reimplement
> >> the
> >> >>>>>>> JobManager and TaskManager which will then be actors. Akka
> offers a
> >> >>>>>>>
> >> >>>>>> Java
> >> >>>>>
> >> >>>>>> API but the implementation turns out to be very verbose and
> >> >>>>>>>
> >> >>>>>> laborious,
> >> >>>>
> >> >>>>> because Java 6 and 7 do not support lambdas and pattern matching.
> >> >>>>>>>
> >> >>>>>> Using
> >> >>>>
> >> >>>>> Scala instead, would allow a far more succinct and clear
> >> >>>>>>>
> >> >>>>>> implementation
> >> >>>>
> >> >>>>> of
> >> >>>>>>
> >> >>>>>>> the JobManager and TaskManager. Instead of a lot of if
> statements
> >> >>>>>>>
> >> >>>>>> using
> >> >>>>
> >> >>>>> instanceof to figure out the message type, we could simply use
> >> >>>>>>>
> >> >>>>>> pattern
> >> >>>>
> >> >>>>> matching. Furthermore, the callback functions could simply be
> Scala's
> >> >>>>>>> anonymous functions. Therefore I would propose to use Scala for
> >> these
> >> >>>>>>>
> >> >>>>>> two
> >> >>>>>
> >> >>>>>> systems.
> >> >>>>>>>
> >> >>>>>>> The Akka system uses the slf4j library as logging interface.
> >> >>>>>>>
> >> >>>>>> Therefore
> >> >>>>
> >> >>>>> I
> >> >>>>>
> >> >>>>>> would also propose to replace the jcl logging system with the
> slf4j
> >> >>>>>>>
> >> >>>>>> logging
> >> >>>>>>
> >> >>>>>>> system. Since we want to use Akka in many parts of the runtime
> >> system
> >> >>>>>>>
> >> >>>>>> and
> >> >>>>>
> >> >>>>>> it recommends using logback as logging backend, I would also
> like to
> >> >>>>>>> replace log4j with logback. But this change should inflict only
> few
> >> >>>>>>>
> >> >>>>>> changes
> >> >>>>>>
> >> >>>>>>> once we established the slf4j logging interface everywhere.
> >> >>>>>>>
> >> >>>>>>> What do you guys think of that idea?
> >> >>>>>>>
> >> >>>>>>> Best regards,
> >> >>>>>>>
> >> >>>>>>> Till
> >> >>>>>>>
> >> >>>>>>>
> >> >>
> >>
>

Re: Replacing JobManager with Scala implementation

Kostas Tzoumas-2
The vote should happen in a separate thread with [VOTE] in the subject
line, a clear description of what we are voting on, and the duration of
the vote (typically 72 hours).



On Wed, Sep 3, 2014 at 10:14 PM, Till Rohrmann <[hidden email]>
wrote:

> How do we then start the vote on whether we should implement the JobManager
> with Scala or not? Can we just do it in this thread or should it happen in
> a separate thread?

Re: Replacing JobManager with Scala implementation

Daniel Warneke
Hi,

quite frankly, I still don’t understand which concrete problems in Flink
we are trying to solve by introducing Akka, or even worse, by
reimplementing the JobManager and TaskManager in Scala. In my opinion,
it is crucial to clarify that before the vote starts.

First, it is unclear to me why Akka has such a strong standing in the
project that we are seriously contemplating whether it is worth
introducing the complexity of a second programming language into the
very core of the system. An RPC service is a total commodity component
these days. Any other RPC service could essentially do the job. Did
somebody have a look at the alternatives (kryo, Netty, …)?

Second, I think it is also a misconception that the current RPC
service is a major source of scalability and latency issues. Most of the
scalability/latency problems we see arise from the currently rather
complex/clumsy way of traversing Flink’s internal scheduling structures
(i.e. the ExecutionGraph) upon status updates. The scheduling structure
is inherently shared state at the moment, so unless somebody wants to
reimplement it using actors and message passing, I don’t see how either
Akka or Scala could help us here.
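
To make the distinction concrete, here is a minimal and purely
hypothetical Java sketch of the message-passing style. None of these
class or method names exist in Flink, and it deliberately does not use
Akka. The assumption is simply that one thread owns the vertex states
and every other thread only enqueues status updates instead of locking
a shared structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only: not Flink's ExecutionGraph and not Akka.
final class StatusMailbox {

    // Immutable message, so it can be handed between threads safely.
    static final class StatusUpdate {
        final String vertexId;
        final String newState;

        StatusUpdate(String vertexId, String newState) {
            this.vertexId = vertexId;
            this.newState = newState;
        }
    }

    // Thread-safe queue: the only structure shared between threads.
    private final BlockingQueue<StatusUpdate> mailbox = new LinkedBlockingQueue<>();

    // Read and written only by the thread that calls drain(), so no lock is needed.
    private final Map<String, String> states = new HashMap<>();

    // May be called from any thread; it only enqueues the message.
    void tell(StatusUpdate update) {
        mailbox.add(update);
    }

    // Called by the single owning thread; applies queued updates in arrival order.
    int drain() {
        int applied = 0;
        StatusUpdate update;
        while ((update = mailbox.poll()) != null) {
            states.put(update.vertexId, update.newState);
            applied++;
        }
        return applied;
    }

    // Also meant to be called by the owning thread only.
    String stateOf(String vertexId) {
        return states.get(vertexId);
    }

    public static void main(String[] args) throws InterruptedException {
        StatusMailbox graph = new StatusMailbox();

        // Two "task manager" threads report status without touching the map.
        Thread t1 = new Thread(() -> graph.tell(new StatusUpdate("v1", "RUNNING")));
        Thread t2 = new Thread(() -> graph.tell(new StatusUpdate("v2", "FINISHED")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        int applied = graph.drain();
        System.out.println(applied + " updates applied; v1=" + graph.stateOf("v1")
                + ", v2=" + graph.stateOf("v2"));
    }
}
```

Whether restructuring the scheduling structures along these lines would
actually pay off is exactly the open question; the sketch only shows
the kind of rewrite that would be needed before actors can help.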

If the majority of the people here want to use Akka and Scala, that’s
fine with me, but if we decide to make the transition for technical
reasons, let’s make sure these reasons are actually valid.

Best regards,

     Daniel


On 03.09.2014 22:39, Kostas Tzoumas wrote:

> Should be a separate thread with [VOTE] in the subject line, a clear
> description of what we are voting for, and the duration of the vote
> (typically 72 hours).
>
>
>
> On Wed, Sep 3, 2014 at 10:14 PM, Till Rohrmann <[hidden email]>
> wrote:
>
>> How do we then start the vote on whether we should implement the JobManager
>> with Scala or not? Can we just do it in this thread or should it happen in
>> a separate thread?
>>
>>
>> On Wed, Sep 3, 2014 at 6:27 PM, Henry Saputra <[hidden email]>
>> wrote:
>>
>>> Thanks @Ufuk for the response.
>>>
>>> Yeah, Akka hides all the low level nuts and bolts about the RPC flow
>>> but then it also makes a bit harder to debug issues when communication
>>> fail.
>>> It makes sense to use one RPC framework if we could, and since there
>>> are other plans for Akka in the code to help manage concurrencies
>>> programming it is good idea to use Akka for RPC.
>>>
>>> - Henry
>>>
>>>
>>> On Wed, Sep 3, 2014 at 5:06 AM, Ufuk Celebi <[hidden email]> wrote:
>>>> Hey Till,
>>>>
>>>> I'm not sure what the "right" ASF process is, but I wouldn't mind a
>> vote
>>> on
>>>> this in order to make sure that you don't do unnecessary work by
>>> replacing
>>>> the code with Scala.
>>>>
>>>> I for one would be certainly open to it. The only thing that bothers me
>>> is
>>>> the current state of out-of-the-box IDE support. But since there are
>>> other
>>>> successful Scala projects around ;-), which manage to do it, why
>>> shouldn't
>>>> we?
>>>>
>>>> @Henry, regarding Akka: I think the main motiviation for moving to Akka
>>>> (besides the points raised by Stephan and others) is that we actually
>>> don't
>>>> want to bother with low-level thread management, protocols, etc.
>>>>
>>>>
>>>>
>>>> On Tue, Sep 2, 2014 at 8:32 PM, Henry Saputra <[hidden email]
>>>> wrote:
>>>>
>>>>> HI Till,
>>>>>
>>>>> Thanks for opening the discussions and lead the effort and apologize
>>>>> for late response.
>>>>>
>>>>>  From what I have gathered so far, there are 2 issues:
>>>>> 1. Introducing Akka as RPC
>>>>> 2. Moving to Scala to enable easy access to Akka Scala APIs.
>>>>>
>>>>> For no1, if the RPC us used for lower level communications then we
>>>>> could probably consider Netty as the transport and serialization
>>>>> protocol (I also have added comment to the JIRA).
>>>>> Internally, to reduce thread management we could use Akka via Scala
>>>>> bridge service to make sure we use Scala Akka APIs.
>>>>>
>>>>> So addressing no 2, we could mix both Scala and Java in JobManager and
>>>>> TaskManager. the code that handle async RPC communications between JM
>>>>> and TM are using Java via Netty, and internal multi-threads or higher
>>>>> level plane code such as heart beat we could use Akka.
>>>>>
>>>>> It does introduce a bit mix between Java and Scala code but we already
>>>>> have mix of Scala and Java to support APIs so I think we could move
>>>>> some the internal code to use Scala too as "learning" steps to utilize
>>>>> Scala for better multi concurrency/ functional programming.
>>>>>
>>>>> - Henry
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Aug 31, 2014 at 4:31 AM, Till Rohrmann <
>> [hidden email]
>>>>> wrote:
>>>>>> Hi Daniel,
>>>>>>
>>>>>> the RPC rework is discussed in
>>>>>> https://issues.apache.org/jira/browse/FLINK-1019. Jira is currently
>>> down
>>>>>> due to maintenance reasons.
>>>>>>
>>>>>> The ideas to use akka are the following. Akka allows us to reduce
>> the
>>>>> code
>>>>>> base which has to be maintained. Especially, we get rid of all the
>>>>>> multi-threading programming of the rpc service which is always hard
>> to
>>>>> work
>>>>>> with. With Akka we would get the heartbeat signal for free, because
>>> Akka
>>>>>> can detect dead actors. Akka uses supervision to handle fault
>>> tolerance
>>>>> as
>>>>>> well as recovery and it allows an easy forwarding of remote
>>> exceptions.
>>>>> At
>>>>>> the same time it offers a nice rpc abstraction which easily allows
>> to
>>>>>> implement asynchronous services. Furthermore, it scales rather well
>> to
>>>>>> large numbers of nodes and hopefully we get the latencies of Flink a
>>>>> little
>>>>>> bit down.
>>>>>>
>>>>>> Bests,
>>>>>>
>>>>>> Till
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 31, 2014 at 11:35 AM, Daniel Warneke <
>> [hidden email]>
>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> will akka just be used for RPC or are there any plans to expand the
>>>>>>> actor-based model to further parts of the runtime system? If so,
>>> could
>>>>> you
>>>>>>> please point me to the discussion thread?
>>>>>>>
>>>>>>> Spontaneously, I would say that adding a hard dependency on Scala
>>> just
>>>>> for
>>>>>>> the sake of having a hip RPC service sounds like a pretty dodgy
>> deal.
>>>>>>> Therefore, I would like understand how much value akka could bring
>> to
>>>>> Flink
>>>>>>> in the long run. The discussion whether to reimplement core
>>> components
>>>>> of
>>>>>>> the system in Scala should be the second step in my opinion.
>>>>>>>
>>>>>>> Bests,
>>>>>>>
>>>>>>>      Daniel
>>>>>>>
>>>>>>>
>>>>>>> On 29.08.2014 11:33, Asterios Katsifodimos wrote:
>>>>>>>
>>>>>>>   I agree that using Akka's actors from Java results in very ugly
>>> code.
>>>>>>>> Hiding the internals of Akka behind Java reflection looks better
>> but
>>>>>>>> breaks
>>>>>>>> the principles of actors. For me it is kind of a deal breaker for
>>> using
>>>>>>>> Akka from Java.  I think that Till has more reasons to believe
>> that
>>>>> Scala
>>>>>>>> would be a more appropriate for building a new Job/Task Manager.
>>>>>>>>
>>>>>>>> I think that this discussion should focus on 4 main aspects:
>>>>>>>> 1. Performance
>>>>>>>> 2. Implementability
>>>>>>>> 3. Maintainability
>>>>>>>> 4. Available Tools
>>>>>>>>
>>>>>>>> 1. Performance: Since that the job of the JobManager and the
>>>>> TaskManager
>>>>>>>> is
>>>>>>>> to 1) exchange messages in order to maintain a distributed state
>>>>> machine
>>>>>>>> and 2) setup connections between task managers, 3) detect failures
>>>>> etc..
>>>>>>>> In
>>>>>>>> these basic operations, performance should not be an issue. Akka
>> was
>>>>>>>> proven
>>>>>>>> to scale quite well with very low latency. I guess that the low
>>> level
>>>>>>>> "plumbing" (serialization, connections, etc.) will continue in
>> Java
>>> in
>>>>>>>> order to guarantee high performance. I have no clue on what's
>>> happening
>>>>>>>> with memory management and whether this will be implemented in
>> Java
>>> or
>>>>>>>> Scala and the respective consequences. Please comment.
>>>>>>>>
>>>>>>>> 2. Since the Job/Task Manager is going to be essentially
>> implemented
>>>>> from
>>>>>>>> scratch, given the power of Akka, it seems to me that the
>>>>> implementation
>>>>>>>> will be   easier, shorter and less verbose in Scala, given that
>>> Till is
>>>>>>>> comfortable enough with Scala.
>>>>>>>>
>>>>>>>> 3. Given #2, maintaining the code and trying out new ideas in
>> Scala
>>>>> would
>>>>>>>> take less time and effort. But maintaining low level plumbing in
>>> Java
>>>>> and
>>>>>>>> high level logic in Scala scares me. Anyone that has done this
>>> before
>>>>>>>> could
>>>>>>>> comment on this?
>>>>>>>>
>>>>>>>> 4. Tools: Robert has raised some issues already but I think that
>>> tools
>>>>>>>> will
>>>>>>>> get better with time.
>>>>>>>>
>>>>>>>> Given the above, I would focus on #3 to be honest. Apart from
>> this,
>>>>> going
>>>>>>>> the Scala way sounds like a great idea. I really second Kostas'
>>> opinion
>>>>>>>> that if large changes are going to happen, this is the best
>> moment.
>>>>>>>> Cheers,
>>>>>>>> Asterios
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Aug 29, 2014 at 1:02 AM, Till Rohrmann <


Re: Replacing JobManager with Scala implementation

Ufuk Celebi-2
Hey Daniel,

On Wed, Sep 3, 2014 at 11:48 PM, Daniel Warneke <[hidden email]> wrote:

> quite frankly, I still don’t understand what concrete problems in Flink we
> are trying to solve with introducing akka, or even worse, reimplementing
> the JobManager and TaskManager in Scala. In my opinion, it is crucial to
> clarify that before the vote starts.
>

I think we are all on the same page and Till started this thread to clarify
the issues. It's unfortunate that we are having two interdependent
discussions in one thread though.

 1. Akka: The initial (orthogonal) issue that started this thread is the
question of whether to replace our current RPC system with Akka (
https://issues.apache.org/jira/browse/FLINK-1019). I think that the points
given by both Till and Stephan about Akka in this thread are valid
technical reasons for a transition. They highlight both problems with the
current RPC (threading, exception handling) and potential free lunches
(heartbeat, supervision).


> First, it is unclear to me why akka has such a strong standing in the
> project that we are seriously contemplating if it is worth to introduce the
> complexity of a second programming language to the very core of the system.
> An RPC service is a total commodity component these days. Any other RPC
> service could essentially do the job. Did somebody have a look at the
> alternatives (kryo, Netty, …)?
>

I'm not sure if Till or Stephan considered alternatives, but given the
above points Akka seems to be a good fit. Regarding Netty and Kryo:

- We have replaced the custom TCP network code of the system with Netty
some time ago and from my experience with Netty I don't think that it's a
good fit for replacing the RPC service as it won't solve the low-level
issues we are having right now. Instead it would just wrap them in a nice
library and add complexity for message queuing etc.

- Kryo is imo just a serialization framework and KryoNet would be a
competitor to Netty and not Akka.

> Second, I think it is also a misconception to think that the current RPC
> service is a major source of scalability and latency issues. Most of the
> scalability/latency problems we see arise from the currently rather
> complex/clumsy way of traversing Flink’s internal scheduling structures
> (i.e. the ExecutionGraph) upon status updates. The scheduling structure is
> inherently shared state at the moment, so unless somebody wants to
> reimplement it using actors and message passing, I don’t see how either
> akka or Scala could help us here.
>

I think we have already replaced parts of the ExecutionGraph lookup
structures to counter these problems. Stephan is currently working on
reworking the scheduler (https://issues.apache.org/jira/browse/FLINK-1030)
and if I'm not mistaken he has plans to make use of the Actor model for
this.

2. Scala: In order to make use of some of what Akka provides, the JobManager
and TaskManager need to be refactored into Akka actors. This refactoring can
be done in Java or Scala. The original question of this thread is whether
it would be OK to do it in Scala. Points both in favor of and against this
have been raised here and led to the corresponding [VOTE] thread. Since the
[VOTE] is currently running, I would suggest moving the discussion of this
specific point there.

Re: Replacing JobManager with Scala implementation

Stephan Ewen
Again, Java vs. Scala is a different question than akka or no akka.

Java vs Scala is in the end a question of ease of use, and of adding a
stronger Scala side to the project (for our own sake or for attracting
interested developers).

Akka is planned not simply as an rpc replacement alone, but for remote
calls, asynchronous callbacks, failure detection, master failover, and a
simpler concurrency model in the execution graph.

I actually think that the current rpc is limiting us, for example because
of the lack of callback support, which makes the polling parts necessary.
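
To make the polling-versus-callback point concrete, here is a minimal sketch
in plain Java. It is an illustration under stated assumptions, not Flink or
Akka code: the class CallbackSketch and its methods are hypothetical, and it
uses CompletableFuture from Java 8, which is newer than the Java 6/7 baseline
discussed in this thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only: none of these names exist in Flink or Akka.
final class CallbackSketch {

    // Polling style: the caller keeps asking until a status shows up.
    static String pollForStatus(AtomicReference<String> status)
            throws InterruptedException {
        while (status.get() == null) {
            Thread.sleep(10); // repeated wake-ups, plus latency between checks
        }
        return status.get();
    }

    // Callback style: the caller registers a function that runs once the
    // status is available; nothing has to wake up and check in between.
    static CompletableFuture<String> describeWhenDone(
            CompletableFuture<String> status) {
        return status.thenApply(s -> "task finished with status " + s);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // Polling: the worker publishes a status, the caller spins on it.
            AtomicReference<String> polled = new AtomicReference<>();
            worker.execute(() -> polled.set("FINISHED"));
            System.out.println(pollForStatus(polled));

            // Callback: the continuation is attached up front.
            CompletableFuture<String> status =
                    CompletableFuture.supplyAsync(() -> "FINISHED", worker);
            System.out.println(
                    describeWhenDone(status).get(5, TimeUnit.SECONDS));
        } finally {
            worker.shutdown();
        }
    }
}
```

In the callback variant the caller never blocks in a loop; the continuation
runs when the result is completed. An actor-based design would presumably
achieve the same effect by sending a reply message instead of completing a
future.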

Stephan