About Operator and OperatorBase

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

About Operator and OperatorBase

Aljoscha Krettek-2
This is just a personal annoyance and I think we are to advanced to
change this now, but here goes: Could we rename the classes in the
API, so that for example MapOperator becomes MapDataSet or MapSet, and
the actual operators in the common package become MapOperator instead
of MapOperatorBase.

My thinking is that two things called MapOperator are very confusing.
If we want to change this we should do it fast.

Cheers,
Aljoscha
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Ufuk Celebi-2

On 15 Apr 2015, at 14:11, Aljoscha Krettek <[hidden email]> wrote:

> This is just a personal annoyance and I think we are to advanced to
> change this now, but here goes: Could we rename the classes in the
> API, so that for example MapOperator becomes MapDataSet or MapSet, and
> the actual operators in the common package become MapOperator instead
> of MapOperatorBase.
>
> My thinking is that two things called MapOperator are very confusing.
> If we want to change this we should do it fast.

+1

Realistically speaking, no one except the core committers are working on the internals currently. Better change it now (or never :D).
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Stephan Ewen
In reply to this post by Aljoscha Krettek-2
I think we can rename the base operators.

Renaming the subclass of DataSet would be extremely api breaking. I think
that is not worth it.
Am 15.04.2015 14:11 schrieb "Aljoscha Krettek" <[hidden email]>:

> This is just a personal annoyance and I think we are to advanced to
> change this now, but here goes: Could we rename the classes in the
> API, so that for example MapOperator becomes MapDataSet or MapSet, and
> the actual operators in the common package become MapOperator instead
> of MapOperatorBase.
>
> My thinking is that two things called MapOperator are very confusing.
> If we want to change this we should do it fast.
>
> Cheers,
> Aljoscha
>
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Ufuk Celebi-2

On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:

> I think we can rename the base operators.
>
> Renaming the subclass of DataSet would be extremely api breaking. I think
> that is not worth it.

Oh, that's right. We return MapOperator for DataSet operations. Stephan's point makes sense.
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

till.rohrmann
I would also be in favour of making the distinction between the API and
common API layer more clear by using different names. This will ease the
understanding of the source code.

In the wake of a possible renaming we could also get rid of the legacy code
org.apache.flink.optimizer.dag.MatchNode and
rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
make the naming more consistent.

On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:

>
> On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
>
> > I think we can rename the base operators.
> >
> > Renaming the subclass of DataSet would be extremely api breaking. I think
> > that is not worth it.
>
> Oh, that's right. We return MapOperator for DataSet operations. Stephan's
> point makes sense.
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Timo Walther-2
I share Stephans opinion.

By the way, we could also find a common name for operators with two
inputs. Sometimes it's "TwoInputXXX", "DualInputXXX",
"BinaryInputXXX"... pretty inconsistent.

On 15.04.2015 17:48, Till Rohrmann wrote:

> I would also be in favour of making the distinction between the API and
> common API layer more clear by using different names. This will ease the
> understanding of the source code.
>
> In the wake of a possible renaming we could also get rid of the legacy code
> org.apache.flink.optimizer.dag.MatchNode and
> rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
> make the naming more consistent.
>
> On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:
>
>> On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
>>
>>> I think we can rename the base operators.
>>>
>>> Renaming the subclass of DataSet would be extremely api breaking. I think
>>> that is not worth it.
>> Oh, that's right. We return MapOperator for DataSet operations. Stephan's
>> point makes sense.

Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Fabian Hueske-2
Renaming the core operators is fine with me, but I would not touch API
facing classes.
A big +1 for Timo's suggestion.

2015-04-16 6:30 GMT-05:00 Timo Walther <[hidden email]>:

> I share Stephans opinion.
>
> By the way, we could also find a common name for operators with two
> inputs. Sometimes it's "TwoInputXXX", "DualInputXXX", "BinaryInputXXX"...
> pretty inconsistent.
>
>
> On 15.04.2015 17:48, Till Rohrmann wrote:
>
>> I would also be in favour of making the distinction between the API and
>> common API layer more clear by using different names. This will ease the
>> understanding of the source code.
>>
>> In the wake of a possible renaming we could also get rid of the legacy
>> code
>> org.apache.flink.optimizer.dag.MatchNode and
>> rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
>> make the naming more consistent.
>>
>> On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:
>>
>>  On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
>>>
>>>  I think we can rename the base operators.
>>>>
>>>> Renaming the subclass of DataSet would be extremely api breaking. I
>>>> think
>>>> that is not worth it.
>>>>
>>> Oh, that's right. We return MapOperator for DataSet operations. Stephan's
>>> point makes sense.
>>>
>>
>
mxm
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

mxm
+1 for keeping the API. Even though this will not change your initial
concern much, Aljoscha :) I agree with you that it would be more consistent
to call the result of an operator OperatorDataSet.

On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske <[hidden email]> wrote:

> Renaming the core operators is fine with me, but I would not touch API
> facing classes.
> A big +1 for Timo's suggestion.
>
> 2015-04-16 6:30 GMT-05:00 Timo Walther <[hidden email]>:
>
> > I share Stephans opinion.
> >
> > By the way, we could also find a common name for operators with two
> > inputs. Sometimes it's "TwoInputXXX", "DualInputXXX", "BinaryInputXXX"...
> > pretty inconsistent.
> >
> >
> > On 15.04.2015 17:48, Till Rohrmann wrote:
> >
> >> I would also be in favour of making the distinction between the API and
> >> common API layer more clear by using different names. This will ease the
> >> understanding of the source code.
> >>
> >> In the wake of a possible renaming we could also get rid of the legacy
> >> code
> >> org.apache.flink.optimizer.dag.MatchNode and
> >> rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver to
> >> make the naming more consistent.
> >>
> >> On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:
> >>
> >>  On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
> >>>
> >>>  I think we can rename the base operators.
> >>>>
> >>>> Renaming the subclass of DataSet would be extremely api breaking. I
> >>>> think
> >>>> that is not worth it.
> >>>>
> >>> Oh, that's right. We return MapOperator for DataSet operations.
> Stephan's
> >>> point makes sense.
> >>>
> >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Stephan Ewen
Okay, it seems the consensus forms around not breaking the API.

When it comes to the *OperatorBase - should we rename them or simply get
rid of them (remove the common API). If we want to remove them, a
precondition is to remove the Record API, and for that, we should migrate
the Record-API-based test cases.

Stephan


On Thu, Apr 16, 2015 at 6:32 PM, Maximilian Michels <[hidden email]> wrote:

> +1 for keeping the API. Even though this will not change your initial
> concern much, Aljoscha :) I agree with you that it would be more consistent
> to call the result of an operator OperatorDataSet.
>
> On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske <[hidden email]> wrote:
>
> > Renaming the core operators is fine with me, but I would not touch API
> > facing classes.
> > A big +1 for Timo's suggestion.
> >
> > 2015-04-16 6:30 GMT-05:00 Timo Walther <[hidden email]>:
> >
> > > I share Stephans opinion.
> > >
> > > By the way, we could also find a common name for operators with two
> > > inputs. Sometimes it's "TwoInputXXX", "DualInputXXX",
> "BinaryInputXXX"...
> > > pretty inconsistent.
> > >
> > >
> > > On 15.04.2015 17:48, Till Rohrmann wrote:
> > >
> > >> I would also be in favour of making the distinction between the API
> and
> > >> common API layer more clear by using different names. This will ease
> the
> > >> understanding of the source code.
> > >>
> > >> In the wake of a possible renaming we could also get rid of the legacy
> > >> code
> > >> org.apache.flink.optimizer.dag.MatchNode and
> > >> rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver
> to
> > >> make the naming more consistent.
> > >>
> > >> On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:
> > >>
> > >>  On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
> > >>>
> > >>>  I think we can rename the base operators.
> > >>>>
> > >>>> Renaming the subclass of DataSet would be extremely api breaking. I
> > >>>> think
> > >>>> that is not worth it.
> > >>>>
> > >>> Oh, that's right. We return MapOperator for DataSet operations.
> > Stephan's
> > >>> point makes sense.
> > >>>
> > >>
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Henry Saputra
Since we are talking about common API, what was the original intention
or design for this layer?

From the doc:

" The Common API operator exists only in order for the flink-java and
flink-scalapackages to not have a dependency on the optimizer."

Currently the Java API Operator is converted into common APIs via
recursive call from ExecutionEnvironement#createProgramPlan which
delegate it to OperatorTranslation.
Just wanted to know more about why the original approach is taking this route.

Thanks,

- Henry

On Fri, Apr 17, 2015 at 5:47 AM, Stephan Ewen <[hidden email]> wrote:

> Okay, it seems the consensus forms around not breaking the API.
>
> When it comes to the *OperatorBase - should we rename them or simply get
> rid of them (remove the common API). If we want to remove them, a
> precondition is to remove the Record API, and for that, we should migrate
> the Record-API-based test cases.
>
> Stephan
>
>
> On Thu, Apr 16, 2015 at 6:32 PM, Maximilian Michels <[hidden email]> wrote:
>
>> +1 for keeping the API. Even though this will not change your initial
>> concern much, Aljoscha :) I agree with you that it would be more consistent
>> to call the result of an operator OperatorDataSet.
>>
>> On Thu, Apr 16, 2015 at 3:16 PM, Fabian Hueske <[hidden email]> wrote:
>>
>> > Renaming the core operators is fine with me, but I would not touch API
>> > facing classes.
>> > A big +1 for Timo's suggestion.
>> >
>> > 2015-04-16 6:30 GMT-05:00 Timo Walther <[hidden email]>:
>> >
>> > > I share Stephans opinion.
>> > >
>> > > By the way, we could also find a common name for operators with two
>> > > inputs. Sometimes it's "TwoInputXXX", "DualInputXXX",
>> "BinaryInputXXX"...
>> > > pretty inconsistent.
>> > >
>> > >
>> > > On 15.04.2015 17:48, Till Rohrmann wrote:
>> > >
>> > >> I would also be in favour of making the distinction between the API
>> and
>> > >> common API layer more clear by using different names. This will ease
>> the
>> > >> understanding of the source code.
>> > >>
>> > >> In the wake of a possible renaming we could also get rid of the legacy
>> > >> code
>> > >> org.apache.flink.optimizer.dag.MatchNode and
>> > >> rename org.apache.flink.runtime.operators.MatchDriver into JoinDriver
>> to
>> > >> make the naming more consistent.
>> > >>
>> > >> On Wed, Apr 15, 2015 at 3:05 PM, Ufuk Celebi <[hidden email]> wrote:
>> > >>
>> > >>  On 15 Apr 2015, at 15:01, Stephan Ewen <[hidden email]> wrote:
>> > >>>
>> > >>>  I think we can rename the base operators.
>> > >>>>
>> > >>>> Renaming the subclass of DataSet would be extremely api breaking. I
>> > >>>> think
>> > >>>> that is not worth it.
>> > >>>>
>> > >>> Oh, that's right. We return MapOperator for DataSet operations.
>> > Stephan's
>> > >>> point makes sense.
>> > >>>
>> > >>
>> > >
>> >
>>
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Stephan Ewen
Originally, we had multiple Apis with different data models: the current
Java API, the record api, a JSON API. The common API was the data model
agnostic set of operators on which they built.

It has become redundant when we saw how well things can be built in top of
the Java API, using the TypeInformation. Now, Scala, Python, Dataflow, all
build on top of the Java API.
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Henry Saputra
Thanks for the explanation, Stephan. I always wonder why the extra
common APIs exist.

Then I think this should be high priority if we want to remove the
common API to reduce the unnecessary layer and "dead code". As Ufuk
mentioned before, better doing it now before more stuff build on top
of Flink.

So removing old Record API [1] and the tests depending on them is step
one of the process, but what is JSON API?

- Henry

[1] https://issues.apache.org/jira/browse/FLINK-1681

On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen <[hidden email]> wrote:
> Originally, we had multiple Apis with different data models: the current
> Java API, the record api, a JSON API. The common API was the data model
> agnostic set of operators on which they built.
>
> It has become redundant when we saw how well things can be built in top of
> the Java API, using the TypeInformation. Now, Scala, Python, Dataflow, all
> build on top of the Java API.
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Kostas Tzoumas-2
I think Stephan meant Meteor, an old API when Flink was Stratosphere. This
was never part of the code that made it to Apache.

Not sure if we want to remove the common API, as it provides a dataflow
abstraction that is higher level than the JobGraph. Admittedly, I don't
have a better argument other than "it might be useful some day" and happy
to change my opinion :-)

Kostas

On Tue, Apr 21, 2015 at 5:49 PM, Henry Saputra <[hidden email]>
wrote:

> Thanks for the explanation, Stephan. I always wonder why the extra
> common APIs exist.
>
> Then I think this should be high priority if we want to remove the
> common API to reduce the unnecessary layer and "dead code". As Ufuk
> mentioned before, better doing it now before more stuff build on top
> of Flink.
>
> So removing old Record API [1] and the tests depending on them is step
> one of the process, but what is JSON API?
>
> - Henry
>
> [1] https://issues.apache.org/jira/browse/FLINK-1681
>
> On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen <[hidden email]> wrote:
> > Originally, we had multiple Apis with different data models: the current
> > Java API, the record api, a JSON API. The common API was the data model
> > agnostic set of operators on which they built.
> >
> > It has become redundant when we saw how well things can be built in top
> of
> > the Java API, using the TypeInformation. Now, Scala, Python, Dataflow,
> all
> > build on top of the Java API.
>
Reply | Threaded
Open this post in threaded view
|

Re: About Operator and OperatorBase

Stephan Ewen
I think the main difference to the Java API is right now that the JavaAPI
is fluent and the common API is compositional. Otherwise there is not much
in the common API any more.

Not sure whether this warrants an extra API...



On Wed, Apr 22, 2015 at 3:52 PM, Kostas Tzoumas <[hidden email]> wrote:

> I think Stephan meant Meteor, an old API when Flink was Stratosphere. This
> was never part of the code that made it to Apache.
>
> Not sure if we want to remove the common API, as it provides a dataflow
> abstraction that is higher level than the JobGraph. Admittedly, I don't
> have a better argument other than "it might be useful some day" and happy
> to change my opinion :-)
>
> Kostas
>
> On Tue, Apr 21, 2015 at 5:49 PM, Henry Saputra <[hidden email]>
> wrote:
>
> > Thanks for the explanation, Stephan. I always wonder why the extra
> > common APIs exist.
> >
> > Then I think this should be high priority if we want to remove the
> > common API to reduce the unnecessary layer and "dead code". As Ufuk
> > mentioned before, better doing it now before more stuff build on top
> > of Flink.
> >
> > So removing old Record API [1] and the tests depending on them is step
> > one of the process, but what is JSON API?
> >
> > - Henry
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-1681
> >
> > On Tue, Apr 21, 2015 at 1:10 AM, Stephan Ewen <[hidden email]> wrote:
> > > Originally, we had multiple Apis with different data models: the
> current
> > > Java API, the record api, a JSON API. The common API was the data model
> > > agnostic set of operators on which they built.
> > >
> > > It has become redundant when we saw how well things can be built in top
> > of
> > > the Java API, using the TypeInformation. Now, Scala, Python, Dataflow,
> > all
> > > build on top of the Java API.
> >
>