Question about DataStream class hierarchy

Matthias J. Sax
Hi,

I am a little bit confused about the class hierarchy of DataStream. It
has three subclasses: KeyedDataStream, SingleOutputStreamOperator, and
SplitDataStream.

1) Why is the name "SingleOutputStreamOperator" (why OPERATOR ??)

2) Is it correct that a SplitDataStream emits multiple logical output
streams, while SingleOutputStreamOperator and KeyedDataStream each emit a
single logical output stream?
   => If yes, why is a KeyedDataStream not a subclass of
SingleOutputStreamOperator ?

3)
  a) Why does only SingleOutputStreamOperator have the methods name()/getName()?
  b) Why does only SingleOutputStreamOperator have the method setParallelism()?
  c) Should those methods be members of DataStream instead?
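
For readers following along, the hierarchy described in the questions above can be sketched roughly as follows. This is a simplified illustration only, not Flink's actual source: the wrapper class HierarchySketch, the single type parameter, the fields, and the default values are all assumptions made for the example. Only the class names and the methods name(), getName(), and setParallelism() come from the message.

```java
// Simplified sketch of the hierarchy discussed in this thread.
// NOT Flink's real source: bodies, fields, and defaults are hypothetical.
public class HierarchySketch {

    /** Base type. In this sketch it carries no name and no parallelism. */
    static class DataStream<T> { }

    /** A stream partitioned by key; adds no operator settings here. */
    static class KeyedDataStream<T> extends DataStream<T> { }

    /** A stream that exposes several logical output streams. */
    static class SplitDataStream<T> extends DataStream<T> { }

    /** The only subclass in this sketch that offers name and parallelism. */
    static class SingleOutputStreamOperator<T> extends DataStream<T> {
        private String name = "unnamed";   // assumed default
        private int parallelism = 1;       // assumed default

        SingleOutputStreamOperator<T> name(String name) {
            this.name = name;
            return this;
        }

        String getName() {
            return name;
        }

        SingleOutputStreamOperator<T> setParallelism(int parallelism) {
            if (parallelism < 1) {
                throw new IllegalArgumentException("parallelism must be >= 1");
            }
            this.parallelism = parallelism;
            return this;
        }

        int getParallelism() {
            return parallelism;
        }
    }

    public static void main(String[] args) {
        SingleOutputStreamOperator<String> op =
                new SingleOutputStreamOperator<String>().name("map").setParallelism(4);
        System.out.println(op.getName() + " with parallelism " + op.getParallelism());

        DataStream<String> keyed = new KeyedDataStream<>();
        // keyed.setParallelism(4);  // does not compile: not declared on DataStream
    }
}
```

In this sketch, question 3c amounts to asking whether the two setters should move up into DataStream so that the commented-out call would compile.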



-Matthias



Re: Question about DataStream class hierarchy

Aljoscha Krettek-2
Yes, very good points. I think we will be fixing these when we do the API
cleanups that we discussed on the wiki design docs. In fact, the work I'm
doing on https://issues.apache.org/jira/browse/FLINK-2398 can be seen as
preparation for making these changes possible/easier.

On Tue, 28 Jul 2015 at 21:56 Matthias J. Sax <[hidden email]>
wrote:

> Hi,
>
> I am a little bit confused about the class hierarchy of DataStream. It
> has three subclasses: KeyedDataStream, SingleOutputStreamOperator, and
> SplitDataStream.
>
> 1) Why is the name "SingleOutputStreamOperator" (why OPERATOR ??)
>
> 2) Is it correct that a SplitDataStream emits multiple logical output
> streams, while SingleOutputStreamOperator and KeyedDataStream each emit a
> single logical output stream?
>    => If yes, why is a KeyedDataStream not a subclass of
> SingleOutputStreamOperator ?
>
> 3)
>   a) Why does only SingleOutputStreamOperator have the methods name()/getName()?
>   b) Why does only SingleOutputStreamOperator have the method setParallelism()?
>   c) Should those methods be members of DataStream instead?
>
>
>
> -Matthias
>
>

Re: Question about DataStream class hierarchy

Matthias J. Sax
My current work depends on a clean design of those classes. Otherwise, my
own code would get very messy. I would like to apply some changes in my own
PR (not opened yet). Do you think this is feasible? I don't want to end up
in a messy state. What kind of changes are you going to apply in FLINK-2398?

-Matthias


On 07/28/2015 10:30 PM, Aljoscha Krettek wrote:

> Yes, very good points. I think we will be fixing these when we do the API
> cleanups that we discussed on the wiki design docs. In fact, the work I'm
> doing on https://issues.apache.org/jira/browse/FLINK-2398 can be seen as
> preparation for making these changes possible/easier.



Re: Question about DataStream class hierarchy

Aljoscha Krettek-2
Right now it's mostly under-the-hood changes but you can look at the
progress here: https://github.com/aljoscha/flink/tree/stream-api-rework

The commit is going to change, so if you do put your work on top of it you
might have to rebase.

On Wed, 29 Jul 2015 at 07:26 Matthias J. Sax <[hidden email]>
wrote:

> My current work depends on a clean design of those. Otherwise, my own
> code would get very messy. I would like to apply some changes in my own
> PR (not opened yet). Do you think this is feasible? I don't want to end up
> in a messy state. What kind of changes are you going to apply in FLINK-2398?
>
> -Matthias

Re: Question about DataStream class hierarchy

Matthias J. Sax
What is the expected time frame for your work? I don't want to delay my
work too long (if I base it on your branch, it cannot be merged
before yours).

Right now, you have not changed the class hierarchy. However, that is what
I would need. Thus, it makes no sense to use your branch as a base right
now. What are your plans about this?

-> one side comment: would it make sense to make DataStream abstract?

From my point of view, it makes the most sense for me to apply the
changes I need directly in my PR (based on master).

-Matthias


On 07/29/2015 08:11 AM, Aljoscha Krettek wrote:

> Right now it's mostly under-the-hood changes but you can look at the
> progress here: https://github.com/aljoscha/flink/tree/stream-api-rework
>
> The commit is going to change, so if you do put your work on top of it you
> might have to rebase.



Re: Question about DataStream class hierarchy

Matthias J. Sax
Hi,

I would like to apply the following changes to the DataStream class
hierarchy: https://github.com/mjsax/flink/tree/flink-2306-storm-namedStreams

Please let me know whether these changes seem reasonable to you.

I need these changes to get a clean design for
https://issues.apache.org/jira/browse/FLINK-2306


-Matthias



On 07/29/2015 12:07 PM, Matthias J. Sax wrote:

> What is the expected time frame for your work? I don't want to delay my
> work too long (if I base it on your branch, it cannot be merged
> before yours).
>
> Right now, you have not changed the class hierarchy. However, that is what
> I would need. Thus, it makes no sense to use your branch as a base right
> now. What are your plans about this?
>
> -> one side comment: would it make sense to make DataStream abstract?
>
> From my point of view, it makes the most sense for me to apply the
> changes I need directly in my PR (based on master).
>
> -Matthias



Re: Question about DataStream class hierarchy

Gyula Fóra-2
Hi Matthias,

I think Aljoscha is preparing a nice PR that completely reworks the
DataStream classes and the information they actually contain. I don't think
it's a good idea to mess things up before he gets a chance to open the PR.

Also, I don't see a well-supported reason for moving the setParallelism,
setName, etc. methods to DataStream, as these are specific things that you
can only set on operators. The KeyedDataStream, on the other hand, is not
an operator.
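
One way to read this objection is sketched below. It is a hypothetical illustration, not Flink code: the classes OperatorStream and KeyedStream, the helper acceptsParallelism, and the choice to throw UnsupportedOperationException are all assumptions made for the example. The sketch shows that if the setter lived on the base class, a non-operator subclass would inherit it and would have to reject it at runtime.

```java
// Hypothetical illustration of the trade-off; not Flink code.
public class BaseClassSettingsSketch {

    /** Variant in which the base class itself exposes setParallelism(). */
    static class DataStream {
        protected int parallelism = 1;

        DataStream setParallelism(int parallelism) {
            this.parallelism = parallelism;
            return this;
        }

        int getParallelism() {
            return parallelism;
        }
    }

    /** Stands in for an operator result: the setting is meaningful here. */
    static class OperatorStream extends DataStream { }

    /** Stands in for a keyed stream: not an operator, so it rejects the call. */
    static class KeyedStream extends DataStream {
        @Override
        DataStream setParallelism(int parallelism) {
            throw new UnsupportedOperationException("a keyed stream is not an operator");
        }
    }

    /** Returns true if the given stream accepts a parallelism setting. */
    static boolean acceptsParallelism(DataStream stream) {
        try {
            stream.setParallelism(2);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsParallelism(new OperatorStream()));
        System.out.println(acceptsParallelism(new KeyedStream()));
    }
}
```

Keeping the setters on the operator type turns this runtime failure into a compile-time error, which is arguably the point of the current design.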

Can we just wait a little bit for Aljoscha with this? If you really need
his changes, you can fork his branch and we can consider your changes after
merging his.

Regards,
Gyula



Matthias J. Sax <[hidden email]> wrote (on Fri, 31 Jul 2015,
21:57):

> Hi,
>
> I would like to apply the following changes to the DataStream class
> hierarchy:
> https://github.com/mjsax/flink/tree/flink-2306-storm-namedStreams
>
> Please let me know whether these changes seem reasonable to you.
>
> I need these changes to get a clean design for
> https://issues.apache.org/jira/browse/FLINK-2306
>
>
> -Matthias

Re: Question about DataStream class hierarchy

Stephan Ewen
I agree with Gyula here.

Getting the API right is too important to "quick fix" it.

On Fri, Jul 31, 2015 at 10:06 PM, Gyula Fóra <[hidden email]> wrote:

> Hi Matthias,
>
> I think Aljoscha is preparing a nice PR that completely reworks the
> DataStream classes and the information they actually contain. I don't think
> it's a good idea to mess things up before he gets a chance to open the PR.
>
> Also, I don't see a well-supported reason for moving the setParallelism,
> setName, etc. methods to DataStream, as these are specific things that you
> can only set on operators. The KeyedDataStream, on the other hand, is not
> an operator.
>
> Can we just wait a little bit for Aljoscha with this? If you really need
> his changes, you can fork his branch and we can consider your changes after
> merging his.
>
> Regards,
> Gyula