An update on the DataStream API refactoring WiP

Kostas Tzoumas
Hi folks,

Currently, Aljoscha, Stephan, and I are reworking the DataStream API as
discussed before. Things are a bit in flux right now, with several commits
and pull requests in flight, and the current master contains code from both
the old and the new API.

I want to give you an idea of what the new API will look like. This is a
very rough draft of the new documentation page (also a WiP):

https://www.dropbox.com/sh/t5nvlx7meadppnp/AAD5sEIH5S3QNYTiMsyE9KBva?dl=0

Compared to the current API, the major changes include:

- Different syntax (and implementation) for windows. Old constructs will be
replaced by the new ones. The new syntax resembles Google's Dataflow model,
but contains "shortcuts" as syntactic sugar for common cases.

- Different syntax (and implementation) of "grouping". The new terminology
will be KeyedDataStream (and "keyBy"), which will replace GroupedDataStream.

- Reduced functionality in ConnectedDataStream: only map and flatMap are
supported

- New syntax (and implementation) for window joins; removal of cross

- No changes in iterations besides deleting the "long milliseconds" argument

- No changes in state

- Deletion of "DataSet.forward() and .global()"

- Windows can only come after keyBy; otherwise they are DOP-1 operators
(degree of parallelism 1) and are defined via "windowAll"
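To make the keyBy/windowAll distinction concrete, here is a minimal plain-Java sketch of the semantics only. It is not Flink code and does not use the new API. It assumes each element carries its own millisecond timestamp and that windows are non-overlapping (tumbling) windows of a fixed size; the names Event, windowStart, keyedTumblingSum and allTumblingSum are hypothetical. Keyed windows keep one independent set of windows per key, which is what allows them to be spread across parallel instances; windowAll puts every element into one global set of windows, hence a DOP-1 operator.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowSketch {

    // Hypothetical element: a key, a timestamp (ms) carried by the element, and a value.
    static final class Event {
        final String key;
        final long timestamp;
        final int value;

        Event(String key, long timestamp, int value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Start of the tumbling window of the given size that contains ts (assumes ts >= 0).
    static long windowStart(long ts, long size) {
        return ts - (ts % size);
    }

    // Keyed windows (window after keyBy): each key gets its own, independent set of
    // windows. Because the per-key maps never interact, a real system can partition
    // keys across parallel instances; this sketch simply processes them sequentially.
    static Map<String, Map<Long, Integer>> keyedTumblingSum(List<Event> events, long size) {
        Map<String, Map<Long, Integer>> result = new TreeMap<>();
        for (Event e : events) {
            result.computeIfAbsent(e.key, k -> new TreeMap<>())
                  .merge(windowStart(e.timestamp, size), e.value, Integer::sum);
        }
        return result;
    }

    // Non-keyed windows ("windowAll"): every element lands in one global set of
    // windows, which is why such an operator runs with parallelism 1.
    static Map<Long, Integer> allTumblingSum(List<Event> events, long size) {
        Map<Long, Integer> result = new TreeMap<>();
        for (Event e : events) {
            result.merge(windowStart(e.timestamp, size), e.value, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", 1, 1), new Event("b", 3, 10),
                new Event("a", 12, 2), new Event("a", 15, 3), new Event("b", 19, 20));
        System.out.println(keyedTumblingSum(events, 10)); // {a={0=1, 10=5}, b={0=10, 10=20}}
        System.out.println(allTumblingSum(events, 10));   // {0=11, 10=25}
    }
}
```

With a window size of 10, the keyed variant produces separate per-window sums for "a" and "b", while the windowAll variant adds all elements of a window together regardless of key.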

Best,
Kostas

Re: An update on the DataStream API refactoring WiP

Kostas Tzoumas
Oh, and of course: support for event time. I might be forgetting more, so
feel free to add to the list.


On Fri, Oct 2, 2015 at 2:40 PM, Kostas Tzoumas wrote:

Re: An update on the DataStream API refactoring WiP

Robert Metzger
I suspect "Deletion of DataSet.forward() and .global()" is a typo; you
meant DataStream?

On Fri, Oct 2, 2015 at 2:44 PM, Kostas Tzoumas wrote:

Re: An update on the DataStream API refactoring WiP

Stephan Ewen
I added two comments to the pull request that this is based on...

On Fri, Oct 2, 2015 at 2:47 PM, Robert Metzger wrote:

Re: An update on the DataStream API refactoring WiP

mxm
You made very sensible choices for improving and finalizing the
Streaming API. The documentation is much clearer now. By the way, here
is the pull request: https://github.com/apache/flink/pull/1208

On Fri, Oct 2, 2015 at 3:02 PM, Stephan Ewen wrote:

Re: An update on the DataStream API refactoring WiP

Kostas Tzoumas
Right, I meant DataStream.

On Fri, Oct 2, 2015 at 2:47 PM, Robert Metzger wrote:
