Flink Streaming Hangout

Gyula Fóra
Hey All,

I have created a google hangout for today's Streaming discussion:
https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa

As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
we'll be around for those who can only join later as well.

Please find the topics here
<https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.

Gyula

Re: Flink Streaming Hangout

Henry Saputra
Thanks for arranging the hangout Gyula, sorry I have to miss it.

I will help review the proposed doc in the wiki and add
comments/questions as they come up.

- Henry

On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <[hidden email]> wrote:

> Hey All,
>
> I have created a google hangout for today's Streaming discussion:
> https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
>
> As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
> we'll be around for those who can only join later as well.
>
> Please find the topics here
> <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
>
> Gyula

Re: Flink Streaming Hangout

Henry Saputra
In reply to this post by Gyula Fóra
Related to Zeppelin [1], looks like it is sponsored/developed by a
company in Korea [2] that has nothing to do with football,
unfortunately (I thought they were the same team that does
http://www.nfl.com/stats/statslab).
I was kinda disappointed at the beginning =P

But anyway, it seemed like integration with Flink would be considered as
a potential next one =)

Just want to make sure I clear up my comments in the hangout.

- Henry

[1] https://github.com/NFLabs/zeppelin/blob/master/README.md
[2] http://www.nflabs.com


Re: Flink Streaming Hangout

Márton Balassi
The hangout was not recorded, so I'm providing a short write-up on the
issues and decisions. The discussion was 2 hours long, so please feel free
to add the important statements I have missed.

The initial ideas are listed on the project wiki as the Flink Streaming
roadmap [1]. The hangout yielded the following additions:

   * Fault tolerance: We have a (mostly) working prototype, not yet merged,
for at-least-once semantics that works similarly to Storm. A missing
feature on the streaming side is vertex restarts in the ExecutionGraph,
which will be made easier by Ufuk's intermediate results pull request [2],
to be merged after the 0.8 release. As for exactly-once semantics, the
preferred option was upstream backup, which is conceptually the same as
backtracking until an intermediate result is found, given that
intermediate results are stored at every vertex.

   * A common pipeline architecture for batch and streaming: The original
idea was to have just one ExecutionEnvironment which can convert DataSets
to DataStreams and vice versa. Gyula hacked together a small prototype
where a DataSet was fed into a DataStream, but seamless integration
would have required a large refactoring. Stephan stepped in with the idea
that most likely only the DataSet to DataStream direction should be
supported, initially by materializing the batch result in some in-memory
abstraction or even files. This would result in building separate batch
and streaming JobGraphs, and thus addressing optimization, fault tolerance
etc. separately. Gyula mentioned Chiwan Park's pending PR [3] on using
HDFS updates as a streaming source as a possible solution for feeding the
results of recurring batch jobs into streaming.

   * API integrations: We've just added Java 8 support to the streaming API
and started working on the Scala API as well, which seems to be
low-hanging fruit, standing on Aljoscha's shoulders. A next step would be
adding the Python API and, building on that, providing a notebook-like
"IDE" with e.g. Zeppelin [4]. This is also (in fact mainly) interesting for
batch processing. For further integrations a Scala shell would be really
useful. According to Stephan the latter should not be too challenging;
mostly API and some Scheduler work is required.

   * Multiparadigm (batch & streaming) ML: Opening towards machine learning,
Paris and Vasia took up the SAMOA [5] integration issue, which would
provide streaming machine learning support and also comparability with
Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink is
also an ongoing effort.
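
To make the upstream backup idea from the fault tolerance item more
concrete, here is a minimal sketch in Python. It is not Flink code and not
the prototype mentioned above; the class and method names are hypothetical
and only illustrate the principle: the sending side keeps every emitted
record until the receiving side acknowledges it, and resends whatever is
still buffered after a downstream restart.

```python
from collections import deque


class UpstreamBackupBuffer:
    """Hypothetical sender-side buffer (illustration only, not a Flink API).

    Every emitted record is kept until the downstream operator acknowledges
    it, so unacknowledged records can be replayed after a restart.
    """

    def __init__(self):
        self._next_seq = 0
        self._pending = deque()  # (sequence number, record), oldest first

    def emit(self, record):
        """Buffer the record and return it tagged with a sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending.append((seq, record))
        return seq, record

    def ack(self, seq):
        """Drop every buffered record whose sequence number is <= seq."""
        while self._pending and self._pending[0][0] <= seq:
            self._pending.popleft()

    def replay(self):
        """Return the unacknowledged records, oldest first, for resending."""
        return list(self._pending)


buffer = UpstreamBackupBuffer()
for word in ["a", "b", "c", "d"]:
    buffer.emit(word)

buffer.ack(1)               # downstream confirmed records 0 and 1
replayed = buffer.replay()  # downstream restarted: resend what is left
print(replayed)             # prints [(2, 'c'), (3, 'd')]
```

Replaying like this can deliver a record more than once, which is why it
only gives at-least-once delivery on its own. One possible way towards
exactly-once (an assumption, not something decided in the hangout) would be
for the receiver to discard sequence numbers it has already processed.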

Further topics included the state of the 0.8 release, for which the first
release candidate should come next week; the streaming windowing rework
led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated
by Henry.

Special thanks to Mayur for tuning in despite it being around midnight in
India and for providing valuable insight on Tachyon.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
[2] https://github.com/apache/incubator-flink/pull/254
[3] https://github.com/apache/incubator-flink/pull/226
[4] https://github.com/NFLabs/zeppelin/blob/master/README.md
[5] http://samoa-project.net/
[6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E

Cheers,

Marton

On Fri, Dec 12, 2014 at 8:25 PM, Henry Saputra <[hidden email]>
wrote:

> Related to Zeppelin [1], looks like it is sponsored/developed by a
> company in Korea [2] that has nothing to do with football [...]

Re: Flink Streaming Hangout

Gianmarco De Francisci Morales
Thanks for the recap, and sorry for being unable to join in the end.

If there is anything I can do to help with the integration of SAMOA don't
hesitate to ask.

Cheers,



--
Gianmarco

On 12 December 2014 at 21:35, Márton Balassi <[hidden email]>
wrote:

> The hangout was not recorded, so I'm providing a short write-up on the
> issues and decisions. [...]