Hey All,
I have created a Google Hangout for today's Streaming discussion:
https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa

As agreed, we will start today at 17:30 CET / 08:30 PST, and I think we'll be around for those who can only join later as well.

Please find the topics here:
<https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.

Gyula
Thanks for arranging the hangout, Gyula. Sorry I have to miss it.
I will help review the proposed doc in the wiki and add comments/questions as they come up.

- Henry

On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <[hidden email]> wrote:
> Hey All,
>
> I have created a google hangout for today's Streaming discussion:
> https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
>
> As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
> we'll be around for those who can only join later as well.
>
> Please find the topics here
> <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
>
> Gyula
Related to Zeppelin [1]: it looks like it is sponsored/developed by a company in Korea [2] that unfortunately has nothing to do with football (I thought they were the same team that does http://www.nfl.com/stats/statslab). I was kinda disappointed at the beginning =P

But anyway, it seemed like integration with Flink would be considered as a potential next one =)

Just want to make sure I clear up my comments in the hangout.

- Henry

[1] https://github.com/NFLabs/zeppelin/blob/master/README.md
[2] http://www.nflabs.com
The hangout was not recorded, so I'm providing a short write-up on the issues and decisions. The discussion was 2 hours long, so please feel free to add any important statements I have missed.

The initial ideas are listed on the project wiki as the Flink Streaming roadmap. [1] The hangout yielded the following additions:

* Fault tolerance: We have a (mostly) working prototype, not yet merged, for at-least-once semantics that works similarly to Storm. A missing feature on the streaming side is vertex restarts in the ExecutionGraph, which will be made easier by Ufuk's intermediate results [2] pull request, which will be merged after the 0.8 release. As for exactly-once semantics, the preferred option was upstream backup, which is conceptually the same as backtracking until an intermediate result is found, given that intermediate results are stored at every vertex.

* A common pipeline architecture for batch and streaming: The original idea was to have just one ExecutionEnvironment which can convert DataSets to DataStreams and vice versa. Gyula hacked together a small prototype where a DataSet was fed into a DataStream, but seamless integration would have needed a large refactor. Stephan stepped in with the idea that most likely only the DataSet to DataStream direction should be supported, and that initially we should work through it by materializing the batch result in some in-memory abstraction or even in files. This would result in building separate batch and streaming JobGraphs, and thus addressing optimization, fault tolerance etc. separately. Gyula mentioned Chiwan Park's pending PR on using HDFS updates as a streaming source as a possible solution for feeding the results of recurring batch jobs into streaming.

* API integrations: We've just added Java 8 support to the streaming API and started working on the Scala API as well, which seems to be low-hanging fruit, standing on Aljoscha's shoulders. A next step would be to also add the Python API and, building on that, to provide a notebook-like "IDE", e.g. with Zeppelin. [4] This is also (in fact mainly) interesting for batch processing. For further integrations a Scala shell should be really useful. According to Stephan the latter should not be too challenging; mostly API and some Scheduler work is required.

* Multiparadigm (batch & streaming) ML: Opening toward machine learning, Paris and Vasia took up the SAMOA [5] integration issue, which would provide streaming machine learning support and also comparability with Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink is also an ongoing effort.

Further topics included the state of the 0.8 release, for which the first release candidate should come next week; the streaming windowing rework led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated by Henry.

Special thanks to Mayur for tuning in despite it being around midnight in India, and for providing valuable insight on Tachyon.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
[2] https://github.com/apache/incubator-flink/pull/254
[3] https://github.com/apache/incubator-flink/pull/226
[4] https://github.com/NFLabs/zeppelin/blob/master/README.md
[5] http://samoa-project.net/
[6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E

Cheers,
Marton
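To make the fault tolerance item in the recap a bit more concrete, here is a minimal Python sketch of the upstream backup idea: the upstream vertex keeps every record it has sent until the downstream vertex confirms a checkpoint that covers it, and after a downstream failure the retained records are replayed. This is an illustration only, not Flink code. The class names (`UpstreamVertex`, `DownstreamVertex`), the per-record sequence numbers, and the running-sum state are all assumptions made up for the example.

```python
from collections import deque


class UpstreamVertex:
    """Hypothetical upstream operator that backs up records until they are acknowledged."""

    def __init__(self):
        self.next_seq = 0
        self.backup = deque()  # (seq, record) pairs not yet acknowledged

    def emit(self, record):
        # Tag the record with a sequence number and keep a copy for replay.
        seq = self.next_seq
        self.next_seq += 1
        self.backup.append((seq, record))
        return seq, record

    def ack(self, seq):
        # Downstream has checkpointed everything up to and including seq,
        # so those records no longer need to be retained.
        while self.backup and self.backup[0][0] <= seq:
            self.backup.popleft()

    def replay(self):
        # Records that must be resent after a downstream failure.
        return list(self.backup)


class DownstreamVertex:
    """Hypothetical downstream operator: a running sum with a single checkpoint."""

    def __init__(self):
        self.total = 0
        self.last_seq = -1
        self.checkpoint = (0, -1)

    def process(self, seq, record):
        if seq <= self.last_seq:
            return  # already applied; ignore the duplicate
        self.total += record
        self.last_seq = seq

    def take_checkpoint(self):
        self.checkpoint = (self.total, self.last_seq)
        return self.last_seq

    def restore(self):
        # Roll back to the last checkpoint, as after a vertex restart.
        self.total, self.last_seq = self.checkpoint


def run_demo():
    up, down = UpstreamVertex(), DownstreamVertex()
    for value in [1, 2, 3]:
        down.process(*up.emit(value))
    up.ack(down.take_checkpoint())  # records 0..2 can be dropped upstream
    for value in [4, 5]:
        down.process(*up.emit(value))
    down.restore()                  # simulated failure: work since the checkpoint is lost
    for seq, record in up.replay():
        down.process(seq, record)   # replay restores the lost work
    return down.total, len(up.backup)
```

In this toy setting, replay alone gives at-least-once delivery; skipping records whose sequence number was already applied is what keeps the downstream state from counting anything twice.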
Thanks for the recap, and sorry for being unable to join in the end.
If there is anything I can do to help with the integration of SAMOA don't hesitate to ask.

Cheers,
-- Gianmarco