To extend the functionality of Flink a separate branch of development was
dedicated for low latency, distributed stream processing support. The development started during March of 2014 and is approaching a state where it might be considered a candidate for becoming part of the main repository. As of today a WordCount <https://github.com/stratosphere/stratosphere-streaming/blob/master/src/main/java/eu/stratosphere/streaming/examples/wordcount/WordCountLocal.java#L30-41> example streaming program would fairly similar to the one that the batch API provides: StreamExecutionEnvironment env = new StreamExecutionEnvironment(); DataStream<Tuple2<String, Integer>> dataStream = env.readTextFile("src/test/resources/testdata/hamlet.txt") .flatMap(new WordCountSplitter()) .partitionBy(0) .map(new WordCountCounter()); dataStream.print(); env.execute(); The user defined functions are extending the same classes as in the batch case (e.g. a FlatMapFunction for a flatmap, see WordCountSplitter <https://github.com/stratosphere/stratosphere-streaming/blob/master/src/main/java/eu/stratosphere/streaming/examples/wordcount/WordCountSplitter.java>) thus providing code interusability between the two approaches. As for performance the 0.1 version <https://github.com/stratosphere/stratosphere-streaming/tree/release-0.1> released in the beginning of June was slightly better on a single core then Apache Storm, one of the major players of the field. Cluster performance needs further optimization. This version provided a lower level API, fairly similar to the one Storm has. For a deeper dive on this state of the development and the challenges faced please refer to the slides <http://info.ilab.sztaki.hu/~mbalassi/dw_forum_2014/dw_forum_streaming.pdf> of a talk form the early days of June. The 0.2 release is coming soon with the the above demonstrated new API and improved single core performance. To complete the release the cluster performance is being measured, and the code is being decomposed into three subprojects separating core, example and addon functionality. As for the future fault tolerance is an unresolved issue and as a part of the Google Summer of Code project an intern is working on iterative stream processing. The project is mainly developed at Budapest by three members employed by Hungarian Academy of Sciences and Eötvös Loránd University and Frank Wu, our Google Summer of Code student from Singapore. This summer the Hungarian Academy of Sciences also dedicated 4 interns to the project. The proposed 0.2 release is still dependant on the 0.5 release of Stratosphere, however on branch snapshot-0.6 <https://github.com/stratosphere/stratosphere-streaming/tree/snapshot-0.6> the dependencies are updated to 0.6-snapshot, thus the codebase is ready for becoming part of the main project - preferably a part of addons until it becomes stable. Looking forward to your suggestions. Cheers, Márton, Gyula & Gábor |
Hey,
I would like to give a quick update on the status of the flink streaming project; all of our dependencies are now updated to the current 0.6-snapshot in our main branch, and the project is now decomposed into 3 subprojects: core, examples, and addons. We have created a separate branch for our 0.2 release with dependencies to 0.5, however from now on we will focus our development efforts to be able to merge our main branch with the Main Flink project. Regards, Gyula, Márton & Gábor On Wed, Jul 2, 2014 at 2:05 PM, Márton Balassi <[hidden email]> wrote: > To extend the functionality of Flink a separate branch of development was > dedicated for low latency, distributed stream processing support. The > development started during March of 2014 and is approaching a state where > it might be considered a candidate for becoming part of the main > repository. > > As of today a WordCount > < > https://github.com/stratosphere/stratosphere-streaming/blob/master/src/main/java/eu/stratosphere/streaming/examples/wordcount/WordCountLocal.java#L30-41 > > > example streaming program would fairly similar to the one that the batch > API provides: > > StreamExecutionEnvironment env = new StreamExecutionEnvironment(); > > DataStream<Tuple2<String, Integer>> dataStream = > > env.readTextFile("src/test/resources/testdata/hamlet.txt") > .flatMap(new > WordCountSplitter()) > .partitionBy(0) > .map(new > WordCountCounter()); > > dataStream.print(); > > env.execute(); > > The user defined functions are extending the same classes as in the batch > case (e.g. a FlatMapFunction for a flatmap, see WordCountSplitter > < > https://github.com/stratosphere/stratosphere-streaming/blob/master/src/main/java/eu/stratosphere/streaming/examples/wordcount/WordCountSplitter.java > >) > thus providing code interusability between the two approaches. > > As for performance the 0.1 version > <https://github.com/stratosphere/stratosphere-streaming/tree/release-0.1> > released in the beginning of June was slightly better on a single core then > Apache Storm, one of the major players of the field. Cluster performance > needs further optimization. This version provided a lower level API, fairly > similar to the one Storm has. For a deeper dive on this state of the > development and the challenges faced please refer to the slides > <http://info.ilab.sztaki.hu/~mbalassi/dw_forum_2014/dw_forum_streaming.pdf > > > of a talk form the early days of June. > > The 0.2 release is coming soon with the the above demonstrated new API and > improved single core performance. To complete the release the cluster > performance is being measured, and the code is being decomposed into three > subprojects separating core, example and addon functionality. > > As for the future fault tolerance is an unresolved issue and as a part of > the Google Summer of Code project an intern is working on iterative stream > processing. > > The project is mainly developed at Budapest by three members employed by > Hungarian Academy of Sciences and Eötvös Loránd University and Frank Wu, > our Google Summer of Code student from Singapore. This summer the Hungarian > Academy of Sciences also dedicated 4 interns to the project. > > The proposed 0.2 release is still dependant on the 0.5 release of > Stratosphere, however on branch snapshot-0.6 > <https://github.com/stratosphere/stratosphere-streaming/tree/snapshot-0.6> > the > dependencies are updated to 0.6-snapshot, thus the codebase is ready for > becoming part of the main project - preferably a part of addons until it > becomes stable. > > Looking forward to your suggestions. > > Cheers, > > Márton, Gyula & Gábor > |
Hi!
Thanks for the update! From y side, +1 for adding the code. One question though: What part of the code is in your "addons" project? I am wondering if that may cause confusion, because (as per the discussion via hangout last week), we want to add the streaming code initially to the "flink-addons" project. Stephan |
Yeah, this might be slightly confusing - for clarifying the situation:
- Right under the streaming-addons one can find basic connectors for message queue services - at the moment Kafka and RabbitMQ. We considered this "classical" addon functionality. - Additionally the job used for performance measurements is also under addons, but I'm removing it. - As usually addons mean surplus dependencies I am for separating them. Would you suggest another name then? Streaming-connectors e.g.? On Mon, Jul 7, 2014 at 11:48 AM, Stephan Ewen <[hidden email]> wrote: > Hi! > > Thanks for the update! > > From y side, +1 for adding the code. > > One question though: What part of the code is in your "addons" project? I > am wondering if that may cause confusion, because (as per the discussion > via hangout last week), we want to add the streaming code initially to the > "flink-addons" project. > > Stephan > |
I like the name "connectors"
|
In reply to this post by Márton Balassi
On 07 Jul 2014, at 12:06, Márton Balassi <[hidden email]> wrote: > Yeah, this might be slightly confusing - for clarifying the situation: > > > - Right under the streaming-addons one can find basic connectors for > message queue services - at the moment Kafka and RabbitMQ. We considered > this "classical" addon functionality. > - Additionally the job used for performance measurements is also under > addons, but I'm removing it. > - As usually addons mean surplus dependencies I am for separating them. > Would you suggest another name then? Streaming-connectors e.g.? I also like connectors. Why are you removing the performance measurements stuff? |
The utilites that we used for performance measurements have no direct
connections to this project. We thought it would make sense to move them out into a separate repo since we are constantly modifying the settings for the actual tests. On Mon, Jul 7, 2014 at 2:30 PM, Ufuk Celebi <[hidden email]> wrote: > > On 07 Jul 2014, at 12:06, Márton Balassi <[hidden email]> wrote: > > > Yeah, this might be slightly confusing - for clarifying the situation: > > > > > > - Right under the streaming-addons one can find basic connectors for > > message queue services - at the moment Kafka and RabbitMQ. We > considered > > this "classical" addon functionality. > > - Additionally the job used for performance measurements is also under > > addons, but I'm removing it. > > - As usually addons mean surplus dependencies I am for separating them. > > Would you suggest another name then? Streaming-connectors e.g.? > > I also like connectors. > > Why are you removing the performance measurements stuff? |
Free forum by Nabble | Edit this page |