Hi all!
ApacheCon is coming up and it is the 15th anniversary of the Apache Software Foundation.

In the course of the conference, Apache would like to make a series of announcements. If we manage to make a release during (or shortly before) ApacheCon, they will announce it through their channels.

I am very much in favor of doing this, under the strong condition that we are very confident that the master has grown to be stable enough (there are major changes in the distributed runtime since version 0.8 that we are still stabilizing). There is no use in a widely announced build that does not have the quality.

Flink now has many new features that warrant a release soon (once we have fixed the last quirks in the new distributed runtime).

Notable new features are:
- Gelly
- Streaming windows
- Flink on Tez
- Expression API
- Distributed Runtime on Akka
- Batch mode
- Maybe even a first ML library version
- Some streaming fault tolerance

Robert proposed to have a feature freeze in mid-March for that. His key dates were:

Feature freeze (forking off "release-0.9"): March 17
RC1 vote: March 24

The RC1 vote is 20 days before ApacheCon (April 13).
For the last three releases, the average voting time was 20 days:
R 0.8.0 --> 14 days
R 0.7.0 --> 22 days
R 0.6 --> 26 days

Please share your opinion on this!

Greetings,
Stephan
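The schedule arithmetic in the proposal can be checked with a few lines of Python. This is only a sanity-check sketch: the dates and voting durations are the ones quoted above, the year 2015 is taken from the message timestamps, and the variable names are made up for illustration.

```python
from datetime import date

# Dates from the proposal; the year 2015 is assumed from the message timestamps.
feature_freeze = date(2015, 3, 17)
rc1_vote = date(2015, 3, 24)
apachecon = date(2015, 4, 13)

# Voting durations (in days) quoted for the last three releases.
voting_days = {"0.8.0": 14, "0.7.0": 22, "0.6": 26}

# Time available for the vote, and how past votes compare to it.
days_for_vote = (apachecon - rc1_vote).days
average_vote = sum(voting_days.values()) / len(voting_days)
slack = days_for_vote - average_vote

print(f"Days between RC1 vote and ApacheCon: {days_for_vote}")
print(f"Average past voting time: {average_vote:.1f} days")
print(f"Slack if this vote takes the average time: {slack:.1f} days")
```

The mean of the three quoted durations is a little above 20 days, so a vote of average length would leave essentially no buffer before ApacheCon.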
Hey,
We have a nice list of new features - it definitely makes sense to have that as a release. On my side I really want to have a first limited version of streaming fault tolerance in it.

+1 for Robert's proposal for the deadlines.

I'm also volunteering for release manager.

Best,
Marton
In reply to this post by Stephan Ewen
Hi Stephan,
What is the "Batch mode" feature in the list?

- Henry
Hi Henry,
Batch mode is a new execution mode for batch Flink jobs where, instead of pipelining the whole execution, the job is scheduled in stages, thus materializing the intermediate result before continuing to the next operators. For implications see [1].

[1] http://www.slideshare.net/KostasTzoumas/flink-internals, pages 18-21.
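To illustrate the difference between pipelined and staged execution described in the batch mode explanation, here is a small conceptual sketch in plain Python. It does not use any Flink API; the operator names (`source`, `double`) and the `log` list are hypothetical and only show the order in which records flow through two operators.

```python
from typing import Iterable, Iterator, List

log: List[str] = []  # records the order in which operators touch records


def source(n: int) -> Iterator[int]:
    """Hypothetical source operator emitting the integers 0..n-1."""
    for i in range(n):
        log.append(f"source emits {i}")
        yield i


def double(records: Iterable[int]) -> Iterator[int]:
    """Hypothetical map operator that doubles every record."""
    for r in records:
        log.append(f"map sees {r}")
        yield r * 2


def run_pipelined(n: int) -> List[int]:
    # Pipelined: the map operator consumes each record as soon as
    # the source produces it; nothing is buffered in between.
    return list(double(source(n)))


def run_staged(n: int) -> List[int]:
    # Staged: the first stage runs to completion and its result is
    # materialized before the second stage starts.
    intermediate = list(source(n))
    log.append("intermediate result materialized")
    return list(double(intermediate))


log.clear()
pipelined_result = run_pipelined(2)
pipelined_log = list(log)

log.clear()
staged_result = run_staged(2)
staged_log = list(log)

print(pipelined_log)
print(staged_log)
```

Both runs produce the same result; only the interleaving differs. In the pipelined run the source and map entries alternate, while in the staged run all source entries appear before the first map entry.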
Ah, thanks Márton.
So we are moving toward a concept similar to Spark's RDD staged execution =P
I suppose there will be a runtime configuration or hint to tell the Flink JobManager which execution mode is preferred?

- Henry
+1 for Marton as a release manager. Thank you!
Hey,
What's the status on this? There is one week left until we fork off a branch for 0.9, if we stick to the suggested timeline.
The initial email said "I am very much in favor of doing this, under the strong condition that we are very confident that the master has grown to be stable enough". I think it is time to evaluate whether we are confident that the master is stable.

Best,
Robert
On the streaming side:
Must have:
* Tests for the fault tolerance (my first priority this week)
* Merging Gyula's recent windowing PR [1]

Really needed:
* Self-join for DataStreams (Gabor has a prototype, PR coming today) [2]
* ITCase tests for streaming examples (Peter & myself, review and clean up pending) [3]
* Different streaming/batch cluster memory settings (Stephan) [4]
* Make projection operator chainable (Gabor Gevay - a wannabe GSoC student, PR coming soon) [5]
* Parallel time discretization (Gyula, PR coming tomorrow) [6]

Would be nice to have:
* Complex integration test for streaming (Peter) [7]
* Extend streaming aggregation tests to include POJOs [8]
* Iteration bug for large input [9]

We would also need a general pass over the streaming API for javadocs.

This is more than one week of work, but we can hopefully fit it into two weeks.

[1] https://github.com/apache/flink/pull/465
[2] https://issues.apache.org/jira/browse/FLINK-1594
[3] https://issues.apache.org/jira/browse/FLINK-1560
[4] https://issues.apache.org/jira/browse/FLINK-1368
[5] https://issues.apache.org/jira/browse/FLINK-1641
[6] https://issues.apache.org/jira/browse/FLINK-1618
[7] https://issues.apache.org/jira/browse/FLINK-1595
[8] https://issues.apache.org/jira/browse/FLINK-1544
[9] https://issues.apache.org/jira/browse/FLINK-1239
In reply to this post by Robert Metzger
I will follow up again with Sally this week on whether there is any special messaging or communication we need to do for ApacheCon from our side.

- Henry
In reply to this post by Robert Metzger
On Tue, Mar 10, 2015 at 11:20 AM, Robert Metzger <[hidden email]> wrote:

> I think it is time to evaluate whether we are confident that the master is stable.

In the course of finishing up #471 [1] I ran 20 Travis builds overnight, of which 7 failed.

The (unexpected) failing test cases:

- ExternalSortITCase.testSpillingSortWithIntermediateMerge:325 Field 0 is null, but expected to hold a key.
- JobManagerProcessReapingTest.testReapProcessOnFailure:121 JobManager process did not launch the JobManager properly. Failed to look up

The (expected/known-to-fail) failing test cases:

- TaskManagerFailsITCase => will be fixed with Shading?
- YARN test cases => polluted logs (unrelated to YARN)?

Can people who are familiar with the test cases confirm/explain that these failures are known? Details about the failing builds are below.

(One of the failures is related to the changes in my PR.)

[1] https://github.com/apache/flink/pull/471

----

#327: https://travis-ci.org/uce/incubator-flink/builds/53985832
- 327.1 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985834/log.txt): ExternalSortITCase.testSpillingSortWithIntermediateMerge:325 Field 0 is null, but expected to hold a key.
- 327.4 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985838/log.txt): YARNSessionFIFOITCase => exception in taskmanager-strerr.log file

#331: https://travis-ci.org/uce/incubator-flink/builds/53985889
- 331.2 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985892/log.txt): Failed due to a change in my PR
- 332.3 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt): TaskManagerFailsITCase => expected class org.apache.flink.runtime.messages.JobManagerMessages$JobResultSuccess, found class akka.actor.Status$Failure

#332: https://travis-ci.org/uce/incubator-flink/builds/53985900
- 332.3 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985903/log.txt): TaskManagerFailsITCase => expected class org.apache.flink.runtime.messages.JobManagerMessages$JobResultSuccess, found class akka.actor.

#338: https://travis-ci.org/uce/incubator-flink/builds/53985981
- 338.5 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985986/log.txt): Failed due to a change in my PR

#344: https://travis-ci.org/uce/incubator-flink/builds/53986054
- 344.5 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986059/log.txt): YARNSessionFIFOITCase => exception in taskmanager-strerr.log file

#346: https://travis-ci.org/uce/incubator-flink/builds/53986071
- 346.3 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986080/log.txt): JobManagerProcessReapingTest.testReapProcessOnFailure:121 JobManager process did not launch the JobManager properly. Failed to look up JobManager actor at localhost:57964

#347: https://travis-ci.org/uce/incubator-flink/builds/53986111
- 347.5 (https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986116/log.txt): YARNSessionCapacitySchedulerITCase => exception in jobmanager-strerr.log file
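The failure list from the 20 Travis builds can be tallied as a rough summary. The following sketch just transcribes the reported failures into a Python list; the labels are shorthand for the causes as reported in the message, and the job listed as "332.3" under build #331 is assumed here to belong to build 331 because of the heading it appears under.

```python
from collections import Counter

TOTAL_BUILDS = 20  # number of overnight Travis builds reported

# (build number, reported cause) pairs transcribed from the failure list.
# Assumption: the job labelled "332.3" under #331 is counted for build 331.
failures = [
    (327, "ExternalSortITCase"),
    (327, "YARNSessionFIFOITCase"),
    (331, "PR change"),
    (331, "TaskManagerFailsITCase"),
    (332, "TaskManagerFailsITCase"),
    (338, "PR change"),
    (344, "YARNSessionFIFOITCase"),
    (346, "JobManagerProcessReapingTest"),
    (347, "YARNSessionCapacitySchedulerITCase"),
]

# A build counts as failed if at least one of its jobs failed.
failed_builds = {build for build, _ in failures}
failure_rate = len(failed_builds) / TOTAL_BUILDS

# How often each reported cause shows up across failing jobs.
by_cause = Counter(cause for _, cause in failures)

print(f"{len(failed_builds)} of {TOTAL_BUILDS} builds failed ({failure_rate:.0%})")
for cause, count in by_cause.most_common():
    print(f"{count} x {cause}")
```

Seven failing builds out of 20 is a 35% failure rate, and no single test accounts for most of the failing jobs.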
So you're saying, regarding the release, that you don't feel very confident that we will manage to fork off release-0.9 next week?

The exception in the jobmanager-stderr from the YARN tests is the following (from #347.5 and #344.5):

07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: Mar 12, 2015 7:45:57 AM org.jboss.netty.channel.DefaultChannelPipeline
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: WARNING: An exception was thrown by an exception handler.
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:223)
On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]> wrote:

> So you're saying regarding the release you don't feel very confident that we manage to fork off release-0.9 next week?

Yes. At the moment I would be uncomfortable with forking off.

----

Regarding the failing tests: I thought that some failing jobs were related to my changes, but after looking into it, it was a false alarm. See comments here: https://github.com/apache/flink/pull/475
I would like to get the Expression API for Java in there, as well.

On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:

> On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]> wrote:
>
>> So you're saying regarding the release you don't feel very confident that we manage to fork off release-0.9 next week?
>
> Yes. At the moment I would be uncomfortable with forking off.
>
> ----
>
> Regarding the failing tests: I thought that some failing jobs were related to my changes, but after looking into it, it was a false alarm. See comments here: https://github.com/apache/flink/pull/475
I am also big time skeptical.

There are some remaining stability issues with 0.9:
- Apparently a bug in the task canceling
- Blocking Data Exchange is premature at this point
- TaskManager startup is not robust
- TaskManager / JobManager registration is not robust
- Streaming fault tolerance needs more testing before we can make an assessment

I think this needs a few more weeks...

On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek <[hidden email]> wrote:

> I would like to get the Expression API for Java in there, as well.
>
> On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:
> > On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]> wrote:
> >
> >> So you're saying regarding the release you don't feel very confident that we manage to fork off release-0.9 next week?
> >
> > Yes. At the moment I would be uncomfortable with forking off.
> >
> > ----
> >
> > Regarding the failing tests: I thought that some failing jobs were related to my changes, but after looking into it, it was a false alarm. See comments here: https://github.com/apache/flink/pull/475
Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.

On Thu, Mar 12, 2015 at 2:01 PM, Stephan Ewen <[hidden email]> wrote:

> I am also big time skeptical.
>
> There are some remaining stability issues with 0.9
> - Apparently a bug in the task canceling
> - Blocking Data Exchange is premature at this point
> - TaskManager startup is not robust
> - TaskManager / JobManager registration is not robust
> - Streaming fault tolerance needs more testing before we can make an assessment
>
> I think this needs a few more weeks...
>
> On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek <[hidden email]> wrote:
>
> > I would like to get the Expression API for Java in there, as well.
> >
> > On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:
> > > On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]> wrote:
> > >
> > >> So you're saying regarding the release you don't feel very confident that we manage to fork off release-0.9 next week?
> > >
> > > Yes. At the moment I would be uncomfortable with forking off.
> > >
> > > ----
> > >
> > > Regarding the failing tests: I thought that some failing jobs were related to my changes, but after looking into it, it was a false alarm. See comments here: https://github.com/apache/flink/pull/475
On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:

> Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.

No, rebased on Monday before shading. Let me rebase and rerun tonight.
I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because the issue is still occurring.

On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:

> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:
>
> > Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.
>
> No, rebased on Monday before shading. Let me rebase and rerun tonight.
Two weeks have passed since we've discussed the 0.9 release the last time.

The ApacheCon is in 18 days from now.
If we want, we can also release a "0.9.0-beta" release that contains known bugs, but allows our users to try out the new features easily (because they are part of a release). The vote for such a release would be mainly about the legal aspects of the release rather than the stability. So I suspect that the vote will go through much quicker.

On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]> wrote:

> I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because the issue is still occurring.
>
> On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
>
>> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:
>>
>> > Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.
>>
>> No, rebased on Monday before shading. Let me rebase and rerun tonight.
+1 for an early milestone release. Perhaps we can call it 0.9-milestone or so?

On Thu, Mar 26, 2015 at 11:01 AM, Robert Metzger <[hidden email]> wrote:

> Two weeks have passed since we've discussed the 0.9 release the last time.
>
> The ApacheCon is in 18 days from now.
> If we want, we can also release a "0.9.0-beta" release that contains known bugs, but allows our users to try out the new features easily (because they are part of a release). The vote for such a release would be mainly about the legal aspects of the release rather than the stability. So I suspect that the vote will go through much quicker.
>
> On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]> wrote:
>
> > I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because the issue is still occurring.
> >
> > On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
> >
> >> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:
> >>
> >> > Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.
> >>
> >> No, rebased on Monday before shading. Let me rebase and rerun tonight.
+1 for an early release. It will help unblock the samoa PR that has 0.9 dependencies.

> On 26 Mar 2015, at 11:44, Kostas Tzoumas <[hidden email]> wrote:
>
> +1 for an early milestone release. Perhaps we can call it 0.9-milestone or so?
>
> On Thu, Mar 26, 2015 at 11:01 AM, Robert Metzger <[hidden email]> wrote:
>
>> Two weeks have passed since we've discussed the 0.9 release the last time.
>>
>> The ApacheCon is in 18 days from now.
>> If we want, we can also release a "0.9.0-beta" release that contains known bugs, but allows our users to try out the new features easily (because they are part of a release). The vote for such a release would be mainly about the legal aspects of the release rather than the stability. So I suspect that the vote will go through much quicker.
>>
>> On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]> wrote:
>>
>>> I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because the issue is still occurring.
>>>
>>> On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
>>>
>>>> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:
>>>>
>>>>> Have you run the 20 builds with the new shading code? With new shading the TaskManagerFailsITCase should no longer fail. If it still does, then we have to look into it again.
>>>>
>>>> No, rebased on Monday before shading. Let me rebase and rerun tonight.