(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Make a release to be announced at ApacheCon


Stephan Ewen

[DISCUSS] Make a release to be announced at ApacheCon

Hi all!

ApacheCon is coming up and it is the 15th anniversary of the Apache
Software Foundation.

In the course of the conference, Apache would like to make a series of
announcements. If we manage to make a release during (or shortly before)
ApacheCon, they will announce it through their channels.

I am very much in favor of doing this, under the strong condition that we
are very confident that the master has grown to be stable enough (there are
major changes in the distributed runtime since version 0.8 that we are
still stabilizing). No use in a widely announced build that does not have
the quality.

Flink now has many new features that warrant a release soon (once we have
fixed the last quirks in the new distributed runtime).

Notable new features are:
- Gelly
- Streaming windows
- Flink on Tez
- Expression API
- Distributed Runtime on Akka
- Batch mode
- Maybe even a first ML library version
- Some streaming fault tolerance

Robert proposed a feature freeze in mid-March for that. His
key dates were:

Feature freeze (forking off "release-0.9"): March 17
RC1 vote: March 24

The RC1 vote would start 20 days before ApacheCon (April 13).
For the last three releases, the average voting time was roughly 20 days:
R 0.8.0 --> 14 days
R 0.7.0 --> 22 days
R 0.6 --> 26 days
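
As a quick sanity check of the dates and the average above, here is a minimal Python sketch. The year 2015 is an assumption taken from the message timestamps in this thread, and the variable names are purely illustrative.

```python
from datetime import date

# Dates from the proposal; the year 2015 is assumed from the message timestamps.
rc1_vote = date(2015, 3, 24)
apachecon = date(2015, 4, 13)
days_until_apachecon = (apachecon - rc1_vote).days

# Voting durations in days for the last three releases, as listed above.
voting_days = {"0.8.0": 14, "0.7.0": 22, "0.6": 26}
average_voting_days = sum(voting_days.values()) / len(voting_days)

print(days_until_apachecon)
print(round(average_voting_days, 1))
```

The average comes out slightly above 20 days, so a vote of typical length would leave essentially no slack before the conference.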

Please share your opinion on this!

Greetings,
Stephan

Márton Balassi-3

Re: [DISCUSS] Make a release to be announced at ApacheCon

Hey,

We have a nice list of new features - it definitely makes sense to have
that as a release. On my side I really want to have a first limited version
of streaming fault tolerance in it.

+1 for Robert's proposal for the deadlines.

I'm also volunteering as release manager.

Best,
Marton

On Mon, Mar 2, 2015 at 2:03 PM, Stephan Ewen <[hidden email]> wrote:


Henry Saputra

Re: [DISCUSS] Make a release to be announced at ApacheCon

In reply to this post by Stephan Ewen

Hi Stephan,

What is the "Batch mode" feature in the list?

- Henry

On Mon, Mar 2, 2015 at 5:03 AM, Stephan Ewen <[hidden email]> wrote:

Márton Balassi

Re: [DISCUSS] Make a release to be announced at ApacheCon

Hi Henry,

Batch mode is a new execution mode for batch Flink jobs where, instead of
pipelining the whole execution, the job is scheduled in stages, thus
materializing each intermediate result before continuing to the next
operators. For implications see [1].

[1] http://www.slideshare.net/KostasTzoumas/flink-internals, page 18-21.
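
The difference between pipelined and staged execution described above can be illustrated with a small, self-contained Python sketch. This is not Flink code and does not use any Flink API: the operators `source` and `double` and the two runner functions are hypothetical names chosen only to show when intermediate results are materialized.

```python
from typing import Iterable, Iterator, List


def source(n: int, log: List[str]) -> Iterator[int]:
    """Hypothetical source operator: emits the records 0..n-1 one by one."""
    for i in range(n):
        log.append(f"produce {i}")
        yield i


def double(records: Iterable[int], log: List[str]) -> Iterator[int]:
    """Hypothetical downstream operator: doubles every record it receives."""
    for r in records:
        log.append(f"double {r}")
        yield 2 * r


def run_pipelined(n: int, log: List[str]) -> List[int]:
    # Pipelined: each record flows to the next operator as soon as it is
    # produced, so the two operators interleave their work.
    return list(double(source(n, log), log))


def run_staged(n: int, log: List[str]) -> List[int]:
    # Staged ("batch mode" in the sense described above): the intermediate
    # result is fully materialized before the next operator starts.
    intermediate = list(source(n, log))
    return list(double(intermediate, log))


pipelined_log: List[str] = []
staged_log: List[str] = []
print(run_pipelined(2, pipelined_log), pipelined_log)
print(run_staged(2, staged_log), staged_log)
```

Both runs compute the same result; only the order of the log entries differs. In the pipelined run the operators alternate, while in the staged run all "produce" entries come before the first "double" entry.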

On Mon, Mar 2, 2015 at 11:39 PM, Henry Saputra <[hidden email]>
wrote:


Henry Saputra

Re: [DISCUSS] Make a release to be announced at ApacheCon

Ah, thanks Márton.

So we are moving toward a concept similar to Spark's RDD staged execution =P
I suppose there will be a runtime configuration or hint to tell the
Flink JobManager which execution mode is preferred?

- Henry

On Tue, Mar 3, 2015 at 2:09 AM, Márton Balassi <[hidden email]> wrote:


Robert Metzger

Re: [DISCUSS] Make a release to be announced at ApacheCon

+1 for Marton as a release manager. Thank you!

On Tue, Mar 3, 2015 at 7:56 PM, Henry Saputra <[hidden email]>
wrote:


Robert Metzger

Re: [DISCUSS] Make a release to be announced at ApacheCon

Hey,

What's the status on this? There is one week left until we fork off a
branch for 0.9, if we stick to the suggested timeline.
The initial email said "I am very much in favor of doing this, under the
strong condition that we are very confident that the master has grown to be
stable enough". I think it is time to evaluate whether we are confident that
the master is stable.

Best
Robert

On Wed, Mar 4, 2015 at 9:42 AM, Robert Metzger <[hidden email]> wrote:


Márton Balassi-3

Re: [DISCUSS] Make a release to be announced at ApacheCon

On the streaming side:

Must have:
* Tests for the fault tolerance (My first priority this week)
* Merging Gyula's recent windowing PR [1]

Really needed:
* Self-join for DataStreams (Gabor has a prototype, PR coming today) [2]
* ITCase tests for streaming examples (Peter & myself, review and clean
up pending) [3]
* Different streaming/batch cluster memory settings (Stephan) [4]
* Make projection operator chainable (Gabor Gevay - a wannabe GSoC
student, PR coming soon) [5]
* Parallel time discretization (Gyula, PR coming tomorrow) [6]

Would be nice to have:
* Complex integration test for streaming (Peter) [7]
* Extend streaming aggregation tests to include POJOs [8]
* Iteration bug for large input [9]

We would also need a general pass over the streaming API for javadocs.

This is more than one week of work, but we can hopefully fit it into two weeks.

[1] https://github.com/apache/flink/pull/465
[2] https://issues.apache.org/jira/browse/FLINK-1594
[3] https://issues.apache.org/jira/browse/FLINK-1560
[4] https://issues.apache.org/jira/browse/FLINK-1368
[5] https://issues.apache.org/jira/browse/FLINK-1641
[6] https://issues.apache.org/jira/browse/FLINK-1618
[7] https://issues.apache.org/jira/browse/FLINK-1595
[8] https://issues.apache.org/jira/browse/FLINK-1544
[9] https://issues.apache.org/jira/browse/FLINK-1239

On Tue, Mar 10, 2015 at 11:20 AM, Robert Metzger <[hidden email]>
wrote:


Henry Saputra

Re: [DISCUSS] Make a release to be announced at ApacheCon

In reply to this post by Robert Metzger

I will follow up with Sally again this week on whether any special
messaging or communications are needed from our side for ApacheCon.

- Henry

On Tue, Mar 10, 2015 at 3:20 AM, Robert Metzger <[hidden email]> wrote:


Ufuk Celebi-2

Re: [DISCUSS] Make a release to be announced at ApacheCon

In reply to this post by Robert Metzger

On Tue, Mar 10, 2015 at 11:20 AM, Robert Metzger <[hidden email]>
wrote:

> I think
> it is time to evaluate whether we are confident that the master is stable.
>

In the course of finishing up #471 [1] I ran 20 Travis builds overnight,
of which 7 failed.

The (unexpected) failing test cases:

- ExternalSortITCase.testSpillingSortWithIntermediateMerge:325 Field 0 is
null, but expected to hold a key.
- JobManagerProcessReapingTest.testReapProcessOnFailure:121 JobManager
process did not launch the JobManager properly. Failed to look up

The (expected/known-to-fail) failing test cases:

- TaskManagerFailsITCase => will be fixed with Shading?
- YARN test cases => polluted logs (unrelated to YARN)?

Can people who are familiar with the test cases confirm or explain whether
the failures are known? Details about the failing builds are below.

(One of the failures is related to the changes in my PR.)

[1] https://github.com/apache/flink/pull/471

----

#327: https://travis-ci.org/uce/incubator-flink/builds/53985832
- 327.1 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985834/log.txt):
ExternalSortITCase.testSpillingSortWithIntermediateMerge:325 Field 0 is
null, but expected to hold a key.
- 327.4 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985838/log.txt):
YARNSessionFIFOITCase => exception in taskmanager-strerr.log file

#331: https://travis-ci.org/uce/incubator-flink/builds/53985889
- 331.2 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985892/log.txt):
Failed due to a change in my PR
- 331.3 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985893/log.txt):
TaskManagerFailsITCase => expected class
org.apache.flink.runtime.messages.JobManagerMessages$JobResultSuccess,
found class akka.actor.Status$Failure

#332: https://travis-ci.org/uce/incubator-flink/builds/53985900
- 332.3 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985903/log.txt):
TaskManagerFailsITCase => expected class
org.apache.flink.runtime.messages.JobManagerMessages$JobResultSuccess,
found class akka.actor.

#338: https://travis-ci.org/uce/incubator-flink/builds/53985981
- 338.5 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53985986/log.txt):
Failed due to a change in my PR

#344: https://travis-ci.org/uce/incubator-flink/builds/53986054
- 344.5 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986059/log.txt):
YARNSessionFIFOITCase => exception in taskmanager-strerr.log file

#346: https://travis-ci.org/uce/incubator-flink/builds/53986071
- 346.3 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986080/log.txt):
JobManagerProcessReapingTest.testReapProcessOnFailure:121 JobManager
process did not launch the JobManager properly. Failed to look up
JobManager actor at localhost:57964

#347: https://travis-ci.org/uce/incubator-flink/builds/53986111
- 347.5 (
https://s3.amazonaws.com/archive.travis-ci.org/jobs/53986116/log.txt):
YARNSessionCapacitySchedulerITCase => exception in jobmanager-strerr.log
file
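
The build results above can be tallied with a short Python sketch. The cause labels are shorthand introduced here for illustration, and each failing job is attributed to the build heading it appears under in the list above.

```python
from collections import Counter

# Total number of Travis builds that were run, as stated above.
total_builds = 20

# (build number, shorthand cause) for every failing job listed above.
failing_jobs = [
    (327, "ExternalSortITCase"),
    (327, "YARN tests"),
    (331, "PR change"),
    (331, "TaskManagerFailsITCase"),
    (332, "TaskManagerFailsITCase"),
    (338, "PR change"),
    (344, "YARN tests"),
    (346, "JobManagerProcessReapingTest"),
    (347, "YARN tests"),
]

# A build counts as failed if at least one of its jobs failed.
failed_builds = {build for build, _ in failing_jobs}
failure_rate = len(failed_builds) / total_builds

# Number of failing jobs per cause.
by_cause = Counter(cause for _, cause in failing_jobs)

print(len(failed_builds), failure_rate)
for cause, count in by_cause.most_common():
    print(cause, count)
```

Counted this way, the nine failing jobs are spread over seven builds, which matches the 7 out of 20 failed builds mentioned above.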

Robert Metzger

Re: [DISCUSS] Make a release to be announced at ApacheCon

So you're saying that, regarding the release, you don't feel very confident
that we will manage to fork off release-0.9 next week?

The exception in the jobmanager-stderr file from the YARN tests is the
following (from #347.5 and #344.5):

07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: Mar 12, 2015 7:45:57 AM org.jboss.netty.channel.DefaultChannelPipeline
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: WARNING: An exception was thrown by an exception handler.
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: java.util.concurrent.RejectedExecutionException: Worker has already been shutdown
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
07:46:00,598 WARN org.apache.flink.yarn.YarnTestBase - LINE: at akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:223)

On Thu, Mar 12, 2015 at 9:51 AM, Ufuk Celebi <[hidden email]> wrote:


Ufuk Celebi-2

Re: [DISCUSS] Make a release to be announced at ApacheCon

On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]>
wrote:

> So you're saying regarding the release you don't feel very confident that
> we manage to fork off release-0.9 next week?
>

Yes. At the moment I would be uncomfortable with forking off.

----

Regarding the failing tests: I thought that some failing jobs were related
to my changes, but after looking into it, it was a false alarm. See
comments here: https://github.com/apache/flink/pull/475

Aljoscha Krettek-2

Re: [DISCUSS] Make a release to be announced at ApacheCon

I would like to get the Expression API for Java in there, as well.

On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:

> On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]>
> wrote:
>
>> So you're saying regarding the release you don't feel very confident that
>> we manage to fork off release-0.9 next week?
>>
>
> Yes. At the moment I would be uncomfortable with forking off.
>
> ----
>
> Regarding the failing tests: I thought that some failing jobs were related
> to my changes, but after looking into it, it was a false alarm. See
> comments here: https://github.com/apache/flink/pull/475

Stephan Ewen

Re: [DISCUSS] Make a release to be announced at ApacheCon

I am also big time skeptical.

There are some remaining stability issues with 0.9
- Apparently a bug in the task canceling
- Blocking Data Exchange is premature at this point
- TaskManager startup is not robust
- TaskManager / JobManager registration is not robust
- Streaming fault tolerance needs more testing before we can make an
assessment

I think this needs a few more weeks...

On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek <[hidden email]>
wrote:

> I would like to get the Expression API for Java in there, as well.
>
> On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:
> > On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]>
> > wrote:
> >
> >> So you're saying regarding the release you don't feel very confident that
> >> we manage to fork off release-0.9 next week?
> >>
> >
> > Yes. At the moment I would be uncomfortable with forking off.
> >
> > ----
> >
> > Regarding the failing tests: I thought that some failing jobs were related
> > to my changes, but after looking into it, it was a false alarm. See
> > comments here: https://github.com/apache/flink/pull/475
>

till.rohrmann

Re: [DISCUSS] Make a release to be announced at ApacheCon

Have you run the 20 builds with the new shading code? With new shading the
TaskManagerFailsITCase should no longer fail. If it still does, then we
have to look into it again.

On Thu, Mar 12, 2015 at 2:01 PM, Stephan Ewen <[hidden email]> wrote:

> I am also big time skeptical.
>
> There are some remaining stability issues with 0.9
> - Apparently a bug in the task canceling
> - Blocking Data Exchange is premature at this point
> - TaskManager startup is not robust
> - TaskManager / JobManager registration is not robust
> - Streaming fault tolerance needs more testing before we can make an
> assessment
>
> I think this needs a few more weeks...
>
> On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek <[hidden email]>
> wrote:
>
> > I would like to get the Expression API for Java in there, as well.
> >
> > On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi <[hidden email]> wrote:
> > > On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger <[hidden email]>
> > > wrote:
> > >
> > >> So you're saying regarding the release you don't feel very confident that
> > >> we manage to fork off release-0.9 next week?
> > >>
> > >
> > > Yes. At the moment I would be uncomfortable with forking off.
> > >
> > > ----
> > >
> > > Regarding the failing tests: I thought that some failing jobs were related
> > > to my changes, but after looking into it, it was a false alarm. See
> > > comments here: https://github.com/apache/flink/pull/475
> >
>

Ufuk Celebi-2

Re: [DISCUSS] Make a release to be announced at ApacheCon

On Thursday, March 12, 2015, Till Rohrmann <[hidden email]> wrote:

> Have you run the 20 builds with the new shading code? With new shading the
> TaskManagerFailsITCase should no longer fail. If it still does, then we
> have to look into it again.

No, rebased on Monday before shading. Let me rebase and rerun tonight.

Robert Metzger

Re: [DISCUSS] Make a release to be announced at ApacheCon

I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because the
issue is still occurring.

On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:

> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]>
> wrote:
>
> > Have you run the 20 builds with the new shading code? With new shading the
> > TaskManagerFailsITCase should no longer fail. If it still does, then we
> > have to look into it again.
>
>
> No, rebased on Monday before shading. Let me rebase and rerun tonight.
>

Robert Metzger

Re: [DISCUSS] Make a release to be announced at ApacheCon

Two weeks have passed since we last discussed the 0.9 release.

ApacheCon is 18 days from now.
If we want, we can also release a "0.9.0-beta" release that contains known
bugs, but allows our users to try out the new features easily (because they
are part of a release). The vote for such a release would be mainly about
the legal aspects of the release rather than the stability. So I suspect
that the vote will go through much quicker.

On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]>
wrote:

> I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because
> the issue is still occurring.
>
> On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
>
>> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]>
>> wrote:
>>
>> > Have you run the 20 builds with the new shading code? With new shading the
>> > TaskManagerFailsITCase should no longer fail. If it still does, then we
>> > have to look into it again.
>>
>>
>> No, rebased on Monday before shading. Let me rebase and rerun tonight.
>>
>
>

Kostas Tzoumas-2

Re: [DISCUSS] Make a release to be announced at ApacheCon

+1 for an early milestone release. Perhaps we can call it 0.9-milestone or
so?

On Thu, Mar 26, 2015 at 11:01 AM, Robert Metzger <[hidden email]>
wrote:

> Two weeks have passed since we last discussed the 0.9 release.
>
> ApacheCon is 18 days from now.
> If we want, we can also release a "0.9.0-beta" release that contains known
> bugs, but allows our users to try out the new features easily (because they
> are part of a release). The vote for such a release would be mainly about
> the legal aspects of the release rather than the stability. So I suspect
> that the vote will go through much quicker.
>
>
>
> On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]>
> wrote:
>
> > I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because
> > the issue is still occurring.
> >
> > On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
> >
> >> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]>
> >> wrote:
> >>
> >> > Have you run the 20 builds with the new shading code? With new shading the
> >> > TaskManagerFailsITCase should no longer fail. If it still does, then we
> >> > have to look into it again.
> >>
> >>
> >> No, rebased on Monday before shading. Let me rebase and rerun tonight.
> >>
> >
> >
>

Paris Carbone

Re: [DISCUSS] Make a release to be announced at ApacheCon

+1 for an early release. It will help unblock the SAMOA PR that has 0.9 dependencies.

> On 26 Mar 2015, at 11:44, Kostas Tzoumas <[hidden email]> wrote:
>
> +1 for an early milestone release. Perhaps we can call it 0.9-milestone or
> so?
>
> On Thu, Mar 26, 2015 at 11:01 AM, Robert Metzger <[hidden email]>
> wrote:
>
>> Two weeks have passed since we last discussed the 0.9 release.
>>
>> ApacheCon is 18 days from now.
>> If we want, we can also release a "0.9.0-beta" release that contains known
>> bugs, but allows our users to try out the new features easily (because they
>> are part of a release). The vote for such a release would be mainly about
>> the legal aspects of the release rather than the stability. So I suspect
>> that the vote will go through much quicker.
>>
>>
>>
>> On Fri, Mar 13, 2015 at 12:01 PM, Robert Metzger <[hidden email]>
>> wrote:
>>
>>> I've reopened https://issues.apache.org/jira/browse/FLINK-1650 because
>>> the issue is still occurring.
>>>
>>> On Thu, Mar 12, 2015 at 7:05 PM, Ufuk Celebi <[hidden email]> wrote:
>>>
>>>> On Thursday, March 12, 2015, Till Rohrmann <[hidden email]>
>>>> wrote:
>>>>
>>>>> Have you run the 20 builds with the new shading code? With new shading the
>>>>> TaskManagerFailsITCase should no longer fail. If it still does, then we
>>>>> have to look into it again.
>>>>
>>>>
>>>> No, rebased on Monday before shading. Let me rebase and rerun tonight.
>>>>
>>>
>>>
>>
