[DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
Hi all!

I would like to start a discussion about the next bugfix releases for 1.1.x and 1.2.x.
There have been quite a few critical bug fixes for both release lines recently, and I think they deserve bugfix releases soon.
Most of the bugs were reported by users.

I’m starting the discussion for both bugfix releases together because most fixes apply to both (the two lists are almost identical).
Of course, the actual RC creation and RC votes don’t have to be started together.

Here’s an overview of what’s been collected so far, for both bugfix releases -
(it’s a list of what I’m aware of so far, and may be missing stuff; please append and bring to attention as necessary :-) )


For Flink 1.2.1:

(1) https://issues.apache.org/jira/browse/FLINK-5701:
Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
Status: merged

(2) https://issues.apache.org/jira/browse/FLINK-5949:
Do not check Kerberos credentials for non-Kerberos authentications. MapR users are affected by this, and cannot submit Flink on YARN jobs on a secured MapR cluster.
Status: PR - https://github.com/apache/flink/pull/3528, one +1 already

(3) https://issues.apache.org/jira/browse/FLINK-6006:
Kafka Consumer can lose state if queried partition list is incomplete on restore.
Status: PR - https://github.com/apache/flink/pull/3505, one +1 already

(4) https://issues.apache.org/jira/browse/FLINK-6025:
KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
Status: merged

(5) https://issues.apache.org/jira/browse/FLINK-5771:
Fix multi-char delimiters in Batch InputFormats.
Status: merged

(6) https://issues.apache.org/jira/browse/FLINK-5934:
Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
Status: merged


 
For Flink 1.1.5:

(1) https://issues.apache.org/jira/browse/FLINK-5701:
Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
Status: This is already merged for 1.2.1. I would personally like to backport the fix for this to 1.1.5 also.

(2) https://issues.apache.org/jira/browse/FLINK-6006:
Kafka Consumer can lose state if queried partition list is incomplete on restore.
Status: PR - https://github.com/apache/flink/pull/3507, one +1 already

(3) https://issues.apache.org/jira/browse/FLINK-6025:
KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
Status: merged

(4) https://issues.apache.org/jira/browse/FLINK-5771:
Fix multi-char delimiters in Batch InputFormats.
Status: merged

(5) https://issues.apache.org/jira/browse/FLINK-5934:
Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
Status: merged

(6) https://issues.apache.org/jira/browse/FLINK-5048:
Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation behavior.
Status: This fix was already released in 1.2.0, but never made it into the 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
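
To make item (1) concrete, here is a minimal sketch of the failure mode and of the general fix pattern. It is not the actual FlinkKafkaProducer code: the class `SketchProducer` and its methods `send`, `completeNext`, and `snapshotState` are hypothetical names made up for this illustration. The assumption is that sends complete asynchronously, so a checkpoint can only count as successful once no callback has reported an error and nothing is still in flight.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-in for an asynchronous producer; not Flink's implementation.
public class SketchProducer {
    // First error reported by an asynchronous send callback, if any.
    private Exception asyncException;
    // Records handed to the client but not yet acknowledged.
    private final Queue<String> pending = new ArrayDeque<>();

    // Hand a record to the (simulated) asynchronous client.
    public void send(String record) throws Exception {
        checkAsyncException();
        pending.add(record);
    }

    // Simulates the client's callback for the oldest pending record.
    public void completeNext(Exception errorOrNull) {
        pending.poll();
        if (errorOrNull != null && asyncException == null) {
            asyncException = errorOrNull;
        }
    }

    // Called when a checkpoint is taken. Without the error check, a failed
    // send could go unnoticed and the checkpoint would wrongly succeed.
    public void snapshotState() throws Exception {
        checkAsyncException();
        if (!pending.isEmpty()) {
            // A real producer would wait (flush) here instead of failing.
            throw new IllegalStateException("records still in flight");
        }
    }

    private void checkAsyncException() throws Exception {
        if (asyncException != null) {
            Exception e = asyncException;
            asyncException = null;
            throw new Exception("Failed to send data: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        SketchProducer producer = new SketchProducer();
        producer.send("a");
        producer.completeNext(new RuntimeException("broker unavailable"));
        try {
            producer.snapshotState();
            System.out.println("checkpoint succeeded");
        } catch (Exception e) {
            System.out.println("checkpoint failed: " + e.getMessage());
        }
    }
}
```

In this sketch the checkpoint fails because the earlier send failed, which is the behavior an at-least-once producer needs: the job restarts from the previous checkpoint and the record is sent again.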


What do you think? From the list so far, we pretty much already have everything in, so I think it would be nice to aim for RCs by the end of this week.
Since both bugfix releases cover almost the same list of issues, I think it shouldn’t be too hard for us to kick off both bugfix releases around the same time.

Also FYI, here are the lists of JIRA tickets that are tagged with “1.2.1” / “1.1.5” as the Fix Version and are still open.
We should probably check whether there’s anything on them that should block the releases:

For 1.2.1:
https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1

For 1.1.5:
https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
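
For readers who want to see the filter behind these links, the `jql` parameter is an ordinary URL-encoded JQL query. The small sketch below decodes the 1.2.1 one with the JDK's `URLDecoder`; the class name `JqlDecode` is just an illustrative choice.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class JqlDecode {
    // Value of the "jql" query parameter, copied from the 1.2.1 link above.
    static final String ENCODED =
        "project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)"
        + "%20AND%20fixVersion%20%3D%201.2.1";

    // Turns the percent-encoded parameter back into readable JQL.
    static String decode(String encoded) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // prints: project = FLINK AND status in (Open, "In Progress", Reopened) AND fixVersion = 1.2.1
        System.out.println(decode(ENCODED));
    }
}
```

The 1.1.5 link uses the same filter with `fixVersion = 1.1.5`.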

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Till Rohrmann
Thanks for kicking off the discussion, Tzu-Li. I'd like to add the following
issues, which have already been merged into the 1.2-release and 1.1-release
branches:

1.2.1:

(7) https://issues.apache.org/jira/browse/FLINK-5942
Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
Corrupted checkpoints will now be skipped.
Status: Merged

(8) https://issues.apache.org/jira/browse/FLINK-5940
Hardens the checkpoint recovery in case we cannot retrieve the
completed checkpoint from the metadata state handle retrieved from
ZooKeeper. This can happen, for example, if the metadata is deleted.
Checkpoints with unretrievable state handles are skipped.
Status: Merged

1.1.5:


(7) https://issues.apache.org/jira/browse/FLINK-5942
Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
Corrupted checkpoints will now be skipped.
Status: Merged

(8) https://issues.apache.org/jira/browse/FLINK-5940
Hardens the checkpoint recovery in case we cannot retrieve the
completed checkpoint from the metadata state handle retrieved from
ZooKeeper. This can happen, for example, if the metadata is deleted.
Checkpoints with unretrievable state handles are skipped.
Status: Merged

Cheers,
Till

On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <[hidden email]>
wrote:

> Hi all!
>
> I would like to start a discussion for the next bugfix release for 1.1.x
> and 1.2.x.
> There’s been quite a few critical fixes for bugs in both the releases
> recently, and I think they deserve a bugfix release soon.
> Most of the bugs were reported by users.
>
> I’m starting the discussion for both bugfix releases because most fixes
> span both releases (almost identical).
> Of course, the actual RC votes and RC creation process doesn’t have to be
> started together.
>
> Here’s an overview of what’s been collected so far, for both bugfix
> releases -
> (it’s a list of what I’m aware of so far, and may be missing stuff; please
> append and bring to attention as necessary :-) )
>
>
> For Flink 1.2.1:
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> This compromises the producer’s at-least-once guarantee.
> Status: merged
>
> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> Do not check Kerberos credentials for non-Kerberos authentications. MapR
> users are affected by this, and cannot submit Flink on YARN jobs on a
> secured MapR cluster.
> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>
> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> Kafka Consumer can lose state if queried partition list is incomplete on
> restore.
> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>
> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> used.
> Status: merged
>
> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> Fix multi-char delimiters in Batch InputFormats.
> Status: merged
>
> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> bug that causes HA recovery to fail.
> Status: merged
>
>
>
> For Flink 1.1.5:
>
> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
> This compromises the producer’s at-least-once guarantee.
> Status: This is already merged for 1.2.1. I would personally like to
> backport the fix for this to 1.1.5 also.
>
> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> Kafka Consumer can lose state if queried partition list is incomplete on
> restore.
> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>
> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
> used.
> Status: merged
>
> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> Fix multi-char delimiters in Batch InputFormats.
> Status: merged
>
> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
> bug that causes HA recovery to fail.
> Status: merged
>
> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> Kafka Consumer (0.9/0.10) threading model leads problematic cancellation
> behavior.
> Status: This fix was already released in 1.2.0, but never made it into the
> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>
>
> What do you think? From the list so far, we pretty much already have
> everything in, so I think it would be nice to aim for RCs by the end of
> this week.
> Since both bugfix releases cover almost the same list of issues, I think
> it shouldn’t be too hard for us to kick off both bugfix releases around the
> same time.
>
> Also FYI, here’s the lists of JIRA tickets tagged with "1.2.1” / “1.1.5”
> as the Fix Versions, and are still open.
> We should probably want to check if there’s anything on there that we
> should block on for the releases:
>
> For 1.2.1:
> https://issues.apache.org/jira/browse/FLINK-5711?jql=
> project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%
> 22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>
> For 1.1.5:
> https://issues.apache.org/jira/browse/FLINK-6006?jql=
> project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%
> 22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Stefan Richter
Hi,

I would suggest also including the following in 1.2.1:

(9) https://issues.apache.org/jira/browse/FLINK-6044
Replaces unintentional calls to InputStream#read(…) with the intended
and correct InputStream#readFully(…)
Status: PR

(10) https://issues.apache.org/jira/browse/FLINK-5985
Flink 1.2 created state handles for stateless tasks, which caused trouble
at restore time for users who wanted to make topology changes that involve
only stateless operators.
Status: PR
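
To show why the distinction in (9) matters, here is a small self-contained sketch. It is not taken from the Flink code base: `TrickleStream` is a hypothetical stream that returns at most one byte per call, standing in for any stream that delivers short reads. In the JDK, `readFully` is defined on `DataInput` / `DataInputStream`, which is what the sketch uses; which stream classes the Flink fix touches is not stated in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {
    // Hypothetical stream that hands out at most one byte per bulk read,
    // mimicking a network or file stream that returns short reads.
    static class TrickleStream extends InputStream {
        private final InputStream in;
        TrickleStream(byte[] data) { this.in = new ByteArrayInputStream(data); }
        @Override public int read() throws IOException { return in.read(); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, 1));
        }
    }

    // read(byte[]) may return before the buffer is full; the caller must
    // check the returned count, which is easy to forget.
    static int plainRead(byte[] data, byte[] buf) throws IOException {
        return new TrickleStream(data).read(buf);
    }

    // readFully loops until the buffer is full or throws EOFException.
    static void fullRead(byte[] data, byte[] buf) throws IOException {
        new DataInputStream(new TrickleStream(data)).readFully(buf);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};

        byte[] partial = new byte[4];
        int n = plainRead(data, partial);
        System.out.println("read returned " + n + " of " + partial.length + " bytes");

        byte[] full = new byte[4];
        fullRead(data, full);
        System.out.println("readFully filled: " + Arrays.toString(full));
    }
}
```

With this stream, the plain `read` call fills only the first byte of the buffer, while `readFully` fills all four.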


> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
>
> Thanks for kicking off the discussion Tzu-Li. I'd like to add the following
> issues which have already been merged into the 1.2-release and 1.1-release
> branch:
>
> 1.2.1:
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> 1.1.5:
>
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> Cheers,
> Till


Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Aljoscha Krettek-2
In reply to this post by Till Rohrmann
Thanks for kicking off the discussion!

I have an open PR for
 - https://issues.apache.org/jira/browse/FLINK-5808: Missing
 verification for setParallelism and setMaxParallelism
 - https://issues.apache.org/jira/browse/FLINK-5713: Protect against NPE
 in WindowOperator window cleanup

On Tue, Mar 14, 2017, at 15:15, Till Rohrmann wrote:

> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
> following
> issues which have already been merged into the 1.2-release and
> 1.1-release
> branch:
>
> 1.2.1:
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> 1.1.5:
>
>
> (7) https://issues.apache.org/jira/browse/FLINK-5942
> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> Corrupted checkpoints will now be skipped.
> Status: Merged
>
> (8) https://issues.apache.org/jira/browse/FLINK-5940
> Hardens the checkpoint recovery in case that we cannot retrieve the
> completed checkpoint from the meta data state handle retrieved from
> ZooKeeper. This can, for example, happen if the meta data is deleted.
> Checkpoints with unretrievable state handles are skipped.
> Status: Merged
>
> Cheers,
> Till

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Ufuk Celebi-2
In reply to this post by Stefan Richter
Big +1 Gordon!

I think (10) is very critical to have in 1.2.1.

– Ufuk


On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
<[hidden email]> wrote:

> Hi,
>
> I would suggest to also include in 1.2.1:
>
> (9) https://issues.apache.org/jira/browse/FLINK-6044
> Replaces unintentional calls to InputStream#read(…) with the intended
> and correct InputStream#readFully(…)
> Status: PR
>
> (10) https://issues.apache.org/jira/browse/FLINK-5985
> Flink 1.2 was creating state handles for stateless tasks which caused trouble
> at restore time for users that wanted to do some changes that only include
> stateless operators to their topology.
> Status: PR
>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Vladislav Pernin
Hi,

I would also include the following (not yet resolved) issue in the 1.2.1
scope:

https://issues.apache.org/jira/browse/FLINK-6001
NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
allowedLateness

2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:

> Big +1 Gordon!
>
> I think (10) is very critical to have in 1.2.1.
>
> – Ufuk
>
>
> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> <[hidden email]> wrote:
> > Hi,
> >
> > I would suggest to also include in 1.2.1:
> >
> > (9) https://issues.apache.org/jira/browse/FLINK-6044
> > Replaces unintentional calls to InputStream#read(…) with the intended
> > and correct InputStream#readFully(…)
> > Status: PR
> >
> > (10) https://issues.apache.org/jira/browse/FLINK-5985
> > Flink 1.2 was creating state handles for stateless tasks which caused trouble
> > at restore time for users that wanted to do some changes that only include
> > stateless operators to their topology.
> > Status: PR
> >
> >
> >> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> >>
> >> Thanks for kicking off the discussion Tzu-Li. I'd like to add the following
> >> issues which have already been merged into the 1.2-release and 1.1-release
> >> branch:
> >>
> >> 1.2.1:
> >>
> >> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >> Corrupted checkpoints will now be skipped.
> >> Status: Merged
> >>
> >> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >> Hardens the checkpoint recovery in case that we cannot retrieve the
> >> completed checkpoint from the meta data state handle retrieved from
> >> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >> Checkpoints with unretrievable state handles are skipped.
> >> Status: Merged
> >>
> >> 1.1.5:
> >>
> >>
> >> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
> >> Corrupted checkpoints will now be skipped.
> >> Status: Merged
> >>
> >> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >> Hardens the checkpoint recovery in case that we cannot retrieve the
> >> completed checkpoint from the meta data state handle retrieved from
> >> ZooKeeper. This can, for example, happen if the meta data is deleted.
> >> Checkpoints with unretrievable state handles are skipped.
> >> Status: Merged
> >>
> >> Cheers,
> >> Till
> >>
> >> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
> [hidden email]>
> >> wrote:
> >>
> >>> Hi all!
> >>>
> >>> I would like to start a discussion for the next bugfix release for
> 1.1.x
> >>> and 1.2.x.
> >>> There’s been quite a few critical fixes for bugs in both the releases
> >>> recently, and I think they deserve a bugfix release soon.
> >>> Most of the bugs were reported by users.
> >>>
> >>> I’m starting the discussion for both bugfix releases because most fixes
> >>> span both releases (almost identical).
> >>> Of course, the actual RC votes and RC creation process doesn’t have to
> be
> >>> started together.
> >>>
> >>> Here’s an overview of what’s been collected so far, for both bugfix
> >>> releases -
> >>> (it’s a list of what I’m aware of so far, and may be missing stuff;
> please
> >>> append and bring to attention as necessary :-) )
> >>>
> >>>
> >>> For Flink 1.2.1:
> >>>
> >>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>> Async exceptions in the FlinkKafkaProducer are not checked on
> checkpoints.
> >>> This compromises the producer’s at-least-once guarantee.
> >>> Status: merged
> >>>
> >>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> >>> Do not check Kerberos credentials for non-Kerberos authentications.
> MapR
> >>> users are affected by this, and cannot submit Flink on YARN jobs on a
> >>> secured MapR cluster.
> >>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
> >>>
> >>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> >>> Kafka Consumer can lose state if queried partition list is incomplete
> on
> >>> restore.
> >>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
> >>>
> >>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> >>> KryoSerializer may use the wrong classloader when Kryo’s
> JavaSerializer is
> >>> used.
> >>> Status: merged
> >>>
> >>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> >>> Fix multi-char delimiters in Batch InputFormats.
> >>> Status: merged
> >>>
> >>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> >>> Set the Scheduler in the ExecutionGraph via its constructor. This
> fixes a
> >>> bug that causes HA recovery to fail.
> >>> Status: merged
> >>>
> >>>
> >>>
> >>> For Flink 1.1.5:
> >>>
> >>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>> Async exceptions in the FlinkKafkaProducer are not checked on
> checkpoints.
> >>> This compromises the producer’s at-least-once guarantee.
> >>> Status: This is already merged for 1.2.1. I would personally like to
> >>> backport the fix for this to 1.1.5 also.
> >>>
> >>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> >>> Kafka Consumer can lose state if queried partition list is incomplete
> on
> >>> restore.
> >>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
> >>>
> >>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> >>> KryoSerializer may use the wrong classloader when Kryo’s
> JavaSerializer is
> >>> used.
> >>> Status: merged
> >>>
> >>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> >>> Fix multi-char delimiters in Batch InputFormats.
> >>> Status: merged
> >>>
> >>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> >>> Set the Scheduler in the ExecutionGraph via its constructor. This
> fixes a
> >>> bug that causes HA recovery to fail.
> >>> Status: merged
> >>>
> >>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> >>> Kafka Consumer (0.9/0.10) threading model leads problematic
> cancellation
> >>> behavior.
> >>> Status: This fix was already released in 1.2.0, but never made it into
> the
> >>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >>>
> >>>
> >>> What do you think? From the list so far, we pretty much already have
> >>> everything in, so I think it would be nice to aim for RCs by the end of
> >>> this week.
> >>> Since both bugfix releases cover almost the same list of issues, I
> think
> >>> it shouldn’t be too hard for us to kick off both bugfix releases
> around the
> >>> same time.
> >>>
> >>> Also FYI, here’s the lists of JIRA tickets tagged with "1.2.1” /
> “1.1.5”
> >>> as the Fix Versions, and are still open.
> >>> We should probably want to check if there’s anything on there that we
> >>> should block on for the releases:
> >>>
> >>> For 1.2.1:
> >>> https://issues.apache.org/jira/browse/FLINK-5711?jql=
> >>> project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%
> >>> 22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >>>
> >>> For 1.1.5:
> >>> https://issues.apache.org/jira/browse/FLINK-6006?jql=
> >>> project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%
> >>> 22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> >
>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Aljoscha Krettek-2
I did in fact just open a PR for
> https://issues.apache.org/jira/browse/FLINK-6001
> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> allowedLateness



Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Vladislav Pernin
I just tested it in my reproducer. It works.


Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Jinkui Shi
Can we also fix this issue in 1.2.1?

Flink-python tests take too long
https://issues.apache.org/jira/browse/FLINK-5650



Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
Thanks a lot for the updates so far, everyone!

From the discussion so far, below are the still-unfixed pending issues for the 1.1.5 / 1.2.1 releases.

Since there’s only one backport for 1.1.5 left, I think having an RC for 1.1.5 near the end of this week / early next week is very promising, as basically everything is already in.
I’d be happy to volunteer to help manage the release for 1.1.5, and prepare the RC when it’s ready :)

For 1.2.1, we can leave the pending list here for tracking, and come back to update it in the near future.

If there’s anything I missed, please let me know!


=========== Still pending for Flink 1.1.5 ===========

(1) https://issues.apache.org/jira/browse/FLINK-5701
Broken at-least-once Kafka producer.
Status: backport PR pending - https://github.com/apache/flink/pull/3549. Since it is a relatively self-contained change, I expect this to be a fast fix.



=========== Still pending for Flink 1.2.1 ===========

(1) https://issues.apache.org/jira/browse/FLINK-5808
Fix Missing verification for setParallelism and setMaxParallelism
Status: PR - https://github.com/apache/flink/pull/3509, review in progress

(2) https://issues.apache.org/jira/browse/FLINK-5713
Protect against NPE in WindowOperator window cleanup
Status: PR - https://github.com/apache/flink/pull/3535, review pending

(3) https://issues.apache.org/jira/browse/FLINK-6044
TypeSerializerSerializationProxy.read() doesn't verify the read buffer length
Status: Fixed for master, 1.2 backport pending

(4) https://issues.apache.org/jira/browse/FLINK-5985
Flink treats every task as stateful (making topology changes impossible)
Status: PR - https://github.com/apache/flink/pull/3543, review in progress

(5) https://issues.apache.org/jira/browse/FLINK-5650
Flink-python tests taking up too much time
Status: I think Chesnay has made some progress on this one; we can see whether we want to make it a blocker.


Cheers,
Gordon



Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Stephan Ewen
Thanks for the update!

Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
cancel-task from timer queue to prevent memory leaks

The remaining issue list looks good, but I would say that (5) is optional.
It is not a critical production bug.




Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Jinkui Shi
@Tzu-Li (Gordon) Tai

FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.

[1] https://github.com/zentol/flink/tree/5650_python_test_debug


> 在 2017年3月16日,上午3:37,Stephan Ewen <[hidden email]> 写道:
>
> Thanks for the update!
>
> Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
> cancel-task from timer queue to prevent memory leaks
>
> The remaining issue list looks good, but I would say that (5) is optional.
> It is not a critical production bug.
>
>
>
> On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <[hidden email]>
> wrote:
>
>> Thanks a lot for the updates so far everyone!
>>
>> From the discussion so far, the below is the still unfixed pending issues
>> for 1.1.5 / 1.2.1 release.
>>
>> Since there’s only one backport for 1.1.5 left, I think having an RC for
>> 1.1.5 near the end of this week / early next week is very promising, as
>> basically everything is already in.
>> I’d be happy to volunteer to help manage the release for 1.1.5, and
>> prepare the RC when it’s ready :)
>>
>> For 1.2.1, we can leave the pending list here for tracking, and come back
>> to update it in the near future.
>>
>> If there’s anything I missed, please let me know!
>>
>>
>> =========== Still pending for Flink 1.1.5 ===========
>>
>> (1) https://issues.apache.org/jira/browse/FLINK-5701
>> Broken at-least-once Kafka producer.
>> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
>> Since it is a relatively self-contained change, I expect this to be a fast
>> fix.
>>
>>
>>
>> =========== Still pending for Flink 1.2.1 ===========
>>
>> (1) https://issues.apache.org/jira/browse/FLINK-5808
>> Fix Missing verification for setParallelism and setMaxParallelism
>> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>>
>> (2) https://issues.apache.org/jira/browse/FLINK-5713
>> Protect against NPE in WindowOperator window cleanup
>> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>>
>> (3) https://issues.apache.org/jira/browse/FLINK-6044
>> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
>> length
>> Status: Fixed for master, 1.2 backport pending
>>
>> (4) https://issues.apache.org/jira/browse/FLINK-5985
>> Flink treats every task as stateful (making topology changes impossible)
>> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>>
>> (5) https://issues.apache.org/jira/browse/FLINK-5650
>> Flink-python tests taking up too much time
>> Status: I think Chesnay currently has some progress with this one, we can
>> see if we want to make this a blocker
>>
>>
>> Cheers,
>> Gordon
>>
>> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email]) wrote:
>>
>> Can we fix this issue in the 1.2.1:
>>
>> Flink-python tests cost too long time
>> https://issues.apache.org/jira/browse/FLINK-5650 <
>> https://issues.apache.org/jira/browse/FLINK-5650>
>>
>>> 在 2017年3月15日,下午6:29,Vladislav Pernin <[hidden email]> 写道:
>>>
>>> I just tested in in my reproducer. It works.
>>>
>>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]>:
>>>
>>>> I did in fact just open a PR for
>>>>> https://issues.apache.org/jira/browse/FLINK-6001
>>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>>>>> allowedLateness
>>>>
>>>>
>>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>>>>> Hi,
>>>>>
>>>>> I would also include the following (not yet resolved) issue in the
>> 1.2.1
>>>>> scope :
>>>>>
>>>>> https://issues.apache.org/jira/browse/FLINK-6001
>>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>>>>> allowedLateness
>>>>>
>>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
>>>>>
>>>>>> Big +1 Gordon!
>>>>>>
>>>>>> I think (10) is very critical to have in 1.2.1.
>>>>>>
>>>>>> – Ufuk
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
>>>>>> <[hidden email]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I would suggest to also include in 1.2.1:
>>>>>>>
>>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044 <
>>>>>> https://issues.apache.org/jira/browse/FLINK-6044>
>>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
>>>>>>> and correct InputStream#readFully(…)
>>>>>>> Status: PR
>>>>>>>
>>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985 <
>>>>>> https://issues.apache.org/jira/browse/FLINK-5985>
>>>>>>> Flink 1.2 was creating state handles for stateless tasks which caused
>>>>>> trouble
>>>>>>> at restore time for users that wanted to do some changes that only
>>>>>> include
>>>>>>> stateless operators to their topology.
>>>>>>> Status: PR
>>>>>>>
>>>>>>>
>>>>>>>> Am 14.03.2017 um 15:15 schrieb Till Rohrmann <[hidden email]
>>>>> :
>>>>>>>>
>>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
>>>>>> following
>>>>>>>> issues which have already been merged into the 1.2-release and
>>>>>> 1.1-release
>>>>>>>> branch:
>>>>>>>>
>>>>>>>> 1.2.1:
>>>>>>>>
>>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>>>>>>>> Corrupted checkpoints will now be skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve the
>>>>>>>> completed checkpoint from the meta data state handle retrieved from
>>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
>>>> deleted.
>>>>>>>> Checkpoints with unretrievable state handles are skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> 1.1.5:
>>>>>>>>
>>>>>>>>
>>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>>>>>>>> Corrupted checkpoints will now be skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve the
>>>>>>>> completed checkpoint from the meta data state handle retrieved from
>>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
>>>> deleted.
>>>>>>>> Checkpoints with unretrievable state handles are skipped.
>>>>>>>> Status: Merged
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
>>>>>> [hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all!
>>>>>>>>>
>>>>>>>>> I would like to start a discussion for the next bugfix release for
>>>>>> 1.1.x
>>>>>>>>> and 1.2.x.
>>>>>>>>> There’s been quite a few critical fixes for bugs in both the
>>>> releases
>>>>>>>>> recently, and I think they deserve a bugfix release soon.
>>>>>>>>> Most of the bugs were reported by users.
>>>>>>>>>
>>>>>>>>> I’m starting the discussion for both bugfix releases because most
>>>> fixes
>>>>>>>>> span both releases (almost identical).
>>>>>>>>> Of course, the actual RC votes and RC creation process doesn’t
>>>> have to
>>>>>> be
>>>>>>>>> started together.
>>>>>>>>>
>>>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
>>>>>>>>> releases -
>>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff;
>>>>>> please
>>>>>>>>> append and bring to attention as necessary :-) )
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For Flink 1.2.1:
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>>>>>> checkpoints.
>>>>>>>>> This compromises the producer’s at-least-once guarantee.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications.
>>>>>> MapR
>>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs
>>>> on a
>>>>>>>>> secured MapR cluster.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1
>>>> already
>>>>>>>>>
>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>>>> Kafka Consumer can lose state if queried partition list is
>>>> incomplete
>>>>>> on
>>>>>>>>> restore.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1
>>>> already
>>>>>>>>>
>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>>>>>> JavaSerializer is
>>>>>>>>> used.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>>>>>> fixes a
>>>>>>>>> bug that causes HA recovery to fail.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For Flink 1.1.5:
>>>>>>>>>
>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>>>>>> checkpoints.
>>>>>>>>> This compromises the producer’s at-least-once guarantee.
>>>>>>>>> Status: This is already merged for 1.2.1. I would personally like
>>>> to
>>>>>>>>> backport the fix for this to 1.1.5 also.
>>>>>>>>>
>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>>>> Kafka Consumer can lose state if queried partition list is
>>>> incomplete
>>>>>> on
>>>>>>>>> restore.
>>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1
>>>> already
>>>>>>>>>
>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>>>>>> JavaSerializer is
>>>>>>>>> used.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>>>>>> fixes a
>>>>>>>>> bug that causes HA recovery to fail.
>>>>>>>>> Status: merged
>>>>>>>>>
>>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads problematic
>>>>>> cancellation
>>>>>>>>> behavior.
>>>>>>>>> Status: This fix was already released in 1.2.0, but never made it
>>>> into
>>>>>> the
>>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What do you think? From the list so far, we pretty much already
>>>> have
>>>>>>>>> everything in, so I think it would be nice to aim for RCs by the
>>>> end of
>>>>>>>>> this week.
>>>>>>>>> Since both bugfix releases cover almost the same list of issues, I
>>>>>> think
>>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases
>>>>>> around the
>>>>>>>>> same time.
>>>>>>>>>
>>>>>>>>> Also FYI, here’s the lists of JIRA tickets tagged with "1.2.1” /
>>>>>> “1.1.5”
>>>>>>>>> as the Fix Versions, and are still open.
>>>>>>>>> We should probably want to check if there’s anything on there that
>>>> we
>>>>>>>>> should block on for the releases:
>>>>>>>>>
>>>>>>>>> For 1.2.1:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>>>>>>>>>
>>>>>>>>> For 1.1.5:
>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>


Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Robert Metzger
The Cassandra connector is probably not usable in Flink 1.2.0. I would like
to include a fix in 1.2.1:
https://issues.apache.org/jira/browse/FLINK-6084

Please let me know if this fix becomes a blocker for the 1.2.1 release. If
so, I can validate the fix myself to speed things up.

On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]> wrote:

> @Tzu-Li (Gordon) Tai
>
> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
>
> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
>
>
> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
> >
> > Thanks for the update!
> >
> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
> > cancel-task from timer queue to prevent memory leaks
> >
> > The remaining issue list looks good, but I would say that (5) is
> optional.
> > It is not a critical production bug.
> >
> >
> >
> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
> [hidden email]>
> > wrote:
> >
> >> Thanks a lot for the updates so far everyone!
> >>
> >> From the discussion so far, below are the pending issues that are still
> >> unfixed for the 1.1.5 / 1.2.1 releases.
> >>
> >> Since there’s only one backport for 1.1.5 left, I think having an RC for
> >> 1.1.5 near the end of this week / early next week is very promising, as
> >> basically everything is already in.
> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
> >> prepare the RC when it’s ready :)
> >>
> >> For 1.2.1, we can leave the pending list here for tracking, and come
> back
> >> to update it in the near future.
> >>
> >> If there’s anything I missed, please let me know!
> >>
> >>
> >> =========== Still pending for Flink 1.1.5 ===========
> >>
> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> >> Broken at-least-once Kafka producer.
> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
> .
> >> Since it is a relatively self-contained change, I expect this to be a
> fast
> >> fix.
> >>
> >>
> >>
> >> =========== Still pending for Flink 1.2.1 ===========
> >>
> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> >> Fix Missing verification for setParallelism and setMaxParallelism
> >> Status: PR - https://github.com/apache/flink/pull/3509, review in
> progress
> >>
> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
> >> Protect against NPE in WindowOperator window cleanup
> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
> >>
> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
> >> length
> >> Status: Fixed for master, 1.2 backport pending
> >>
> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
> >> Flink treats every task as stateful (making topology changes impossible)
> >> Status: PR - https://github.com/apache/flink/pull/3543, review in
> progress
> >>
> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
> >> Flink-python tests taking up too much time
> >> Status: I think Chesnay currently has some progress with this one, we
> can
> >> see if we want to make this a blocker
> >>
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email])
> wrote:
> >>
> >> Can we fix this issue in 1.2.1:
> >>
> >> Flink-python tests cost too long time
> >> https://issues.apache.org/jira/browse/FLINK-5650
> >>
> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
> >>>
> >>> I just tested it in my reproducer. It works.
> >>>
> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]>:
> >>>
> >>>> I did in fact just open a PR for
> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>>>> allowedLateness
> >>>>
> >>>>
> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I would also include the following (not yet resolved) issue in the
> >> 1.2.1
> >>>>> scope :
> >>>>>
> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> >>>>> allowedLateness
> >>>>>
> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
> >>>>>
> >>>>>> Big +1 Gordon!
> >>>>>>
> >>>>>> I think (10) is very critical to have in 1.2.1.
> >>>>>>
> >>>>>> – Ufuk
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> >>>>>> <[hidden email]> wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I would suggest to also include in 1.2.1:
> >>>>>>>
> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the
> >>>>>>> intended and correct InputStream#readFully(…)
> >>>>>>> Status: PR
> >>>>>>>
> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which
> >>>>>>> caused trouble at restore time for users who wanted to make changes
> >>>>>>> to their topology that only involve stateless operators.
> >>>>>>> Status: PR
> >>>>>>>
> >>>>>>>
> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> >>>>>>>>
> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
> >>>>>> following
> >>>>>>>> issues which have already been merged into the 1.2-release and
> >>>>>> 1.1-release
> >>>>>>>> branch:
> >>>>>>>>
> >>>>>>>> 1.2.1:
> >>>>>>>>
> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper
> data.
> >>>>>>>> Corrupted checkpoints will now be skipped.
> >>>>>>>> Status: Merged
> >>>>>>>>
> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve
> the
> >>>>>>>> completed checkpoint from the meta data state handle retrieved
> from
> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> >>>> deleted.
> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>>>> Status: Merged
> >>>>>>>>
> >>>>>>>> 1.1.5:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper
> data.
> >>>>>>>> Corrupted checkpoints will now be skipped.
> >>>>>>>> Status: Merged
> >>>>>>>>
> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> >>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve
> the
> >>>>>>>> completed checkpoint from the meta data state handle retrieved
> from
> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> >>>> deleted.
> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> >>>>>>>> Status: Merged
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Till
> >>>>>>>>
> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
> >>>>>> [hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all!
> >>>>>>>>>
> >>>>>>>>> I would like to start a discussion for the next bugfix release
> for
> >>>>>> 1.1.x
> >>>>>>>>> and 1.2.x.
> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the
> >>>> releases
> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
> >>>>>>>>> Most of the bugs were reported by users.
> >>>>>>>>>
> >>>>>>>>> I’m starting the discussion for both bugfix releases because most
> >>>> fixes
> >>>>>>>>> span both releases (almost identical).
> >>>>>>>>> Of course, the actual RC votes and RC creation process doesn’t
> >>>> have to
> >>>>>> be
> >>>>>>>>> started together.
> >>>>>>>>>
> >>>>>>>>> Here’s an overview of what’s been collected so far, for both
> bugfix
> >>>>>>>>> releases -
> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing
> stuff;
> >>>>>> please
> >>>>>>>>> append and bring to attention as necessary :-) )
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> For Flink 1.2.1:
> >>>>>>>>>
> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
> >>>>>> checkpoints.
> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos
> authentications.
> >>>>>> MapR
> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs
> >>>> on a
> >>>>>>>>> secured MapR cluster.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1
> >>>> already
> >>>>>>>>>
> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> >>>> incomplete
> >>>>>> on
> >>>>>>>>> restore.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1
> >>>> already
> >>>>>>>>>
> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> >>>>>> JavaSerializer is
> >>>>>>>>> used.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
> >>>>>> fixes a
> >>>>>>>>> bug that causes HA recovery to fail.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> For Flink 1.1.5:
> >>>>>>>>>
> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
> >>>>>> checkpoints.
> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like
> >>>> to
> >>>>>>>>> backport the fix for this to 1.1.5 also.
> >>>>>>>>>
> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> >>>> incomplete
> >>>>>> on
> >>>>>>>>> restore.
> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1
> >>>> already
> >>>>>>>>>
> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> >>>>>> JavaSerializer is
> >>>>>>>>> used.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
> >>>>>> fixes a
> >>>>>>>>> bug that causes HA recovery to fail.
> >>>>>>>>> Status: merged
> >>>>>>>>>
> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic
> >>>>>>>>> cancellation behavior.
> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it
> >>>> into
> >>>>>> the
> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> What do you think? From the list so far, we pretty much already
> >>>> have
> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by the
> >>>> end of
> >>>>>>>>> this week.
> >>>>>>>>> Since both bugfix releases cover almost the same list of issues,
> I
> >>>>>> think
> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases
> >>>>>> around the
> >>>>>>>>> same time.
> >>>>>>>>>
> >>>>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with
> >>>>>>>>> “1.2.1” / “1.1.5” as the Fix Version and are still open.
> >>>>>>>>> We should probably check whether there’s anything on there that we
> >>>>>>>>> should block on for the releases:
> >>>>>>>>>
> >>>>>>>>> For 1.2.1:
> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> >>>>>>>>>
> >>>>>>>>> For 1.1.5:
> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> >>
>
>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
Update for 1.1.5:
The last fixes for 1.1.5 are in! I will create the RC today and start the vote.

Cheers,
Gordon


On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email]) wrote:

The Cassandra connector is probably not usable in Flink 1.2.0. I would like
to include a fix in 1.2.1:
https://issues.apache.org/jira/browse/FLINK-6084

Please let me know if this fix becomes a blocker for the 1.2.1 release. If
so, I can validate the fix myself to speed things up.


Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Ufuk Celebi-2
Cool! Thanks for taking care of this Gordon :-)

On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
<[hidden email]> wrote:

> Update for 1.1.5:
> The last fixes for 1.1.5 are in! I will create the RC today and start the vote.
>
> Cheers,
> Gordon
>
>
> On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email]) wrote:
>
> The Cassandra connector is probably not usable in Flink 1.2.0. I would like
> to include a fix in 1.2.1:
> https://issues.apache.org/jira/browse/FLINK-6084
>
> Please let me know if this fix becomes a blocker for the 1.2.1 release. If
> so, I can validate the fix myself to speed things up.
>
> On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]> wrote:
>
>> @Tzu-Li (Gordon) Tai
>>
>> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
>>
>> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
>>
>>
>> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
>> >
>> > Thanks for the update!
>> >
>> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
>> > cancel-task from timer queue to prevent memory leaks
>> >
>> > The remaining issue list looks good, but I would say that (5) is
>> optional.
>> > It is not a critical production bug.
>> >
>> >
>> >
>> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
>> [hidden email]>
>> > wrote:
>> >
>> >> Thanks a lot for the updates so far everyone!
>> >>
>> >> From the discussion so far, below are the pending issues that are still
>> >> unfixed for the 1.1.5 / 1.2.1 releases.
>> >>
>> >> Since there’s only one backport for 1.1.5 left, I think having an RC for
>> >> 1.1.5 near the end of this week / early next week is very promising, as
>> >> basically everything is already in.
>> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
>> >> prepare the RC when it’s ready :)
>> >>
>> >> For 1.2.1, we can leave the pending list here for tracking, and come
>> back
>> >> to update it in the near future.
>> >>
>> >> If there’s anything I missed, please let me know!
>> >>
>> >>
>> >> =========== Still pending for Flink 1.1.5 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
>> >> Broken at-least-once Kafka producer.
>> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
>> .
>> >> Since it is a relatively self-contained change, I expect this to be a
>> fast
>> >> fix.
>> >>
>> >>
>> >>
>> >> =========== Still pending for Flink 1.2.1 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
>> >> Fix Missing verification for setParallelism and setMaxParallelism
>> >> Status: PR - https://github.com/apache/flink/pull/3509, review in
>> progress
>> >>
>> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
>> >> Protect against NPE in WindowOperator window cleanup
>> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>> >>
>> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
>> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
>> >> length
>> >> Status: Fixed for master, 1.2 backport pending
>> >>
>> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
>> >> Flink treats every task as stateful (making topology changes impossible)
>> >> Status: PR - https://github.com/apache/flink/pull/3543, review in
>> progress
>> >>
>> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
>> >> Flink-python tests taking up too much time
>> >> Status: I think Chesnay currently has some progress with this one, we
>> can
>> >> see if we want to make this a blocker
>> >>
>> >>
>> >> Cheers,
>> >> Gordon
>> >>
>> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email])
>> wrote:
>> >>
>> >> Can we fix this issue in 1.2.1:
>> >>
>> >> Flink-python tests cost too long time
>> >> https://issues.apache.org/jira/browse/FLINK-5650
>> >>
>> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
>> >>>
>> >>> I just tested it in my reproducer. It works.
>> >>>
>> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]>:
>> >>>
>> >>>> I did in fact just open a PR for
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I would also include the following (not yet resolved) issue in the
>> >> 1.2.1
>> >>>>> scope :
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>>
>> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
>> >>>>>
>> >>>>>> Big +1 Gordon!
>> >>>>>>
>> >>>>>> I think (10) is very critical to have in 1.2.1.
>> >>>>>>
>> >>>>>> – Ufuk
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
>> >>>>>> <[hidden email]> wrote:
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I would suggest to also include in 1.2.1:
>> >>>>>>>
>> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
>> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
>> >>>>>>> and correct InputStream#readFully(…)
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
>> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
>> >>>>>>> trouble at restore time for users who wanted to make topology changes
>> >>>>>>> that involve only stateless operators.
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
>> >>>>>>>> following issues, which have already been merged into the 1.2-release
>> >>>>>>>> and 1.1-release branches:
>> >>>>>>>>
>> >>>>>>>> 1.2.1:
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper
>> data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve
>> the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved
>> from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
>> >>>> deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> 1.1.5:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper
>> data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case that we cannot retrieve
>> the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved
>> from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
>> >>>> deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Till
>> >>>>>>>>
>> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
>> >>>>>> [hidden email]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi all!
>> >>>>>>>>>
>> >>>>>>>>> I would like to start a discussion for the next bugfix release
>> for
>> >>>>>> 1.1.x
>> >>>>>>>>> and 1.2.x.
>> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the
>> >>>> releases
>> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
>> >>>>>>>>> Most of the bugs were reported by users.
>> >>>>>>>>>
>> >>>>>>>>> I’m starting the discussion for both bugfix releases because most
>> >>>> fixes
>> >>>>>>>>> span both releases (almost identical).
>> >>>>>>>>> Of course, the actual RC votes and RC creation process doesn’t
>> >>>> have to
>> >>>>>> be
>> >>>>>>>>> started together.
>> >>>>>>>>>
>> >>>>>>>>> Here’s an overview of what’s been collected so far, for both
>> bugfix
>> >>>>>>>>> releases -
>> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing
>> stuff;
>> >>>>>> please
>> >>>>>>>>> append and bring to attention as necessary :-) )
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.2.1:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>> >>>>>> checkpoints.
>> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos
>> authentications.
>> >>>>>> MapR
>> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs
>> >>>> on a
>> >>>>>>>>> secured MapR cluster.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1
>> >>>> already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
>> >>>> incomplete
>> >>>>>> on
>> >>>>>>>>> restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1
>> >>>> already
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>> >>>>>> JavaSerializer is
>> >>>>>>>>> used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>> >>>>>> fixes a
>> >>>>>>>>> bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.1.5:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>> >>>>>> checkpoints.
>> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like
>> >>>> to
>> >>>>>>>>> backport the fix for this to 1.1.5 also.
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
>> >>>> incomplete
>> >>>>>> on
>> >>>>>>>>> restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1
>> >>>> already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>> >>>>>> JavaSerializer is
>> >>>>>>>>> used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>> >>>>>> fixes a
>> >>>>>>>>> bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic
>> >>>>>>>>> cancellation behavior.
>> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it
>> >>>> into
>> >>>>>> the
>> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> What do you think? From the list so far, we pretty much already
>> >>>> have
>> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by the
>> >>>> end of
>> >>>>>>>>> this week.
>> >>>>>>>>> Since both bugfix releases cover almost the same list of issues,
>> I
>> >>>>>> think
>> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases
>> >>>>>> around the
>> >>>>>>>>> same time.
>> >>>>>>>>>
>> >>>>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with “1.2.1” /
>> >>>>>>>>> “1.1.5” as the Fix Version and are still open.
>> >>>>>>>>> We probably want to check if there’s anything on there that we
>> >>>>>>>>> should block on for the releases:
>> >>>>>>>>>
>> >>>>>>>>> For 1.2.1:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>> >>>>>>>>>
>> >>>>>>>>> For 1.1.5:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Flavio Pompermaier
I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
issuing a release.

On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <[hidden email]> wrote:

> Cool! Thanks for taking care of this Gordon :-)

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Robert Metzger
I don't think that this issue should be a reason to hold back a bugfix
release.
There are workarounds for the problem you are describing. Once we've fixed
it, we can include it in the next bugfix release.

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier <[hidden email]> wrote:

> I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
> issuing a release.
>
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <[hidden email]> wrote:
>
> > Cool! Thanks for taking care of this Gordon :-)
> >
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
> > <[hidden email]> wrote:
> > > Update for 1.1.5:
> > > > The last fixes for 1.1.5 are in! I will create the RC today and start
> > > > the vote.
> > >
> > > Cheers,
> > > Gordon
> > >
> > >
> > > > On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email]) wrote:
> > > >
> > > > The Cassandra connector is probably not usable in Flink 1.2.0. I would like
> > > > to include a fix in 1.2.1:
> > > > https://issues.apache.org/jira/browse/FLINK-6084
> > > >
> > > > Please let me know if this fix becomes a blocker for the 1.2.1 release. If
> > > > so, I can validate the fix myself to speed things up.
> > >
> > > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]> wrote:
> > > >
> > > >> @Tzu-Li (Gordon) Tai
> > > >>
> > > >> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
> > > >>
> > > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
> > >>
> > >>
> > > >> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
> > >> >
> > >> > Thanks for the update!
> > >> >
> > > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
> > > >> > cancel-task from timer queue to prevent memory leaks
> > > >> >
> > > >> > The remaining issue list looks good, but I would say that (5) is optional.
> > > >> > It is not a critical production bug.
> > >> >
> > >> >
> > >> >
> > > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <[hidden email]>
> > > >> > wrote:
> > >> >
> > >> >> Thanks a lot for the updates so far everyone!
> > >> >>
> > > >> >> From the discussion so far, below are the still-unfixed pending issues
> > > >> >> for the 1.1.5 / 1.2.1 releases.
> > > >> >>
> > > >> >> Since there’s only one backport left for 1.1.5, I think having an RC for
> > > >> >> 1.1.5 near the end of this week / early next week is very promising, as
> > > >> >> basically everything is already in.
> > > >> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
> > > >> >> prepare the RC when it’s ready :)
> > > >> >>
> > > >> >> For 1.2.1, we can leave the pending list here for tracking, and come back
> > > >> >> to update it in the near future.
> > >> >>
> > >> >> If there’s anything I missed, please let me know!
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.1.5 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > >> >> Broken at-least-once Kafka producer.
> > > >> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
> > > >> >> Since it is a relatively self-contained change, I expect this to be a fast
> > > >> >> fix.
> > >> >>
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.2.1 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism
> > > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
> > > >> >>
> > > >> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
> > > >> >> Protect against NPE in WindowOperator window cleanup
> > > >> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
> > > >> >>
> > > >> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
> > > >> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
> > > >> >> length
> > > >> >> Status: Fixed for master, 1.2 backport pending
> > > >> >>
> > > >> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
> > > >> >> Flink treats every task as stateful (making topology changes impossible)
> > > >> >> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
> > > >> >>
> > > >> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
> > > >> >> Flink-python tests taking up too much time
> > > >> >> Status: I think Chesnay has made some progress on this one; we can
> > > >> >> see if we want to make this a blocker
> > >> >>
> > >> >>
> > >> >> Cheers,
> > >> >> Gordon
> > >> >>
> > > >> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email]) wrote:
> > > >> >>
> > > >> >> Can we fix this issue in 1.2.1?
> > > >> >>
> > > >> >> Flink-python tests cost too long time
> > > >> >> https://issues.apache.org/jira/browse/FLINK-5650
> > > >> >>
> > > >> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
> > > >> >>>
> > > >> >>> I just tested it in my reproducer. It works.
> > >> >>>
> > > >> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]>:
> > > >> >>>
> > > >> >>>> I did in fact just open a PR for
> > > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> > > >> >>>>> allowedLateness
> > > >> >>>>
> > > >> >>>>
> > > >> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> > > >> >>>>> Hi,
> > > >> >>>>>
> > > >> >>>>> I would also include the following (not yet resolved) issue in the
> > > >> >>>>> 1.2.1 scope:
> > > >> >>>>>
> > > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
> > > >> >>>>> allowedLateness
> > >> >>>>>
> > >> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
> > >> >>>>>
> > >> >>>>>> Big +1 Gordon!
> > >> >>>>>>
> > >> >>>>>> I think (10) is very critical to have in 1.2.1.
> > >> >>>>>>
> > >> >>>>>> – Ufuk
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> > >> >>>>>> <[hidden email]> wrote:
> > >> >>>>>>> Hi,
> > >> >>>>>>>
> > >> >>>>>>> I would suggest to also include in 1.2.1:
> > >> >>>>>>>
> > > >> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
> > > >> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
> > > >> >>>>>>> and correct InputStream#readFully(…)
> > > >> >>>>>>> Status: PR
> > > >> >>>>>>>
> > > >> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
> > > >> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
> > > >> >>>>>>> trouble at restore time for users who wanted to make topology changes
> > > >> >>>>>>> that involve only stateless operators.
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>>
> > > >> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
> > > >> >>>>>>>> following issues, which have already been merged into the 1.2-release
> > > >> >>>>>>>> and 1.1-release branches:
> > >> >>>>>>>>
> > >> >>>>>>>> 1.2.1:
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> 1.1.5:
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> Cheers,
> > >> >>>>>>>> Till
> > >> >>>>>>>>
> > >> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
> > >> >>>>>> [hidden email]>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Hi all!
> > >> >>>>>>>>>
> > >> >>>>>>>>> I would like to start a discussion for the next bugfix
> release
> > >> for
> > >> >>>>>> 1.1.x
> > >> >>>>>>>>> and 1.2.x.
> > >> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the
> > >> >>>> releases
> > >> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
> > >> >>>>>>>>> Most of the bugs were reported by users.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I’m starting the discussion for both bugfix releases because
> > most
> > >> >>>> fixes
> > >> >>>>>>>>> span both releases (almost identical).
> > >> >>>>>>>>> Of course, the actual RC votes and RC creation process
> doesn’t
> > >> >>>> have to
> > >> >>>>>> be
> > >> >>>>>>>>> started together.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Here’s an overview of what’s been collected so far, for both
> > >> bugfix
> > >> >>>>>>>>> releases -
> > >> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing
> > >> stuff;
> > >> >>>>>> please
> > >> >>>>>>>>> append and bring to attention as necessary :-) )
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.2.1:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> > >> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos
> > >> authentications.
> > >> >>>>>> MapR
> > >> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN
> > jobs
> > >> >>>> on a
> > >> >>>>>>>>> secured MapR cluster.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.1.5:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally
> > like
> > >> >>>> to
> > >> >>>>>>>>> backport the fix for this to 1.1.5 also.
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> > >> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads problematic
> > >> >>>>>> cancellation
> > >> >>>>>>>>> behavior.
> > >> >>>>>>>>> Status: This fix was already released in 1.2.0, but never
> > made it
> > >> >>>> into
> > >> >>>>>> the
> > >> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> What do you think? From the list so far, we pretty much
> > already
> > >> >>>> have
> > >> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by
> > the
> > >> >>>> end of
> > >> >>>>>>>>> this week.
> > >> >>>>>>>>> Since both bugfix releases cover almost the same list of
> > issues,
> > >> I
> > >> >>>>>> think
> > >> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix
> > releases
> > >> >>>>>> around the
> > >> >>>>>>>>> same time.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also FYI, here’s the lists of JIRA tickets tagged with
> > "1.2.1” /
> > >> >>>>>> “1.1.5”
> > >> >>>>>>>>> as the Fix Versions, and are still open.
> > >> >>>>>>>>> We should probably want to check if there’s anything on
> there
> > >> that
> > >> >>>> we
> > >> >>>>>>>>> should block on for the releases:
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.2.1:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.1.5:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >>
> > >>
> >
>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
Update for Flink 1.2.1:

There’s only one PR pending that is LGTM -
https://issues.apache.org/jira/browse/FLINK-6084
Fix for Cassandra connector dropping metrics-core dependency.

We can proceed to create the release candidate very soon :-)
Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can start voting for 1.2.1 tomorrow.

Also, we’re still lacking a release manager for 1.2.1. Is anyone interested in volunteering for this release?
If nobody steps up for it before tomorrow, I can also do it.

Cheers,
Gordon

On March 18, 2017 at 12:52:48 AM, Robert Metzger ([hidden email]) wrote:

I don't think that this issue should be a reason to hold back a bugfix
release.  
There are workarounds for the problem you are describing. Once we've fixed  
it, we can include it into the next upcoming bugfix release.  

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier <[hidden email]>  
wrote:  

> I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
> issuing a release
>  
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <[hidden email]> wrote:  
>  
> > Cool! Thanks for taking care of this Gordon :-)  
> >  
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai  
> > <[hidden email]> wrote:  
> > > Update for 1.1.5:  
> > > The last fixes for 1.1.5 are in! I will create the RC today and start  
> > the vote.  
> > >  
> > > Cheers,  
> > > Gordon  
> > >  
> > >  
> > > On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email])  
> > wrote:  
> > >  
> > > The cassandra connector is probably not usable in Flink 1.2.0. I would  
> > like  
> > > to include a fix in 1.2.1:  
> > > https://issues.apache.org/jira/browse/FLINK-6084 
> > >  
> > > Please let me know if this fix becomes a blocker for the 1.2.1 release.  
> > If  
> > > so, I can validate the fix myself to speed up things.  
> > >  
> > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]>  
> > wrote:  
> > >  
> > >> @Tzu-Li (Gordon) Tai
> > >>
> > >> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
> > >>  
> > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <  
> > >> https://github.com/zentol/flink/tree/5650_python_test_debug>  
> > >>  
> > >>  
> > >> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
> > >> >  
> > >> > Thanks for the update!  
> > >> >  
> > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove  
> scheduled  
> > >> > cancel-task from timer queue to prevent memory leaks  
> > >> >  
> > >> > The remaining issue list looks good, but I would say that (5) is  
> > >> optional.  
> > >> > It is not a critical production bug.  
> > >> >  
> > >> >  
> > >> >  
> > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <  
> > >> [hidden email]>  
> > >> > wrote:  
> > >> >  
> > >> >> Thanks a lot for the updates so far everyone!  
> > >> >>  
> > >> >> From the discussion so far, below are the still-unfixed pending issues
> > >> >> for the 1.1.5 / 1.2.1 releases.
> > >> >>  
> > >> >> Since there’s only one backport for 1.1.5 left, I think having an  
> RC  
> > for  
> > >> >> 1.1.5 near the end of this week / early next week is very  
> promising,  
> > as  
> > >> >> basically everything is already in.  
> > >> >> I’d be happy to volunteer to help manage the release for 1.1.5, and  
> > >> >> prepare the RC when it’s ready :)  
> > >> >>  
> > >> >> For 1.2.1, we can leave the pending list here for tracking, and  
> come  
> > >> back  
> > >> >> to update it in the near future.  
> > >> >>  
> > >> >> If there’s anything I missed, please let me know!  
> > >> >>  
> > >> >>  
> > >> >> =========== Still pending for Flink 1.1.5 ===========  
> > >> >>  
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701 
> > >> >> Broken at-least-once Kafka producer.  
> > >> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
> > >> >> Since it is a relatively self-contained change, I expect this to be a fast fix.
> > >> >>  
> > >> >>  
> > >> >>  
> > >> >> =========== Still pending for Flink 1.2.1 ===========  
> > >> >>  
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808 
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism  
> > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in  
> > >> progress  
> > >> >>  
> > >> >> (2) https://issues.apache.org/jira/browse/FLINK-5713 
> > >> >> Protect against NPE in WindowOperator window cleanup  
> > >> >> Status: PR - https://github.com/apache/flink/pull/3535, review  
> > pending  
> > >> >>  
> > >> >> (3) https://issues.apache.org/jira/browse/FLINK-6044 
> > >> >> TypeSerializerSerializationProxy.read() doesn't verify the read  
> > buffer  
> > >> >> length  
> > >> >> Status: Fixed for master, 1.2 backport pending  
> > >> >>  
> > >> >> (4) https://issues.apache.org/jira/browse/FLINK-5985 
> > >> >> Flink treats every task as stateful (making topology changes  
> > impossible)  
> > >> >> Status: PR - https://github.com/apache/flink/pull/3543, review in  
> > >> progress  
> > >> >>  
> > >> >> (5) https://issues.apache.org/jira/browse/FLINK-5650 
> > >> >> Flink-python tests taking up too much time  
> > >> >> Status: I think Chesnay currently has some progress with this one,  
> we  
> > >> can  
> > >> >> see if we want to make this a blocker  
> > >> >>  
> > >> >>  
> > >> >> Cheers,  
> > >> >> Gordon  
> > >> >>  
> > >> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email])  
> > >> wrote:  
> > >> >>  
> > >> >> Can we fix this issue in the 1.2.1:  
> > >> >>  
> > >> >> Flink-python tests cost too long time  
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650 <  
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650>  
> > >> >>  
> > >> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
> > >> >>>  
> > >> >>> I just tested it in my reproducer. It works.
> > >> >>>  
> > >> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]  
> >:  
> > >> >>>  
> > >> >>>> I did in fact just open a PR for  
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001 
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger  
> > and  
> > >> >>>>> allowedLateness  
> > >> >>>>  
> > >> >>>>  
> > >> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:  
> > >> >>>>> Hi,  
> > >> >>>>>  
> > >> >>>>> I would also include the following (not yet resolved) issue in  
> the  
> > >> >> 1.2.1  
> > >> >>>>> scope :  
> > >> >>>>>  
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001 
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger  
> > and  
> > >> >>>>> allowedLateness  
> > >> >>>>>  
> > >> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:  
> > >> >>>>>  
> > >> >>>>>> Big +1 Gordon!  
> > >> >>>>>>  
> > >> >>>>>> I think (10) is very critical to have in 1.2.1.  
> > >> >>>>>>  
> > >> >>>>>> – Ufuk  
> > >> >>>>>>  
> > >> >>>>>>  
> > >> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter  
> > >> >>>>>> <[hidden email]> wrote:  
> > >> >>>>>>> Hi,  
> > >> >>>>>>>  
> > >> >>>>>>> I would suggest to also include in 1.2.1:  
> > >> >>>>>>>  
> > >> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044 <  
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-6044>  
> > >> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the  
> > >> intended  
> > >> >>>>>>> and correct InputStream#readFully(…)  
> > >> >>>>>>> Status: PR  
> > >> >>>>>>>  
> > >> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985 <  
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-5985>  
> > >> >>>>>>> Flink 1.2 was creating state handles for stateless tasks which  
> > >> caused  
> > >> >>>>>> trouble  
> > >> >>>>>>> at restore time for users that wanted to do some changes that  
> > only  
> > >> >>>>>> include  
> > >> >>>>>>> stateless operators to their topology.  
> > >> >>>>>>> Status: PR  
> > >> >>>>>>>  
> > >> >>>>>>>  
> > >> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> > >> >>>>>>>>  
> > >> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add  
> > the  
> > >> >>>>>> following  
> > >> >>>>>>>> issues which have already been merged into the 1.2-release  
> and  
> > >> >>>>>> 1.1-release  
> > >> >>>>>>>> branch:  
> > >> >>>>>>>>  
> > >> >>>>>>>> 1.2.1:  
> > >> >>>>>>>>  
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942 
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted  
> ZooKeeper  
> > >> data.  
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.  
> > >> >>>>>>>> Status: Merged  
> > >> >>>>>>>>  
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940 
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot  
> retrieve  
> > >> the  
> > >> >>>>>>>> completed checkpoint from the meta data state handle  
> retrieved  
> > >> from  
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is  
> > >> >>>> deleted.  
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.  
> > >> >>>>>>>> Status: Merged  
> > >> >>>>>>>>  
> > >> >>>>>>>> 1.1.5:  
> > >> >>>>>>>>  
> > >> >>>>>>>>  
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942 
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted  
> ZooKeeper  
> > >> data.  
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.  
> > >> >>>>>>>> Status: Merged  
> > >> >>>>>>>>  
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940 
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot  
> retrieve  
> > >> the  
> > >> >>>>>>>> completed checkpoint from the meta data state handle  
> retrieved  
> > >> from  
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is  
> > >> >>>> deleted.  
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.  
> > >> >>>>>>>> Status: Merged  
> > >> >>>>>>>>  
> > >> >>>>>>>> Cheers,  
> > >> >>>>>>>> Till  
> > >> >>>>>>>>  
> > >> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <  
> > >> >>>>>> [hidden email]>  
> > >> >>>>>>>> wrote:  
> > >> >>>>>>>>  
> > >> >>>>>>>>> Hi all!  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> I would like to start a discussion for the next bugfix  
> release  
> > >> for  
> > >> >>>>>> 1.1.x  
> > >> >>>>>>>>> and 1.2.x.  
> > >> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the  
> > >> >>>> releases  
> > >> >>>>>>>>> recently, and I think they deserve a bugfix release soon.  
> > >> >>>>>>>>> Most of the bugs were reported by users.  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> I’m starting the discussion for both bugfix releases because  
> > most  
> > >> >>>> fixes  
> > >> >>>>>>>>> span both releases (almost identical).  
> > >> >>>>>>>>> Of course, the actual RC votes and RC creation process  
> doesn’t  
> > >> >>>> have to  
> > >> >>>>>> be  
> > >> >>>>>>>>> started together.  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> Here’s an overview of what’s been collected so far, for both  
> > >> bugfix  
> > >> >>>>>>>>> releases -  
> > >> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing  
> > >> stuff;  
> > >> >>>>>> please  
> > >> >>>>>>>>> append and bring to attention as necessary :-) )  
> > >> >>>>>>>>>  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> For Flink 1.2.1:  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701: 
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked  
> on  
> > >> >>>>>> checkpoints.  
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949: 
> > >> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos  
> > >> authentications.  
> > >> >>>>>> MapR  
> > >> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN  
> > jobs  
> > >> >>>> on a  
> > >> >>>>>>>>> secured MapR cluster.  
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one  
> > +1  
> > >> >>>> already  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006: 
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is  
> > >> >>>> incomplete  
> > >> >>>>>> on  
> > >> >>>>>>>>> restore.  
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one  
> > +1  
> > >> >>>> already  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025: 
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s  
> > >> >>>>>> JavaSerializer is  
> > >> >>>>>>>>> used.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771: 
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934: 
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.  
> > This  
> > >> >>>>>> fixes a  
> > >> >>>>>>>>> bug that causes HA recovery to fail.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>>  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> For Flink 1.1.5:  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701: 
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked  
> on  
> > >> >>>>>> checkpoints.  
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.  
> > >> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally  
> > like  
> > >> >>>> to  
> > >> >>>>>>>>> backport the fix for this to 1.1.5 also.  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006: 
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is  
> > >> >>>> incomplete  
> > >> >>>>>> on  
> > >> >>>>>>>>> restore.  
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one  
> > +1  
> > >> >>>> already  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025: 
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s  
> > >> >>>>>> JavaSerializer is  
> > >> >>>>>>>>> used.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771: 
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934: 
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.  
> > This  
> > >> >>>>>> fixes a  
> > >> >>>>>>>>> bug that causes HA recovery to fail.  
> > >> >>>>>>>>> Status: merged  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048: 
> > >> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads problematic  
> > >> >>>>>> cancellation  
> > >> >>>>>>>>> behavior.  
> > >> >>>>>>>>> Status: This fix was already released in 1.2.0, but never  
> > made it  
> > >> >>>> into  
> > >> >>>>>> the  
> > >> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?  
> > >> >>>>>>>>>  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> What do you think? From the list so far, we pretty much  
> > already  
> > >> >>>> have  
> > >> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by  
> > the  
> > >> >>>> end of  
> > >> >>>>>>>>> this week.  
> > >> >>>>>>>>> Since both bugfix releases cover almost the same list of  
> > issues,  
> > >> I  
> > >> >>>>>> think  
> > >> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix  
> > releases  
> > >> >>>>>> around the  
> > >> >>>>>>>>> same time.  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> Also FYI, here’s the lists of JIRA tickets tagged with  
> > "1.2.1” /  
> > >> >>>>>> “1.1.5”  
> > >> >>>>>>>>> as the Fix Versions, and are still open.  
> > >> >>>>>>>>> We should probably want to check if there’s anything on  
> there  
> > >> that  
> > >> >>>> we  
> > >> >>>>>>>>> should block on for the releases:  
> > >> >>>>>>>>>  
> > >> >>>>>>>>> For 1.2.1:  
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> > >> >>>>>>>>>  
> > >> >>>>>>>>> For 1.1.5:  
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> > >> >>>>>>>  
> > >> >>>>>>  
> > >> >>>>  
> > >> >>>  
> > >> >>  
> > >> >>  
> > >>  
> > >>  
> >  
>  

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
Sorry, I missed one other pending issue for Flink 1.2.1:

https://issues.apache.org/jira/browse/FLINK-5972
Disallow shrinking merging windows. This would replace https://issues.apache.org/jira/browse/FLINK-5713, which was previously listed as a blocker for 1.2.1.
Status: PR review pending - https://github.com/apache/flink/pull/3587

On March 22, 2017 at 12:23:03 AM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Update for Flink 1.2.1:

There’s only one PR pending that is LGTM -
https://issues.apache.org/jira/browse/FLINK-6084
Fix for Cassandra connector dropping metrics-core dependency.

We can proceed to create the release candidate very soon :-)
Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can start voting for 1.2.1 tomorrow.

Also, we’re still lacking a release manager for 1.2.1. Is anyone interested in volunteering for this release?
If nobody steps up for it before tomorrow, I can also do it.

Cheers,
Gordon

On March 18, 2017 at 12:52:48 AM, Robert Metzger ([hidden email]) wrote:

I don't think that this issue should be a reason to hold back a bugfix
release.
There are workarounds for the problem you are describing. Once we've fixed
it, we can include it into the next upcoming bugfix release.

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier <[hidden email]>
wrote:

> I propose to fix https://issues.apache.org/jira/browse/FLINK-6103 before
> issuing a release
>
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <[hidden email]> wrote:
>
> > Cool! Thanks for taking care of this Gordon :-)
> >
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
> > <[hidden email]> wrote:
> > > Update for 1.1.5:
> > > The last fixes for 1.1.5 are in! I will create the RC today and start
> > the vote.
> > >
> > > Cheers,
> > > Gordon
> > >
> > >
> > > On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email])
> > wrote:
> > >
> > > The cassandra connector is probably not usable in Flink 1.2.0. I would
> > like
> > > to include a fix in 1.2.1:
> > > https://issues.apache.org/jira/browse/FLINK-6084
> > >
> > > Please let me know if this fix becomes a blocker for the 1.2.1 release.
> > If
> > > so, I can validate the fix myself to speed up things.
> > >
> > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]>
> > wrote:
> > >
> > >> @Tzu-Li (Gordon) Tai
> > >>
> > >> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
> > >>
> > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <
> > >> https://github.com/zentol/flink/tree/5650_python_test_debug>
> > >>
> > >>
> > >> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
> > >> >
> > >> > Thanks for the update!
> > >> >
> > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove
> scheduled
> > >> > cancel-task from timer queue to prevent memory leaks
> > >> >
> > >> > The remaining issue list looks good, but I would say that (5) is
> > >> optional.
> > >> > It is not a critical production bug.
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
> > >> [hidden email]>
> > >> > wrote:
> > >> >
> > >> >> Thanks a lot for the updates so far everyone!
> > >> >>
> > >> >> From the discussion so far, below are the still-unfixed pending issues
> > >> >> for the 1.1.5 / 1.2.1 releases.
> > >> >>
> > >> >> Since there’s only one backport for 1.1.5 left, I think having an
> RC
> > for
> > >> >> 1.1.5 near the end of this week / early next week is very
> promising,
> > as
> > >> >> basically everything is already in.
> > >> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
> > >> >> prepare the RC when it’s ready :)
> > >> >>
> > >> >> For 1.2.1, we can leave the pending list here for tracking, and
> come
> > >> back
> > >> >> to update it in the near future.
> > >> >>
> > >> >> If there’s anything I missed, please let me know!
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.1.5 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > >> >> Broken at-least-once Kafka producer.
> > >> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
> > >> >> Since it is a relatively self-contained change, I expect this to be a fast fix.
> > >> >>
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.2.1 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism
> > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in
> > >> progress
> > >> >>
> > >> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
> > >> >> Protect against NPE in WindowOperator window cleanup
> > >> >> Status: PR - https://github.com/apache/flink/pull/3535, review
> > pending
> > >> >>
> > >> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
> > >> >> TypeSerializerSerializationProxy.read() doesn't verify the read
> > buffer
> > >> >> length
> > >> >> Status: Fixed for master, 1.2 backport pending
> > >> >>
> > >> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
> > >> >> Flink treats every task as stateful (making topology changes
> > impossible)
> > >> >> Status: PR - https://github.com/apache/flink/pull/3543, review in
> > >> progress
> > >> >>
> > >> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
> > >> >> Flink-python tests taking up too much time
> > >> >> Status: I think Chesnay currently has some progress with this one,
> we
> > >> can
> > >> >> see if we want to make this a blocker
> > >> >>
> > >> >>
> > >> >> Cheers,
> > >> >> Gordon
> > >> >>
> > >> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email])
> > >> wrote:
> > >> >>
> > >> >> Can we fix this issue in the 1.2.1:
> > >> >>
> > >> >> Flink-python tests cost too long time
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650 <
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650>
> > >> >>
> > >> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
> > >> >>>
> > >> >>> I just tested it in my reproducer. It works.
> > >> >>>
> > >> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]
> >:
> > >> >>>
> > >> >>>> I did in fact just open a PR for
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger
> > and
> > >> >>>>> allowedLateness
> > >> >>>>
> > >> >>>>
> > >> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> > >> >>>>> Hi,
> > >> >>>>>
> > >> >>>>> I would also include the following (not yet resolved) issue in
> the
> > >> >> 1.2.1
> > >> >>>>> scope :
> > >> >>>>>
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger
> > and
> > >> >>>>> allowedLateness
> > >> >>>>>
> > >> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
> > >> >>>>>
> > >> >>>>>> Big +1 Gordon!
> > >> >>>>>>
> > >> >>>>>> I think (10) is very critical to have in 1.2.1.
> > >> >>>>>>
> > >> >>>>>> – Ufuk
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> > >> >>>>>> <[hidden email]> wrote:
> > >> >>>>>>> Hi,
> > >> >>>>>>>
> > >> >>>>>>> I would suggest to also include in 1.2.1:
> > >> >>>>>>>
> > >> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044 <
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-6044>
> > >> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the
> > >> intended
> > >> >>>>>>> and correct InputStream#readFully(…)
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985 <
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-5985>
> > >> >>>>>>> Flink 1.2 was creating state handles for stateless tasks which
> > >> caused
> > >> >>>>>> trouble
> > >> >>>>>>> at restore time for users that wanted to do some changes that
> > only
> > >> >>>>>> include
> > >> >>>>>>> stateless operators to their topology.
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add
> > the
> > >> >>>>>> following
> > >> >>>>>>>> issues which have already been merged into the 1.2-release
> and
> > >> >>>>>> 1.1-release
> > >> >>>>>>>> branch:
> > >> >>>>>>>>
> > >> >>>>>>>> 1.2.1:
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> 1.1.5:
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> Cheers,
> > >> >>>>>>>> Till
> > >> >>>>>>>>
> > >> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
> > >> >>>>>> [hidden email]>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Hi all!
> > >> >>>>>>>>>
> > >> >>>>>>>>> I would like to start a discussion for the next bugfix
> release
> > >> for
> > >> >>>>>> 1.1.x
> > >> >>>>>>>>> and 1.2.x.
> > >> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the
> > >> >>>> releases
> > >> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
> > >> >>>>>>>>> Most of the bugs were reported by users.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I’m starting the discussion for both bugfix releases because
> > most
> > >> >>>> fixes
> > >> >>>>>>>>> span both releases (almost identical).
> > >> >>>>>>>>> Of course, the actual RC votes and RC creation process
> doesn’t
> > >> >>>> have to
> > >> >>>>>> be
> > >> >>>>>>>>> started together.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Here’s an overview of what’s been collected so far, for both
> > >> bugfix
> > >> >>>>>>>>> releases -
> > >> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing
> > >> stuff;
> > >> >>>>>> please
> > >> >>>>>>>>> append and bring to attention as necessary :-) )
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.2.1:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> > >> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos
> > >> authentications.
> > >> >>>>>> MapR
> > >> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN
> > jobs
> > >> >>>> on a
> > >> >>>>>>>>> secured MapR cluster.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.1.5:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally
> > like
> > >> >>>> to
> > >> >>>>>>>>> backport the fix for this to 1.1.5 also.
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> > >> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads problematic
> > >> >>>>>> cancellation
> > >> >>>>>>>>> behavior.
> > >> >>>>>>>>> Status: This fix was already released in 1.2.0, but never
> > made it
> > >> >>>> into
> > >> >>>>>> the
> > >> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> What do you think? From the list so far, we pretty much
> > already
> > >> >>>> have
> > >> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by
> > the
> > >> >>>> end of
> > >> >>>>>>>>> this week.
> > >> >>>>>>>>> Since both bugfix releases cover almost the same list of
> > issues,
> > >> I
> > >> >>>>>> think
> > >> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix
> > releases
> > >> >>>>>> around the
> > >> >>>>>>>>> same time.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also FYI, here’s the lists of JIRA tickets tagged with
> > "1.2.1” /
> > >> >>>>>> “1.1.5”
> > >> >>>>>>>>> as the Fix Versions, and are still open.
> > >> >>>>>>>>> We should probably want to check if there’s anything on
> there
> > >> that
> > >> >>>> we
> > >> >>>>>>>>> should block on for the releases:
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.2.1:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.1.5:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >>
> > >>
> >
>

Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Tzu-Li (Gordon) Tai
In reply to this post by Tzu-Li (Gordon) Tai
Update for 1.2.1:

The last fix was just merged!

Since nobody else seems interested in managing 1.2.1, I can also help with this one :)
I’ll create the release candidate over the weekend so we can start the testing / voting next Monday.

- Gordon

On March 22, 2017 at 12:35:25 AM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Sorry, I missed one other pending issue for Flink 1.2.1:

https://issues.apache.org/jira/browse/FLINK-5972
Disallow shrinking merging windows. This would replace https://issues.apache.org/jira/browse/FLINK-5713, which was previously listed as a blocker for 1.2.1.
Status: PR review pending - https://github.com/apache/flink/pull/3587

On March 22, 2017 at 12:23:03 AM, Tzu-Li (Gordon) Tai ([hidden email]) wrote:

Update for Flink 1.2.1:

There’s only one PR pending that is LGTM -
https://issues.apache.org/jira/browse/FLINK-6084
Fix for Cassandra connector dropping metrics-core dependency.

We can proceed to create the release candidate very soon :-)
Release 1.1.5 RC1 seems to be in good shape so far, so hopefully we can start voting for 1.2.1 tomorrow.

Also, we’re still lacking a release manager for 1.2.1. Is anyone interested in volunteering for this release?
If nobody steps up for it before tomorrow, I can also do it.

Cheers,
Gordon

On March 18, 2017 at 12:52:48 AM, Robert Metzger ([hidden email]) wrote:

I don't think that this issue should be a reason to hold back a bugfix
release.
There are workarounds for the problem you are describing. Once we've fixed
it, we can include it in the next bugfix release.

On Fri, Mar 17, 2017 at 4:22 PM, Flavio Pompermaier <[hidden email]>
wrote:

> I propose fixing https://issues.apache.org/jira/browse/FLINK-6103 before
> issuing a release
>
> On Fri, Mar 17, 2017 at 8:12 AM, Ufuk Celebi <[hidden email]> wrote:
>
> > Cool! Thanks for taking care of this Gordon :-)
> >
> > On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
> > <[hidden email]> wrote:
> > > Update for 1.1.5:
> > > The last fixes for 1.1.5 are in! I will create the RC today and start
> > the vote.
> > >
> > > Cheers,
> > > Gordon
> > >
> > >
> > > On March 17, 2017 at 1:14:53 AM, Robert Metzger ([hidden email])
> > wrote:
> > >
> > > The cassandra connector is probably not usable in Flink 1.2.0. I would
> > like
> > > to include a fix in 1.2.1:
> > > https://issues.apache.org/jira/browse/FLINK-6084
> > >
> > > Please let me know if this fix becomes a blocker for the 1.2.1 release.
> > If
> > > so, I can validate the fix myself to speed up things.
> > >
> > > On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[hidden email]>
> > wrote:
> > >
> > >> @Tzu-Li (Gordon) Tai
> > >>
> > >> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
> > >>
> > >> [1] https://github.com/zentol/flink/tree/5650_python_test_debug <
> > >> https://github.com/zentol/flink/tree/5650_python_test_debug>
> > >>
> > >>
> > >> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[hidden email]> wrote:
> > >> >
> > >> > Thanks for the update!
> > >> >
> > >> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove
> scheduled
> > >> > cancel-task from timer queue to prevent memory leaks
> > >> >
> > >> > The remaining issue list looks good, but I would say that (5) is
> > >> optional.
> > >> > It is not a critical production bug.
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <
> > >> [hidden email]>
> > >> > wrote:
> > >> >
> > >> >> Thanks a lot for the updates so far everyone!
> > >> >>
> > >> >> From the discussion so far, below are the still-unfixed pending issues
> > >> >> for the 1.1.5 / 1.2.1 releases.
> > >> >>
> > >> >> Since there’s only one backport for 1.1.5 left, I think having an
> RC
> > for
> > >> >> 1.1.5 near the end of this week / early next week is very
> promising,
> > as
> > >> >> basically everything is already in.
> > >> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
> > >> >> prepare the RC when it’s ready :)
> > >> >>
> > >> >> For 1.2.1, we can leave the pending list here for tracking, and
> come
> > >> back
> > >> >> to update it in the near future.
> > >> >>
> > >> >> If there’s anything I missed, please let me know!
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.1.5 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
> > >> >> Broken at-least-once Kafka producer.
> > >> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
> > >> >> Since it is a relatively self-contained change, I expect this to be a fast fix.
> > >> >>
> > >> >>
> > >> >>
> > >> >> =========== Still pending for Flink 1.2.1 ===========
> > >> >>
> > >> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
> > >> >> Fix Missing verification for setParallelism and setMaxParallelism
> > >> >> Status: PR - https://github.com/apache/flink/pull/3509, review in
> > >> progress
> > >> >>
> > >> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
> > >> >> Protect against NPE in WindowOperator window cleanup
> > >> >> Status: PR - https://github.com/apache/flink/pull/3535, review
> > pending
> > >> >>
> > >> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
> > >> >> TypeSerializerSerializationProxy.read() doesn't verify the read
> > buffer
> > >> >> length
> > >> >> Status: Fixed for master, 1.2 backport pending
> > >> >>
> > >> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
> > >> >> Flink treats every task as stateful (making topology changes
> > impossible)
> > >> >> Status: PR - https://github.com/apache/flink/pull/3543, review in
> > >> progress
> > >> >>
> > >> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
> > >> >> Flink-python tests taking up too much time
> > >> >> Status: I think Chesnay currently has some progress with this one,
> we
> > >> can
> > >> >> see if we want to make this a blocker
> > >> >>
> > >> >>
> > >> >> Cheers,
> > >> >> Gordon
> > >> >>
> > >> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([hidden email])
> > >> wrote:
> > >> >>
> > >> >> Can we fix this issue in the 1.2.1:
> > >> >>
> > >> >> Flink-python tests cost too long time
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650 <
> > >> >> https://issues.apache.org/jira/browse/FLINK-5650>
> > >> >>
> > >> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[hidden email]> wrote:
> > >> >>>
> > >> >>> I just tested it in my reproducer. It works.
> > >> >>>
> > >> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[hidden email]
> >:
> > >> >>>
> > >> >>>> I did in fact just open a PR for
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger
> > and
> > >> >>>>> allowedLateness
> > >> >>>>
> > >> >>>>
> > >> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
> > >> >>>>> Hi,
> > >> >>>>>
> > >> >>>>> I would also include the following (not yet resolved) issue in
> the
> > >> >> 1.2.1
> > >> >>>>> scope :
> > >> >>>>>
> > >> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
> > >> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger
> > and
> > >> >>>>> allowedLateness
> > >> >>>>>
> > >> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[hidden email]>:
> > >> >>>>>
> > >> >>>>>> Big +1 Gordon!
> > >> >>>>>>
> > >> >>>>>> I think (10) is very critical to have in 1.2.1.
> > >> >>>>>>
> > >> >>>>>> – Ufuk
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
> > >> >>>>>> <[hidden email]> wrote:
> > >> >>>>>>> Hi,
> > >> >>>>>>>
> > >> >>>>>>> I would suggest to also include in 1.2.1:
> > >> >>>>>>>
> > >> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044 <
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-6044>
> > >> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the
> > >> intended
> > >> >>>>>>> and correct InputStream#readFully(…)
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985 <
> > >> >>>>>> https://issues.apache.org/jira/browse/FLINK-5985>
> > >> >>>>>>> Flink 1.2 was creating state handles for stateless tasks which
> > >> caused
> > >> >>>>>> trouble
> > >> >>>>>>> at restore time for users that wanted to do some changes that
> > only
> > >> >>>>>> include
> > >> >>>>>>> stateless operators to their topology.
> > >> >>>>>>> Status: PR
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[hidden email]> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add
> > the
> > >> >>>>>> following
> > >> >>>>>>>> issues which have already been merged into the 1.2-release
> and
> > >> >>>>>> 1.1-release
> > >> >>>>>>>> branch:
> > >> >>>>>>>>
> > >> >>>>>>>> 1.2.1:
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> 1.1.5:
> > >> >>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
> > >> >>>>>>>> Hardens the checkpoint recovery in case of corrupted
> ZooKeeper
> > >> data.
> > >> >>>>>>>> Corrupted checkpoints will now be skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
> > >> >>>>>>>> Hardens the checkpoint recovery in case that we cannot
> retrieve
> > >> the
> > >> >>>>>>>> completed checkpoint from the meta data state handle
> retrieved
> > >> from
> > >> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is
> > >> >>>> deleted.
> > >> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
> > >> >>>>>>>> Status: Merged
> > >> >>>>>>>>
> > >> >>>>>>>> Cheers,
> > >> >>>>>>>> Till
> > >> >>>>>>>>
> > >> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <
> > >> >>>>>> [hidden email]>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Hi all!
> > >> >>>>>>>>>
> > >> >>>>>>>>> I would like to start a discussion for the next bugfix
> release
> > >> for
> > >> >>>>>> 1.1.x
> > >> >>>>>>>>> and 1.2.x.
> > >> >>>>>>>>> There’s been quite a few critical fixes for bugs in both the
> > >> >>>> releases
> > >> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
> > >> >>>>>>>>> Most of the bugs were reported by users.
> > >> >>>>>>>>>
> > >> >>>>>>>>> I’m starting the discussion for both bugfix releases because
> > most
> > >> >>>> fixes
> > >> >>>>>>>>> span both releases (almost identical).
> > >> >>>>>>>>> Of course, the actual RC votes and RC creation process
> doesn’t
> > >> >>>> have to
> > >> >>>>>> be
> > >> >>>>>>>>> started together.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Here’s an overview of what’s been collected so far, for both
> > >> bugfix
> > >> >>>>>>>>> releases -
> > >> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing
> > >> stuff;
> > >> >>>>>> please
> > >> >>>>>>>>> append and bring to attention as necessary :-) )
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.2.1:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
> > >> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos
> > >> authentications.
> > >> >>>>>> MapR
> > >> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN
> > jobs
> > >> >>>> on a
> > >> >>>>>>>>> secured MapR cluster.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> For Flink 1.1.5:
> > >> >>>>>>>>>
> > >> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
> > >> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked
> on
> > >> >>>>>> checkpoints.
> > >> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
> > >> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally
> > like
> > >> >>>> to
> > >> >>>>>>>>> backport the fix for this to 1.1.5 also.
> > >> >>>>>>>>>
> > >> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
> > >> >>>>>>>>> Kafka Consumer can lose state if queried partition list is
> > >> >>>> incomplete
> > >> >>>>>> on
> > >> >>>>>>>>> restore.
> > >> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one
> > +1
> > >> >>>> already
> > >> >>>>>>>>>
> > >> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
> > >> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
> > >> >>>>>> JavaSerializer is
> > >> >>>>>>>>> used.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
> > >> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
> > >> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor.
> > This
> > >> >>>>>> fixes a
> > >> >>>>>>>>> bug that causes HA recovery to fail.
> > >> >>>>>>>>> Status: merged
> > >> >>>>>>>>>
> > >> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
> > >> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads problematic
> > >> >>>>>> cancellation
> > >> >>>>>>>>> behavior.
> > >> >>>>>>>>> Status: This fix was already released in 1.2.0, but never
> > made it
> > >> >>>> into
> > >> >>>>>> the
> > >> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
> > >> >>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>> What do you think? From the list so far, we pretty much
> > already
> > >> >>>> have
> > >> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by
> > the
> > >> >>>> end of
> > >> >>>>>>>>> this week.
> > >> >>>>>>>>> Since both bugfix releases cover almost the same list of
> > issues,
> > >> I
> > >> >>>>>> think
> > >> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix
> > releases
> > >> >>>>>> around the
> > >> >>>>>>>>> same time.
> > >> >>>>>>>>>
> > >> >>>>>>>>> Also FYI, here’s the lists of JIRA tickets tagged with
> > "1.2.1” /
> > >> >>>>>> “1.1.5”
> > >> >>>>>>>>> as the Fix Versions, and are still open.
> > >> >>>>>>>>> We should probably want to check if there’s anything on
> there
> > >> that
> > >> >>>> we
> > >> >>>>>>>>> should block on for the releases:
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.2.1:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
> > >> >>>>>>>>>
> > >> >>>>>>>>> For 1.1.5:
> > >> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
> > >> >>>>>>>
> > >> >>>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >>
> > >>
> >
>