[VOTE] Release 1.11.0, release candidate #4

Zhijiang(wangzhijiang999)
Hi everyone,

Please review and vote on the release candidate #4 for the version 1.11.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.11.0-rc4" [5],
* website pull request listing the new release and adding announcement blog post [6].

The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
[5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
[6] https://github.com/apache/flink-web/pull/352
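
For anyone tallying the result, the adoption rule stated in the announcement (open for at least 72 hours, majority approval, at least 3 PMC affirmative votes) could be sketched roughly as follows. This is only an illustration: the class and method names are made up for this note, and it assumes that "majority approval" means more binding +1 than binding -1 votes.

```java
import java.time.Duration;

/** Illustrative sketch of the adoption rule from the announcement; not an official tool. */
final class VoteTally {
    // Thresholds taken from the announcement text.
    static final int MIN_BINDING_PLUS_ONES = 3;
    static final Duration MIN_VOTING_PERIOD = Duration.ofHours(72);

    /**
     * Returns true if the vote would be adopted.
     * Assumption: "majority approval" is read as more binding +1 than binding -1 votes.
     */
    static boolean isAdopted(int bindingPlusOnes, int bindingMinusOnes, Duration openFor) {
        boolean openLongEnough = openFor.compareTo(MIN_VOTING_PERIOD) >= 0;
        boolean enoughApprovals = bindingPlusOnes >= MIN_BINDING_PLUS_ONES;
        boolean majority = bindingPlusOnes > bindingMinusOnes;
        return openLongEnough && enoughApprovals && majority;
    }

    public static void main(String[] args) {
        // Three binding +1, no -1, open for exactly 72 hours.
        System.out.println(isAdopted(3, 0, Duration.ofHours(72)));
        // Only two binding +1, even though the vote was open longer.
        System.out.println(isAdopted(2, 0, Duration.ofHours(96)));
    }
}
```

Non-binding votes are deliberately left out of the sketch, since the announcement only sets a threshold for PMC votes.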

Re: [VOTE] Release 1.11.0, release candidate #4

Chesnay Schepler-3
- source does not contain binaries
- started a local cluster, logs are fine, examples run
- web submission works _in general_

However, a number of batch examples fail when submitted through the
WebUI with the following error:

Caused by: org.apache.flink.api.common.InvalidProgramException:
Job was submitted in detached mode. Results of job execution, such as
accumulators, runtime, etc. are not available.
Please make sure your program doesn't call an eager execution function
[collect, print, printToErr, count].

I could not find mention of this in the release notes (nor in 1.10; not
quite sure when this change was introduced...).

IIRC this change was intentional, and it isn't necessarily a deal
breaker, but we should ensure that our examples are compatible with all
submission methods.

I'm undecided yet as to whether to block the release on it.
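
To find out which examples are affected, a crude textual scan for the eager calls listed in the error message (collect, print, printToErr, count) might be enough. The sketch below is hypothetical and plain Java; it is not part of Flink, it knows nothing about types, and it will also flag unrelated methods that happen to share these names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: naive textual scan for eager execution calls in example sources. */
final class EagerCallCheck {
    // Method names copied from the error message; everything else is an assumption.
    private static final Pattern EAGER_CALL =
            Pattern.compile("\\.(collect|printToErr|print|count)\\s*\\(");

    /** Returns the eager method names found in the given source text, in order of appearance. */
    static List<String> findEagerCalls(String source) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EAGER_CALL.matcher(source);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // A program that ends in print() would be flagged.
        System.out.println(findEagerCalls("counts.print();"));
        // A program that writes to a sink and then executes would not.
        System.out.println(findEagerCalls("counts.writeAsText(out); env.execute();"));
    }
}
```

An example that comes back with an empty list is not guaranteed to work through the WebUI, but a non-empty list is a reasonable hint about where to look first.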

On 30/06/2020 12:17, Zhijiang wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #4 for the version 1.11.0, as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.11.0-rc4" [5],
> * website pull request listing the new release and adding announcement blog post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
> [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
> [6] https://github.com/apache/flink-web/pull/352
>

Re: [VOTE] Release 1.11.0, release candidate #4

Thomas Weise
In reply to this post by Zhijiang(wangzhijiang999)
Thanks for preparing another RC!

As mentioned in the previous RC thread, it would be super helpful if the
release notes that are part of the documentation can be included [1]. It's
a significant time-saver to have read those first.

I found one more non-backward compatible change that would be worth
addressing/mentioning:

It is now necessary to configure the jobmanager heap size in
flink-conf.yaml (with either jobmanager.heap.size
or jobmanager.memory.heap.size). Why would I not want to do that anyways?
Well, we set it dynamically for a cluster deployment via the
flinkk8soperator, but the container image can also be used for testing with
local mode (./bin/jobmanager.sh start-foreground local). That will fail if
the heap wasn't configured and that's how I noticed it.

Thanks,
Thomas

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
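
A pre-flight check for the missing heap setting could look roughly like the sketch below. It is a hypothetical helper, not Flink's actual startup logic: it only looks for the two keys named above in simple "key: value" lines, and Flink may accept other memory options that this sketch does not know about.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical pre-flight check for a JobManager heap setting in flink-conf.yaml lines. */
final class JobManagerHeapCheck {
    // The two keys mentioned in this message; other memory options are not covered.
    private static final List<String> HEAP_KEYS =
            Arrays.asList("jobmanager.heap.size", "jobmanager.memory.heap.size");

    /** Returns true if any non-comment line sets one of the heap keys to a non-empty value. */
    static boolean hasHeapSetting(List<String> confLines) {
        for (String line : confLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                continue; // not a "key: value" line
            }
            String key = trimmed.substring(0, colon).trim();
            String value = trimmed.substring(colon + 1).trim();
            if (HEAP_KEYS.contains(key) && !value.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasHeapSetting(Arrays.asList("jobmanager.heap.size: 1024m")));
        System.out.println(hasHeapSetting(Arrays.asList("# jobmanager.heap.size: 1024m")));
    }
}
```

Running such a check in the container entrypoint before ./bin/jobmanager.sh start-foreground local would surface the missing setting earlier than the startup failure does.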

On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <[hidden email]>
wrote:

> Hi everyone,
>
> Please review and vote on the release candidate #4 for the version 1.11.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.11.0-rc4" [5],
> * website pull request listing the new release and adding announcement
> blog post [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1377/
> [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
> [6] https://github.com/apache/flink-web/pull/352
>
>

Re: [VOTE] Release 1.11.0, release candidate #4

Till Rohrmann
Hi Thomas,

just to confirm: When starting the image in local mode, then you don't have
any of the JobManager memory configuration settings configured in the
effective flink-conf.yaml, right? Does this mean that you have explicitly
removed `jobmanager.heap.size: 1024m` from the default configuration? If
this is the case, then I believe it was more of an unintentional artifact
that it worked before and it has been corrected now so that one needs to
specify the memory of the JM process explicitly. Do you think it would help
to explicitly state this in the release notes?

Cheers,
Till

On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]> wrote:

> Thanks for preparing another RC!
>
> As mentioned in the previous RC thread, it would be super helpful if the
> release notes that are part of the documentation can be included [1]. It's
> a significant time-saver to have read those first.
>
> I found one more non-backward compatible change that would be worth
> addressing/mentioning:
>
> It is now necessary to configure the jobmanager heap size in
> flink-conf.yaml (with either jobmanager.heap.size
> or jobmanager.memory.heap.size). Why would I not want to do that anyways?
> Well, we set it dynamically for a cluster deployment via the
> flinkk8soperator, but the container image can also be used for testing with
> local mode (./bin/jobmanager.sh start-foreground local). That will fail if
> the heap wasn't configured and that's how I noticed it.
>
> Thanks,
> Thomas
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>

Re: [VOTE] Release 1.11.0, release candidate #4

Thomas Weise
Hi Till,

Yes, we don't have the setting in flink-conf.yaml.

Generally, we carry forward the existing configuration and any change to
default configuration values would impact the upgrade.

Yes, since it is an incompatible change I would state it in the release
notes.

Thanks,
Thomas

BTW I found a performance regression while trying to upgrade another
pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
to pin it down yet, symptoms include increased checkpoint alignment time.

On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <[hidden email]> wrote:

> Hi Thomas,
>
> just to confirm: When starting the image in local mode, then you don't have
> any of the JobManager memory configuration settings configured in the
> effective flink-conf.yaml, right? Does this mean that you have explicitly
> removed `jobmanager.heap.size: 1024m` from the default configuration? If
> this is the case, then I believe it was more of an unintentional artifact
> that it worked before and it has been corrected now so that one needs to
> specify the memory of the JM process explicitly. Do you think it would help
> to explicitly state this in the release notes?
>
> Cheers,
> Till
>

Re: [VOTE] Release 1.11.0, release candidate #4

Zhijiang(wangzhijiang999)
Hi Thomas,

Thanks for the efficient feedback.

Regarding the suggestion to include the release notes document, I agree with your point. Maybe we should adjust the vote template in the respective wiki accordingly to guide future release processes.

Regarding the performance regression, could you provide some more details so that we can better measure or reproduce it on our side?
E.g., I guess the topology includes only two vertices, a source and a sink?
What is the parallelism of each vertex?
Does the upstream shuffle data to the downstream via the rebalance partitioner or something else?
Is the checkpoint mode exactly-once with the RocksDB state backend?
Did backpressure occur in this case?
How large is the regression, as a percentage?

Best,
Zhijiang



------------------------------------------------------------------
From:Thomas Weise <[hidden email]>
Send Time: Thursday, July 2, 2020, 09:54
To:dev <[hidden email]>
Subject:Re: [VOTE] Release 1.11.0, release candidate #4

Hi Till,

Yes, we don't have the setting in flink-conf.yaml.

Generally, we carry forward the existing configuration and any change to
default configuration values would impact the upgrade.

Yes, since it is an incompatible change I would state it in the release
notes.

Thanks,
Thomas

BTW I found a performance regression while trying to upgrade another
pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
to pin it down yet, symptoms include increased checkpoint alignment time.


Re: [VOTE] Release 1.11.0, release candidate #4

Jark Wu-2
Hi,

I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in the new
changelog source (CDC) feature.
Because of this bug, queries on a changelog source can’t be inserted into
upsert sinks (e.g. ES, JDBC, HBase),
which is a common case in production. CDC is one of the important features
of Table/SQL in this release,
so from my side, I hope we can have this fix in 1.11.0; otherwise, this is
a broken feature...

Again, I am terribly sorry for delaying the release...

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-18461

On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]>
wrote:

> Hi Thomas,
>
> Thanks for the efficient feedback.
>
> Regarding the suggestion to include the release notes document, I agree
> with your point. Maybe we should adjust the vote template in the
> respective wiki accordingly to guide future release processes.
>
> Regarding the performance regression, could you provide some more details
> so that we can better measure or reproduce it on our side?
> E.g., I guess the topology includes only two vertices, a source and a sink?
> What is the parallelism of each vertex?
> Does the upstream shuffle data to the downstream via the rebalance
> partitioner or something else?
> Is the checkpoint mode exactly-once with the RocksDB state backend?
> Did backpressure occur in this case?
> How large is the regression, as a percentage?
>
> Best,
> Zhijiang
>
>
>

Re: [VOTE] Release 1.11.0, release candidate #4

Robert Metzger
Thanks a lot for the thorough testing Thomas! This is really helpful!

@Chesnay: I would not block the release on this. The web submission does
not seem to be the documented / preferred way of job submission. It is
unlikely to harm the beginner's experience (and they would anyways not read
the release notes). I mention the beginner experience, because they are the
primary audience of the examples.

Regarding FLINK-18461 / Jark's issue: I would not block the release on
that, but still try to get it fixed asap. It is likely that this RC doesn't
go through (given the rate at which we are finding issues), and even if it
goes through, we can document it as a known issue in the release
announcement and immediately release 1.11.1.
Blocking the release on this causes quite a bit of work for the release
managers for rolling a new RC. Until we have understood the performance
regression Thomas is reporting, I would keep this RC open, and keep testing.


On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:

> Hi,
>
> I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in the new
> changelog source (CDC) feature.
> Because of this bug, queries on a changelog source can’t be inserted into
> upsert sinks (e.g. ES, JDBC, HBase),
> which is a common case in production. CDC is one of the important features
> of Table/SQL in this release,
> so from my side, I hope we can have this fix in 1.11.0; otherwise, this is
> a broken feature...
>
> Again, I am terribly sorry for delaying the release...
>
> Best,
> Jark
>
> [1]: https://issues.apache.org/jira/browse/FLINK-18461
>

Re: [VOTE] Release 1.11.0, release candidate #4

Till Rohrmann
I agree with Robert.

@Chesnay: The problem probably already existed in Flink 1.10 and earlier,
because jobs with eager execution calls cannot be run from the web UI. I
agree with Robert that we can/should improve our documentation in this
regard, though.

@Thomas:
1. I will update the release notes to add a short section describing that
one needs to configure the JobManager memory.
2. Concerning the performance regression we should look into it. I believe
Zhijiang is very eager to learn more about your exact setup to further
debug it. Again I agree with Robert to not block the release on it at the
moment.

@Jark: How much of a problem is FLINK-18461? Will it make the CDC feature
completely unusable, or will it only break a subset of the use cases?
If it is the latter, then I believe that we can document the
limitations and try to fix it asap. Depending on the remaining testing, the
fix might make it into the 1.11.0 or the 1.11.1 release.

Cheers,
Till
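
For readers following point 1 above, here is a minimal sketch of the setting under discussion. It uses only the two keys and the `1024m` value quoted in this thread; the value is illustrative, not a sizing recommendation, and which key is preferred should be checked against the 1.11 release notes.

```yaml
# flink-conf.yaml (sketch; 1024m is the value quoted in this thread)
# Configure the JobManager heap explicitly so that local mode
# (./bin/jobmanager.sh start-foreground local) can start.
jobmanager.memory.heap.size: 1024m

# Older key, as it appeared in the earlier default configuration:
# jobmanager.heap.size: 1024m
```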

On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <[hidden email]> wrote:

> Thanks a lot for the thorough testing Thomas! This is really helpful!
>
> @Chesnay: I would not block the release on this. The web submission does
> not seem to be the documented / preferred way of job submission. It is
> unlikely to harm the beginner's experience (and they would anyways not read
> the release notes). I mention the beginner experience, because they are the
> primary audience of the examples.
>
> Regarding FLINK-18461 / Jark's issue: I would not block the release on
> that, but still try to get it fixed asap. It is likely that this RC doesn't
> go through (given the rate at which we are finding issues), and even if it
> goes through, we can document it as a known issue in the release
> announcement and immediately release 1.11.1.
> Blocking the release on this causes quite a bit of work for the release
> managers for rolling a new RC. Until we have understood the performance
> regression Thomas is reporting, I would keep this RC open, and keep
> testing.
>
>
> On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:
>
> > Hi,
> >
> > I'm very sorry but we just found a blocker issue FLINK-18461 [1] in the
> new
> > feature of changelog source (CDC).
> > This bug will result in queries on changelog source can’t be inserted
> into
> > upsert sink (e.g. ES, JDBC, HBase),
> > which is a common case in production. CDC is one of the important
> features
> > of Table/SQL in this release,
> > so from my side, I hope we can have this fix in 1.11.0, otherwise, this
> is
> > a broken feature...
> >
> > Again, I am terribly sorry for delaying the release...
> >
> > Best,
> > Jark
> >
> > [1]: https://issues.apache.org/jira/browse/FLINK-18461
> >
> > On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]
> .invalid>
> > wrote:
> >
> > > Hi Thomas,
> > >
> > > Thanks for the efficient feedback.
> > >
> > > Regarding the suggestion of adding the release notes document, I agree
> > > with your point. Maybe we should adjust the vote template accordingly
> in
> > > the respective wiki to guide the following release processes.
> > >
> > > Regarding the performance regression, could you provide some more
> details
> > > for our better measurement or reproducing on our sides?
> > > E.g. I guess the topology only includes two vertexes source and sink?
> > > What is the parallelism for every vertex?
> > > The upstream shuffles data to the downstream via rebalance partitioner
> or
> > > other?
> > > The checkpoint mode is exactly-once with rocksDB state backend?
> > > The backpressure happened in this case?
> > > How much percentage regression in this case?
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Thomas Weise <[hidden email]>
> > > > Send Time: Thursday, July 2, 2020, 09:54
> > > To:dev <[hidden email]>
> > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
> > >
> > > Hi Till,
> > >
> > > Yes, we don't have the setting in flink-conf.yaml.
> > >
> > > Generally, we carry forward the existing configuration and any change
> to
> > > default configuration values would impact the upgrade.
> > >
> > > Yes, since it is an incompatible change I would state it in the release
> > > notes.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > BTW I found a performance regression while trying to upgrade another
> > > pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't
> able
> > > to pin it down yet, symptoms include increased checkpoint alignment
> time.
> > >
> > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <[hidden email]>
> > > wrote:
> > >
> > > > Hi Thomas,
> > > >
> > > > just to confirm: When starting the image in local mode, then you
> don't
> > > have
> > > > any of the JobManager memory configuration settings configured in the
> > > > effective flink-conf.yaml, right? Does this mean that you have
> > explicitly
> > > > removed `jobmanager.heap.size: 1024m` from the default configuration?
> > If
> > > > this is the case, then I believe it was more of an unintentional
> > artifact
> > > > that it worked before and it has been corrected now so that one needs
> > to
> > > > specify the memory of the JM process explicitly. Do you think it
> would
> > > help
> > > > to explicitly state this in the release notes?
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]> wrote:
> > > >
> > > > > Thanks for preparing another RC!
> > > > >
> > > > > As mentioned in the previous RC thread, it would be super helpful
> if
> > > the
> > > > > release notes that are part of the documentation can be included
> [1].
> > > > It's
> > > > > a significant time-saver to have read those first.
> > > > >
> > > > > I found one more non-backward compatible change that would be worth
> > > > > addressing/mentioning:
> > > > >
> > > > > It is now necessary to configure the jobmanager heap size in
> > > > > flink-conf.yaml (with either jobmanager.heap.size
> > > > > or jobmanager.memory.heap.size). Why would I not want to do that
> > > anyways?
> > > > > Well, we set it dynamically for a cluster deployment via the
> > > > > flinkk8soperator, but the container image can also be used for
> > testing
> > > > with
> > > > > local mode (./bin/jobmanager.sh start-foreground local). That will
> > fail
> > > > if
> > > > > the heap wasn't configured and that's how I noticed it.
> > > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
> > > > >
> > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
> [hidden email]
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Please review and vote on the release candidate #4 for the
> version
> > > > > 1.11.0,
> > > > > > as follows:
> > > > > > [ ] +1, Approve the release
> > > > > > [ ] -1, Do not approve the release (please provide specific
> > comments)
> > > > > >
> > > > > > The complete staging area is available for your review, which
> > > includes:
> > > > > > * JIRA release notes [1],
> > > > > > * the official Apache source release and binary convenience
> > releases
> > > to
> > > > > be
> > > > > > deployed to dist.apache.org [2], which are signed with the key
> > with
> > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
> > > > > > * all artifacts to be deployed to the Maven Central Repository
> [4],
> > > > > > * source code tag "release-1.11.0-rc4" [5],
> > > > > > * website pull request listing the new release and adding
> > > announcement
> > > > > > blog post [6].
> > > > > >
> > > > > > The vote will be open for at least 72 hours. It is adopted by
> > > majority
> > > > > > approval, with at least 3 PMC affirmative votes.
> > > > > >
> > > > > > Thanks,
> > > > > > Release Manager
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
> > > > > > [2]
> https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
> > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > > > > [4]
> > > > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapacheflink-1377/
> > > > > > [5]
> > https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
> > > > > > [6] https://github.com/apache/flink-web/pull/352
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>

Re: [VOTE] Release 1.11.0, release candidate #4

Zhijiang(wangzhijiang999)
I also agree with Till and Robert's proposals.

In general, I think we should not block the release based on the current assessment. If we keep postponing the release, new blocker bugs will probably keep turning up, and we could get stuck in a cycle that prevents us from delivering a final release to users in time. That does not mean RC4 will necessarily be the final one; we can reevaluate as issues accumulate.

Regarding the performance regression: if we can reproduce it based on Thomas's feedback, we can analyze the cause and then evaluate its impact.

Regarding FLINK-18461: after syncing with Jark offline, the bug affects one of the three scenarios for using the CDC feature, and the affected scenario is actually the one most commonly used.
My suggestion is to merge the fix into release-1.11 now, since the PR is already open for review, and to finalize the conclusion later. If this is the only issue and RC4 goes through, another option is to cover it in the next release, 1.11.1, as Robert suggested, since we can prepare the next minor release soon. If other blocker issues come up during voting and need to be resolved soon, then all of them should certainly be covered in the next RC, RC5.

Best,
Zhijiang




Re: [VOTE] Release 1.11.0, release candidate #4

Stephan Ewen
+1 (binding) from my side

  - legal files (license, notice) look correct
  - no binaries in the release
  - ran examples from command line
  - ran some examples from web ui
  - log files look sane
  - RocksDB, incremental checkpoints, savepoints, moving savepoints
all work as expected.

There are some friction points, which have also been mentioned. However, I
am not sure they need to block the release.
  - Some batch examples in the web UI have not been working in 1.10. We
should fix that asap, because it impacts the "getting started" experience,
but I personally don't vote against the release based on that
  - Same for the CDC bug. It is unfortunate, but I would not hold the
release at such a late stage for one special issue in a new connector.
Let's work on a timely 1.11.1.


I would withdraw my vote if we find a fundamental issue in the network
system that causes the increased checkpoint delays behind the job
regression Thomas mentioned.
Such a core bug would be a deal-breaker for a large fraction of users.





Re: [VOTE] Release 1.11.0, release candidate #4

Chesnay Schepler-3
+1

Re examples:

The examples failing is new in 1.11, and was introduced in
https://issues.apache.org/jira/browse/FLINK-16655.

In prior versions, calls to print()/count()/etc. were simply
treated as an execute(), whereas with 1.11 we outright fail the
submission because these do not work in detached submissions (which jar
submissions always are).
This is generally /fine/, and may save users some headaches, but we
should add this to the release notes and in a follow-up ensure a proper
error message is shown in the UI (I'll take care of that). At the moment
you just get an "Internal Server Error.", and have to check the
JobManager logs for details.
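
The behaviour change described above can be illustrated with a small toy model. This is only a sketch: `ToyEnvironment`, `submit`, and the `failOnEagerCalls` flag are hypothetical names invented for illustration and are not Flink classes or APIs. It models an eager call such as print() under a detached submission, once with the older "treat it like execute()" handling and once with the 1.11 "reject the submission" handling.

```java
import java.util.ArrayList;
import java.util.List;

public class DetachedSubmissionSketch {

    /** Hypothetical stand-in for a batch environment; NOT a Flink class. */
    static class ToyEnvironment {
        private final boolean detached;
        // true models the 1.11 behaviour described in the message above
        private final boolean failOnEagerCalls;
        final List<String> log = new ArrayList<>();

        ToyEnvironment(boolean detached, boolean failOnEagerCalls) {
            this.detached = detached;
            this.failOnEagerCalls = failOnEagerCalls;
        }

        /** Eager call: it needs the job result back on the client side. */
        void print() {
            if (!detached) {
                // Attached submission: the result is available, so printing works.
                log.add("execute+print");
                return;
            }
            if (failOnEagerCalls) {
                // Detached submission, 1.11-style handling: reject outright.
                throw new IllegalStateException(
                        "Job was submitted in detached mode; results are not available.");
            }
            // Detached submission, older handling: silently treated like execute().
            log.add("execute");
        }
    }

    /** Runs one toy submission and reports what happened. */
    static String submit(boolean detached, boolean failOnEagerCalls) {
        ToyEnvironment env = new ToyEnvironment(detached, failOnEagerCalls);
        try {
            env.print();
            return "ok:" + env.log;
        } catch (IllegalStateException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(submit(true, false));  // older handling, detached
        System.out.println(submit(true, true));   // 1.11-style handling, detached
        System.out.println(submit(false, true));  // attached submission
    }
}
```

In this model only the detached, 1.11-style case is rejected, which matches the observation that jar submissions through the web UI now fail for examples that call print() or count().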

On 02/07/2020 15:47, Stephan Ewen wrote:

> +1 (binding) from my side
>
>    - legal files (license, notice) looks correct
>    - no binaries in the release
>    - ran examples from command line
>    - ran some examples from web ui
>    - log files look sane
>    - RocksDB, incremental checkpoints, savepoints, moving savepoints
> all works as expected.
>
> There are some friction points, which have also been mentioned. However, I
> am not sure they need to block the release.
>    - Some batch examples in the web UI have not been working in 1.10. We
> should fix that asap, because it impacts the "getting started" experience,
> but I personally don't vote against the release based on that
>    - Same for the CDC bug. It is unfortunate, but I would not hold the
> release at such a late stage for one special issue in a new connector.
> Let's work on a timely 1.11.1.
>
>
> I would withdraw my vote, if we find a fundamental issue in the network
> system causing the increased checkpoint delays, causing the job regression
> Thomas mentioned.
> Such a core bug would be a deal-breaker for a large fraction of users.
>
>
>
>
> On Thu, Jul 2, 2020 at 11:35 AM Zhijiang <[hidden email]>
> wrote:
>
>> I also agree with Till and Robert's proposals.
>>
>> In general I think we should not block the release based on the current
>> estimation. Otherwise, if we keep postponing the release, new blocker
>> bugs will probably keep turning up, and we might get stuck in a cycle
>> that prevents us from giving users a final release in time. But
>> that does not mean RC4 will be the final one; we can reevaluate the
>> impact as issues accumulate.
>>
>> Regarding the performance regression, if possible we should reproduce it
>> and analyze the cause based on Thomas's feedback; then we can evaluate its
>> effect.
>>
>> Regarding FLINK-18461, after syncing with Jark offline: the bug
>> affects one of the three scenarios for using the CDC feature, and the
>> affected scenario is actually the one most commonly used by users.
>> My suggestion is to merge the fix into release-1.11 now, since the PR is
>> already open for review, and to finalize the conclusion later. If this
>> issue is the only one found by the time RC4 goes through, then another
>> option is to cover it in the next release, 1.11.1, as Robert suggested,
>> since we can prepare the next minor release soon. If other blocker issues
>> come up during voting and need to be resolved soon, then we should
>> certainly cover all of them in the next RC, RC5.
>>
>> Best,
>> Zhijiang
>>
>>
>> ------------------------------------------------------------------
>> From:Till Rohrmann <[hidden email]>
>> Send Time: July 2, 2020 (Thursday) 16:46
>> To:dev <[hidden email]>
>> Cc:Zhijiang <[hidden email]>
>> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>>
>> I agree with Robert.
>>
>> @Chesnay: The problem has probably already existed in Flink 1.10 and
>> before because we cannot run jobs with eager execution calls from the web
>> ui. I agree with Robert that we can/should improve our documentation in
>> this regard, though.
>>
>> @Thomas:
>> 1. I will update the release notes to add a short section describing that
>> one needs to configure the JobManager memory.
>> 2. Concerning the performance regression we should look into it. I believe
>> Zhijiang is very eager to learn more about your exact setup to further
>> debug it. Again I agree with Robert to not block the release on it at the
>> moment.
>>
>> @Jark: How much of a problem is FLINK-18461? Will it make the CDC feature
>> completely unusable or will only make a subset of the use cases to not
>> work? If it is the latter, then I believe that we can document the
>> limitations and try to fix it asap. Depending on the remaining testing the
>> fix might make it into the 1.11.0 or the 1.11.1 release.
>>
>> Cheers,
>> Till
>> On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <[hidden email]>
>> wrote:
>> Thanks a lot for the thorough testing Thomas! This is really helpful!
>>
>>   @Chesnay: I would not block the release on this. The web submission does
>>   not seem to be the documented / preferred way of job submission. It is
>>   unlikely to harm the beginner's experience (and they would anyways not read
>>   the release notes). I mention the beginner experience, because they are the
>>   primary audience of the examples.
>>
>>   Regarding FLINK-18461 / Jark's issue: I would not block the release on
>>   that, but still try to get it fixed asap. It is likely that this RC doesn't
>>   go through (given the rate at which we are finding issues), and even if it
>>   goes through, we can document it as a known issue in the release
>>   announcement and immediately release 1.11.1.
>>   Blocking the release on this causes quite a bit of work for the release
>>   managers for rolling a new RC. Until we have understood the performance
>>   regression Thomas is reporting, I would keep this RC open, and keep testing.
>>
>>
>>   On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:
>>
>>   > Hi,
>>   >
>>   > I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in the
>>   > new changelog source (CDC) feature.
>>   > Because of this bug, queries on a changelog source can't be inserted into
>>   > an upsert sink (e.g. ES, JDBC, HBase),
>>   > which is a common case in production. CDC is one of the important features
>>   > of Table/SQL in this release,
>>   > so from my side, I hope we can have this fix in 1.11.0; otherwise, this is
>>   > a broken feature...
>>   >
>>   > Again, I am terribly sorry for delaying the release...
>>   >
>>   > Best,
>>   > Jark
>>   >
>>   > [1]: https://issues.apache.org/jira/browse/FLINK-18461
>>   >
>>   > On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]
>> .invalid>
>>   > wrote:
>>   >
>>   > > Hi Thomas,
>>   > >
>>   > > Thanks for the efficient feedback.
>>   > >
>>   > > Regarding the suggestion of adding the release notes document, I agree
>>   > > with your point. Maybe we should adjust the vote template accordingly in
>>   > > the respective wiki to guide future release processes.
>>   > >
>>   > > Regarding the performance regression, could you provide some more details
>>   > > so that we can measure or reproduce it on our side?
>>   > > E.g. I guess the topology only includes two vertices, source and sink?
>>   > > What is the parallelism of each vertex?
>>   > > Does the upstream shuffle data to the downstream via the rebalance
>>   > > partitioner or another one?
>>   > > Is the checkpoint mode exactly-once with the RocksDB state backend?
>>   > > Did backpressure occur in this case?
>>   > > How large is the regression in percent?
>>   > >
>>   > > Best,
>>   > > Zhijiang
>>   > >
>>   > >
>>   > >
>>   > > ------------------------------------------------------------------
>>   > > From:Thomas Weise <[hidden email]>
>>   > > Send Time: July 2, 2020 (Thursday) 09:54
>>   > > To:dev <[hidden email]>
>>   > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>>   > >
>>   > > Hi Till,
>>   > >
>>   > > Yes, we don't have the setting in flink-conf.yaml.
>>   > >
>>   > > Generally, we carry forward the existing configuration and any change to
>>   > > default configuration values would impact the upgrade.
>>   > >
>>   > > Yes, since it is an incompatible change I would state it in the release
>>   > > notes.
>>   > >
>>   > > Thanks,
>>   > > Thomas
>>   > >
>>   > > BTW I found a performance regression while trying to upgrade another
>>   > > pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
>>   > > to pin it down yet, symptoms include increased checkpoint alignment time.
>>   > >
>>   > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <[hidden email]>
>>   > > wrote:
>>   > >
>>   > > > Hi Thomas,
>>   > > >
>>   > > > just to confirm: When starting the image in local mode, then you don't have
>>   > > > any of the JobManager memory configuration settings configured in the
>>   > > > effective flink-conf.yaml, right? Does this mean that you have explicitly
>>   > > > removed `jobmanager.heap.size: 1024m` from the default configuration? If
>>   > > > this is the case, then I believe it was more of an unintentional artifact
>>   > > > that it worked before and it has been corrected now so that one needs to
>>   > > > specify the memory of the JM process explicitly. Do you think it would help
>>   > > > to explicitly state this in the release notes?
>>   > > >
>>   > > > Cheers,
>>   > > > Till
>>   > > >
>>   > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]> wrote:
>>   > > >
>>   > > > > Thanks for preparing another RC!
>>   > > > >
>>   > > > > As mentioned in the previous RC thread, it would be super helpful if the
>>   > > > > release notes that are part of the documentation can be included [1]. It's
>>   > > > > a significant time-saver to have read those first.
>>   > > > >
>>   > > > > I found one more non-backward compatible change that would be worth
>>   > > > > addressing/mentioning:
>>   > > > >
>>   > > > > It is now necessary to configure the jobmanager heap size in
>>   > > > > flink-conf.yaml (with either jobmanager.heap.size
>>   > > > > or jobmanager.memory.heap.size). Why would I not want to do that anyways?
>>   > > > > Well, we set it dynamically for a cluster deployment via the
>>   > > > > flinkk8soperator, but the container image can also be used for testing with
>>   > > > > local mode (./bin/jobmanager.sh start-foreground local). That will fail if
>>   > > > > the heap wasn't configured and that's how I noticed it.
>>   > > > >
>>   > > > > Thanks,
>>   > > > > Thomas
>>   > > > >
>>   > > > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>>   > > > >
>>   > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
>> [hidden email]
>>   > > > > .invalid>
>>   > > > > wrote:
>>   > > > >
>>   > > > > > Hi everyone,
>>   > > > > >
>>   > > > > > Please review and vote on the release candidate #4 for the version 1.11.0,
>>   > > > > > as follows:
>>   > > > > > [ ] +1, Approve the release
>>   > > > > > [ ] -1, Do not approve the release (please provide specific comments)
>>   > > > > >
>>   > > > > > The complete staging area is available for your review, which includes:
>>   > > > > > * JIRA release notes [1],
>>   > > > > > * the official Apache source release and binary convenience releases to be
>>   > > > > > deployed to dist.apache.org [2], which are signed with the key with
>>   > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
>>   > > > > > * all artifacts to be deployed to the Maven Central Repository [4],
>>   > > > > > * source code tag "release-1.11.0-rc4" [5],
>>   > > > > > * website pull request listing the new release and adding announcement
>>   > > > > > blog post [6].
>>   > > > > >
>>   > > > > > The vote will be open for at least 72 hours. It is adopted by majority
>>   > > > > > approval, with at least 3 PMC affirmative votes.
>>   > > > > >
>>   > > > > > Thanks,
>>   > > > > > Release Manager
>>   > > > > >
>>   > > > > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>>   > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
>>   > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>   > > > > > [4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
>>   > > > > > [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
>>   > > > > > [6] https://github.com/apache/flink-web/pull/352
>>   > > > > >
>>   > > > > >
>>   > > > >
>>   > > >
>>   > >
>>   > >
>>   >
>>
>>
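
On the JobManager memory point quoted above: a pre-flight check along the following lines could catch a missing setting before starting local mode. This is only a rough sketch, not anything that ships with Flink. It assumes flink-conf.yaml is a flat list of `key: value` lines and only looks for the two option names mentioned in this thread; the function names are made up for illustration, and other memory options may also satisfy the requirement.

```python
# The two option names quoted in this thread; pass a different tuple to
# has_jobmanager_memory() if other options should count as well.
JOBMANAGER_MEMORY_KEYS = ("jobmanager.heap.size", "jobmanager.memory.heap.size")


def configured_keys(conf_text):
    """Return the set of keys that have a non-empty value in the config text.

    Assumes a flat 'key: value' layout; anything after '#' is treated as a comment.
    """
    keys = set()
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        if value.strip():
            keys.add(key.strip())
    return keys


def has_jobmanager_memory(conf_text, keys=JOBMANAGER_MEMORY_KEYS):
    """True if at least one of the given JobManager memory keys is set."""
    present = configured_keys(conf_text)
    return any(key in present for key in keys)


if __name__ == "__main__":
    sample = "jobmanager.rpc.address: localhost\njobmanager.heap.size: 1024m\n"
    print(has_jobmanager_memory(sample))
```

Running it against the effective configuration of a container image (the one actually used by `./bin/jobmanager.sh start-foreground local`) would show whether the heap setting got lost when carrying an older configuration forward.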


Re: [VOTE] Release 1.11.0, release candidate #4

Robert Metzger
In reply to this post by Stephan Ewen
Issues found:
- In
https://repository.apache.org/content/repositories/orgapacheflink-1377/org/apache/flink/flink-runtime_2.12/1.11.0/flink-runtime_2.12-1.11.0.jar
./META-INF/NOTICE lists "org.uncommons.maths:uncommons-maths:1.2.2a" as a
bundled dependency. However, it does not seem to be bundled. I'm holding
my vote until we've discussed this issue, but I'm leaning towards
continuing the release vote (
https://issues.apache.org/jira/browse/FLINK-18471).

Checks:
- source archive compiles
- checked artifacts in staging repo
  - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
  - versions in pom seem correct
  - checked some other jars
- ... I will continue later ...
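
For anyone who wants to repeat the NOTICE check on other jars, something like the sketch below might help. It is a rough illustration only: it assumes a dependency counts as "bundled" when the jar contains .class files under a given package path, and the path `org/uncommons/maths/` is my guess at where the uncommons-maths classes would live. The function names are made up for this example.

```python
import io
import zipfile


def listed_but_not_bundled(jar, coordinate, package_prefix):
    """Return True if META-INF/NOTICE names `coordinate` but the jar holds
    no .class files under `package_prefix`.

    `jar` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        names = archive.namelist()
        notice = ""
        if "META-INF/NOTICE" in names:
            notice = archive.read("META-INF/NOTICE").decode("utf-8", errors="replace")
    listed = coordinate in notice
    bundled = any(
        name.startswith(package_prefix) and name.endswith(".class")
        for name in names
    )
    return listed and not bundled


def make_demo_jar(entries):
    """Build a small in-memory jar (a zip archive) from {entry name: content}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in entries.items():
            archive.writestr(name, content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    coordinate = "org.uncommons.maths:uncommons-maths:1.2.2a"
    demo = make_demo_jar({"META-INF/NOTICE": "- " + coordinate + "\n"})
    print(listed_but_not_bundled(demo, coordinate, "org/uncommons/maths/"))
```

Note that a relocated (shaded) dependency would sit under a different package path, so a hit from this check is a prompt to look closer rather than proof of a wrong NOTICE entry.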


Re: [VOTE] Release 1.11.0, release candidate #4

Chesnay Schepler-3
Listing more than we need to (especially if it is Apache-licensed) isn't
a big problem, since nothing changes from a user's perspective with regard
to licensing.

On 02/07/2020 17:08, Robert Metzger wrote:

> Issues found:
> - In
> https://repository.apache.org/content/repositories/orgapacheflink-1377/org/apache/flink/flink-runtime_2.12/1.11.0/flink-runtime_2.12-1.11.0.jar
> ./META-INF/NOTICE lists "org.uncommons.maths:uncommons-maths:1.2.2a" as a
> bundled dependency. However, it does not seem to be bundled. I'm holding
> my vote until we've discussed this issue, but I'm leaning towards
> continuing the release vote (
> https://issues.apache.org/jira/browse/FLINK-18471).
>
> Checks:
> - source archive compiles
> - checked artifacts in staging repo
>    - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
>    - versions in pom seem correct
>    - checked some other jars
> - ... I will continue later ...
>
> On Thu, Jul 2, 2020 at 3:47 PM Stephan Ewen <[hidden email]> wrote:
>
>> +1 (binding) from my side
>>
>>    - legal files (license, notice) look correct
>>    - no binaries in the release
>>    - ran examples from command line
>>    - ran some examples from web ui
>>    - log files look sane
>>    - RocksDB, incremental checkpoints, savepoints, moving savepoints
>> all work as expected.
>>
>> There are some friction points, which have also been mentioned. However, I
>> am not sure they need to block the release.
>>    - Some batch examples in the web UI were already not working in 1.10. We
>> should fix that asap, because it impacts the "getting started" experience,
>> but I personally won't vote against the release based on that.
>>    - Same for the CDC bug. It is unfortunate, but I would not hold the
>> release at such a late stage for one special issue in a new connector.
>> Let's work on a timely 1.11.1.
>>
>>
>> I would withdraw my vote if we find a fundamental issue in the network
>> system that causes the increased checkpoint delays and, through them, the
>> job regression Thomas mentioned.
>> Such a core bug would be a deal-breaker for a large fraction of users.
>>
>>
>>
>>
>> On Thu, Jul 2, 2020 at 11:35 AM Zhijiang <[hidden email]
>> .invalid>
>> wrote:
>>
>>> I also agree with Till and Robert's proposals.
>>>
>>> In general I think we should not block the release based on the current
>>> estimation. Otherwise, if we keep postponing the release, new blocker
>>> bugs will probably keep turning up, and we might get stuck in a cycle
>>> that prevents us from giving users a final release in time. But
>>> that does not mean RC4 will be the final one; we can reevaluate the
>>> impact as issues accumulate.
>>>
>>> Regarding the performance regression, if possible we should reproduce it
>>> and analyze the cause based on Thomas's feedback; then we can evaluate its
>>> effect.
>>>
>>> Regarding FLINK-18461, after syncing with Jark offline: the bug
>>> affects one of the three scenarios for using the CDC feature, and the
>>> affected scenario is actually the one most commonly used by users.
>>> My suggestion is to merge the fix into release-1.11 now, since the PR is
>>> already open for review, and to finalize the conclusion later. If this
>>> issue is the only one found by the time RC4 goes through, then another
>>> option is to cover it in the next release, 1.11.1, as Robert suggested,
>>> since we can prepare the next minor release soon. If other blocker issues
>>> come up during voting and need to be resolved soon, then we should
>>> certainly cover all of them in the next RC, RC5.
>>>
>>> Best,
>>> Zhijiang
>>>
>>>
>>> ------------------------------------------------------------------
>>> From:Till Rohrmann <[hidden email]>
>>> Send Time: July 2, 2020 (Thursday) 16:46
>>> To:dev <[hidden email]>
>>> Cc:Zhijiang <[hidden email]>
>>> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>>>
>>> I agree with Robert.
>>>
>>> @Chesnay: The problem has probably already existed in Flink 1.10 and
>>> before because we cannot run jobs with eager execution calls from the web
>>> ui. I agree with Robert that we can/should improve our documentation in
>>> this regard, though.
>>>
>>> @Thomas:
>>> 1. I will update the release notes to add a short section describing that
>>> one needs to configure the JobManager memory.
>>> 2. Concerning the performance regression we should look into it. I
>> believe
>>> Zhijiang is very eager to learn more about your exact setup to further
>>> debug it. Again I agree with Robert to not block the release on it at the
>>> moment.
>>>
>>> @Jark: How much of a problem is FLINK-18461? Will it make the CDC feature
>>> completely unusable or will only make a subset of the use cases to not
>>> work? If it is the latter, then I believe that we can document the
>>> limitations and try to fix it asap. Depending on the remaining testing
>> the
>>> fix might make it into the 1.11.0 or the 1.11.1 release.
>>>
>>> Cheers,
>>> Till
>>> On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <[hidden email]>
>>> wrote:
>>> Thanks a lot for the thorough testing Thomas! This is really helpful!
>>>
>>>   @Chesnay: I would not block the release on this. The web submission does
>>>   not seem to be the documented / preferred way of job submission. It is
>>>   unlikely to harm the beginner's experience (and they would anyways not read
>>>   the release notes). I mention the beginner experience, because they are the
>>>   primary audience of the examples.
>>>
>>>   Regarding FLINK-18461 / Jark's issue: I would not block the release on
>>>   that, but still try to get it fixed asap. It is likely that this RC doesn't
>>>   go through (given the rate at which we are finding issues), and even if it
>>>   goes through, we can document it as a known issue in the release
>>>   announcement and immediately release 1.11.1.
>>>   Blocking the release on this causes quite a bit of work for the release
>>>   managers for rolling a new RC. Until we have understood the performance
>>>   regression Thomas is reporting, I would keep this RC open, and keep testing.
>>>
>>>
>>>   On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:
>>>
>>>   > Hi,
>>>   >
>>>   > I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in the
>>>   > new changelog source (CDC) feature.
>>>   > Because of this bug, queries on a changelog source can't be inserted into
>>>   > an upsert sink (e.g. ES, JDBC, HBase),
>>>   > which is a common case in production. CDC is one of the important features
>>>   > of Table/SQL in this release,
>>>   > so from my side, I hope we can have this fix in 1.11.0; otherwise, this is
>>>   > a broken feature...
>>>   >
>>>   > Again, I am terribly sorry for delaying the release...
>>>   >
>>>   > Best,
>>>   > Jark
>>>   >
>>>   > [1]: https://issues.apache.org/jira/browse/FLINK-18461
>>>   >
>>>   > On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]
>>> .invalid>
>>>   > wrote:
>>>   >
>>>   > > Hi Thomas,
>>>   > >
>>>   > > Thanks for the efficient feedback.
>>>   > >
>>>   > > Regarding the suggestion of adding the release notes document, I agree
>>>   > > with your point. Maybe we should adjust the vote template accordingly in
>>>   > > the respective wiki to guide future release processes.
>>>   > >
>>>   > > Regarding the performance regression, could you provide some more details
>>>   > > so that we can measure or reproduce it on our side?
>>>   > > E.g. I guess the topology only includes two vertices, source and sink?
>>>   > > What is the parallelism of each vertex?
>>>   > > Does the upstream shuffle data to the downstream via the rebalance
>>>   > > partitioner or another one?
>>>   > > Is the checkpoint mode exactly-once with the RocksDB state backend?
>>>   > > Did backpressure occur in this case?
>>>   > > How large is the regression in percent?
>>>   > >
>>>   > > Best,
>>>   > > Zhijiang
>>>   > >
>>>   > >
>>>   > >
>>>   > > ------------------------------------------------------------------
>>>   > > From:Thomas Weise <[hidden email]>
>>>   > > Send Time: July 2, 2020 (Thursday) 09:54
>>>   > > To:dev <[hidden email]>
>>>   > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>>>   > >
>>>   > > Hi Till,
>>>   > >
>>>   > > Yes, we don't have the setting in flink-conf.yaml.
>>>   > >
>>>   > > Generally, we carry forward the existing configuration and any change to
>>>   > > default configuration values would impact the upgrade.
>>>   > >
>>>   > > Yes, since it is an incompatible change I would state it in the release
>>>   > > notes.
>>>   > >
>>>   > > Thanks,
>>>   > > Thomas
>>>   > >
>>>   > > BTW I found a performance regression while trying to upgrade another
>>>   > > pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
>>>   > > to pin it down yet, symptoms include increased checkpoint alignment time.
>>>   > >
>>>   > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <[hidden email]
>>>
>>>   > > wrote:
>>>   > >
>>>   > > > Hi Thomas,
>>>   > > >
>>>   > > > just to confirm: When starting the image in local mode, then you
>>> don't
>>>   > > have
>>>   > > > any of the JobManager memory configuration settings configured in
>>> the
>>>   > > > effective flink-conf.yaml, right? Does this mean that you have
>>>   > explicitly
>>>   > > > removed `jobmanager.heap.size: 1024m` from the default
>>> configuration?
>>>   > If
>>>   > > > this is the case, then I believe it was more of an unintentional
>>>   > artifact
>>>   > > > that it worked before and it has been corrected now so that one
>>> needs
>>>   > to
>>>   > > > specify the memory of the JM process explicitly. Do you think it
>>> would
>>>   > > help
>>>   > > > to explicitly state this in the release notes?
>>>   > > >
>>>   > > > Cheers,
>>>   > > > Till
>>>   > > >
>>>   > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]>
>> wrote:
>>>   > > >
>>>   > > > > Thanks for preparing another RC!
>>>   > > > >
>>>   > > > > As mentioned in the previous RC thread, it would be super
>> helpful
>>> if
>>>   > > the
>>>   > > > > release notes that are part of the documentation can be included
>>> [1].
>>>   > > > It's
>>>   > > > > a significant time-saver to have read those first.
>>>   > > > >
>>>   > > > > I found one more non-backward compatible change that would be
>>> worth
>>>   > > > > addressing/mentioning:
>>>   > > > >
>>>   > > > > It is now necessary to configure the jobmanager heap size in
>>>   > > > > flink-conf.yaml (with either jobmanager.heap.size
>>>   > > > > or jobmanager.memory.heap.size). Why would I not want to do that
>>>   > > anyways?
>>>   > > > > Well, we set it dynamically for a cluster deployment via the
>>>   > > > > flinkk8soperator, but the container image can also be used for
>>>   > testing
>>>   > > > with
>>>   > > > > local mode (./bin/jobmanager.sh start-foreground local). That
>> will
>>>   > fail
>>>   > > > if
>>>   > > > > the heap wasn't configured and that's how I noticed it.
>>>   > > > >
>>>   > > > > Thanks,
>>>   > > > > Thomas
>>>   > > > >
>>>   > > > > [1]
>>>   > > > >
>>>   > > > >
>>>   > > >
>>>   > >
>>>   >
>>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>>>   > > > >
>>>   > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
>>> [hidden email]
>>>   > > > > .invalid>
>>>   > > > > wrote:
>>>   > > > >
>>>   > > > > > Hi everyone,
>>>   > > > > >
>>>   > > > > > Please review and vote on the release candidate #4 for the
>>> version
>>>   > > > > 1.11.0,
>>>   > > > > > as follows:
>>>   > > > > > [ ] +1, Approve the release
>>>   > > > > > [ ] -1, Do not approve the release (please provide specific
>>>   > comments)
>>>   > > > > >
>>>   > > > > > The complete staging area is available for your review, which
>>>   > > includes:
>>>   > > > > > * JIRA release notes [1],
>>>   > > > > > * the official Apache source release and binary convenience
>>>   > releases
>>>   > > to
>>>   > > > > be
>>>   > > > > > deployed to dist.apache.org [2], which are signed with the
>> key
>>>   > with
>>>   > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
>>>   > > > > > * all artifacts to be deployed to the Maven Central Repository
>>> [4],
>>>   > > > > > * source code tag "release-1.11.0-rc4" [5],
>>>   > > > > > * website pull request listing the new release and adding
>>>   > > announcement
>>>   > > > > > blog post [6].
>>>   > > > > >
>>>   > > > > > The vote will be open for at least 72 hours. It is adopted by
>>>   > > majority
>>>   > > > > > approval, with at least 3 PMC affirmative votes.
>>>   > > > > >
>>>   > > > > > Thanks,
>>>   > > > > > Release Manager
>>>   > > > > >
>>>   > > > > > [1]
>>>   > > > > >
>>>   > > > >
>>>   > > >
>>>   > >
>>>   >
>>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>>>   > > > > > [2]
>>> https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
>>>   > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>   > > > > > [4]
>>>   > > > > >
>>>   > > >
>>>   >
>> https://repository.apache.org/content/repositories/orgapacheflink-1377/
>>>   > > > > > [5]
>>>   > https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
>>>   > > > > > [6] https://github.com/apache/flink-web/pull/352
>>>   > > > > >
>>>   > > > > >
>>>   > > > >
>>>   > > >
>>>   > >
>>>   > >
>>>   >
>>>
>>>


Re: [VOTE] Release 1.11.0, release candidate #4

Till Rohrmann
- verified checksums and signature
- built Flink from source release with Scala 2.12
- executed some example jobs successfully
- verified license and notice files
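
For anyone repeating the checksum check, here is a minimal Python sketch. It
assumes the .sha512 file next to each artifact uses the common
"<hex digest>  <file name>" layout; the function names are made up for
illustration, and signature verification (e.g. with gpg against the KEYS
file) is a separate step that is not covered here.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large release archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(artifact: Path, checksum_file: Path) -> bool:
    """Compare the artifact's digest with the first token of the checksum file.

    Assumption: the checksum file starts with the hex digest, followed by
    whitespace and the file name.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected
```

A call such as matches_checksum_file(Path("some-artifact.tgz"),
Path("some-artifact.tgz.sha512")) returns True only when the two digests
agree; the file names here are placeholders.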

I found the following issues with some NOTICE files:

* flink-connector-hive: org.apache.parquet:parquet-format:1.10.0 ->
org.apache.parquet:parquet-format:2.4.0
* flink-connector-kinesis:
  com.amazonaws:aws-java-sdk-dynamodb:jar:1.11.754 ->
com.amazonaws:aws-java-sdk-dynamodb:jar:1.11.603
  com.amazonaws:aws-java-sdk-s3:jar:1.11.754 ->
com.amazonaws:aws-java-sdk-s3:jar:1.11.603
  com.amazonaws:aws-java-sdk-kms:jar:1.11.754 ->
com.amazonaws:aws-java-sdk-kms:jar:1.11.603
* flink-sql-parquet: org.apache.commons:commons-compress:1.20 not used

So the NOTICE files of these three modules are inaccurate: two list wrong
dependency versions and one lists a dependency that is not used. I would
argue that this is not a big problem since the license did not change and
we are not required to list ASL 2.0 dependencies. Hence, I suggest
continuing with the release voting. I will open a PR to fix these problems
soon.
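
A check like the one that surfaced these mismatches could be sketched roughly
as follows in Python. This is only an illustration, not the tooling used for
the release: it assumes NOTICE entries of the form "- group:artifact:version"
(optionally with a "jar:" segment as in the list above), and it assumes the
map of actually bundled versions is obtained elsewhere, for example from the
build's dependency report. The function names and the coordinates in the
usage note are hypothetical.

```python
import re

# Matches lines such as "- org.example:lib-a:1.0.0" or
# "- org.example:lib-a:jar:1.0.0" (assumed NOTICE entry layout).
COORDINATE = re.compile(r"^\s*-\s*([\w.\-]+):([\w.\-]+):(?:jar:)?([\w.\-]+)\s*$")


def parse_notice(text):
    """Map 'group:artifact' to the version listed in the NOTICE text."""
    listed = {}
    for line in text.splitlines():
        match = COORDINATE.match(line)
        if match:
            group, artifact, version = match.groups()
            listed[f"{group}:{artifact}"] = version
    return listed


def notice_mismatches(listed, bundled):
    """Compare listed versions against the bundled ones.

    Returns a pair: a dict of entries whose listed version differs from the
    bundled one, as {coordinate: (listed, bundled)}, and a sorted list of
    entries that are listed but not bundled at all.
    """
    wrong_version = {
        key: (version, bundled[key])
        for key, version in listed.items()
        if key in bundled and bundled[key] != version
    }
    not_bundled = sorted(key for key in listed if key not in bundled)
    return wrong_version, not_bundled
```

Called with a NOTICE text listing org.example:lib-a:1.0.0 and a bundled map
containing org.example:lib-a at 1.2.0, notice_mismatches reports that entry
under wrong versions; entries absent from the bundled map are reported as
not bundled.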

Given that this is not a blocking problem, and provided that we don't find
a problem in the network stack, +1 for this release candidate.

Cheers,
Till

On Thu, Jul 2, 2020 at 5:29 PM Chesnay Schepler <[hidden email]> wrote:

> Listing more than we need to (especially if it is Apache licensed) isn't
> a big problem, since nothing changes from a user's perspective with regard
> to licensing.
>
> On 02/07/2020 17:08, Robert Metzger wrote:
> > Issues found:
> > -
> >
> https://repository.apache.org/content/repositories/orgapacheflink-1377/org/apache/flink/flink-runtime_2.12/1.11.0/flink-runtime_2.12-1.11.0.jar
> > ./META-INF/NOTICE lists "org.uncommons.maths:uncommons-maths:1.2.2a" as a
> > bundled dependency. However, it seems it is not bundled. I'm waiting
> > with my vote until we've discussed this issue. I'm leaning towards
> > continuing the release vote (
> > https://issues.apache.org/jira/browse/FLINK-18471).
> >
> > Checks:
> > - source archive compiles
> > - checked artifacts in staging repo
> >    - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
> >    - versions in pom seem correct
> >    - checked some other jars
> > - ... I will continue later ...
> >
> > On Thu, Jul 2, 2020 at 3:47 PM Stephan Ewen <[hidden email]> wrote:
> >
> >> +1 (binding) from my side
> >>
> >>    - legal files (license, notice) look correct
> >>    - no binaries in the release
> >>    - ran examples from command line
> >>    - ran some examples from web ui
> >>    - log files look sane
> >>    - RocksDB, incremental checkpoints, savepoints, moving savepoints
> >> all work as expected.
> >>
> >> There are some friction points, which have also been mentioned.
> However, I
> >> am not sure they need to block the release.
> >>    - Some batch examples in the web UI have not been working in 1.10. We
> >> should fix that asap, because it impacts the "getting started"
> experience,
> >> but I personally don't vote against the release based on that
> >>    - Same for the CDC bug. It is unfortunate, but I would not hold the
> >> release at such a late stage for one special issue in a new connector.
> >> Let's work on a timely 1.11.1.
> >>
> >>
> >> I would withdraw my vote, if we find a fundamental issue in the network
> >> system causing the increased checkpoint delays, causing the job
> regression
> >> Thomas mentioned.
> >> Such a core bug would be a deal-breaker for a large fraction of users.
> >>
> >>
> >>
> >>
> >> On Thu, Jul 2, 2020 at 11:35 AM Zhijiang <[hidden email]
> >> .invalid>
> >> wrote:
> >>
> >>> I also agree with Till's and Robert's proposals.
> >>>
> >>> In general I think we should not block the release based on the current
> >>> assessment. Otherwise, if we keep postponing the release, new blocker
> >>> bugs will probably show up, and we might get stuck in a cycle that
> >>> keeps us from delivering a final release to users in time. That does
> >>> not mean RC4 will be the final one; we can reevaluate as issues
> >>> accumulate.
> >>>
> >>> Regarding the performance regression, if we can reproduce it based on
> >>> Thomas's feedback, we can analyze the cause and then evaluate its
> >>> impact.
> >>>
> >>> Regarding FLINK-18461, after syncing with Jark offline: the bug affects
> >>> one of the three scenarios for using the CDC feature, and the affected
> >>> scenario is actually the one most commonly used.
> >>> My suggestion is to merge the fix into release-1.11 now, since the PR is
> >>> already open for review, and to finalize the conclusion later. If this
> >>> issue is the only one left once RC4 goes through, then another option
> >>> is to cover it in the next release, 1.11.1, as Robert suggested, since
> >>> we can prepare the next minor release soon. If other blocker issues
> >>> come up during voting and need to be resolved soon, then without doubt
> >>> we cover all of them in the next RC, RC5.
> >>>
> >>> Best,
> >>> Zhijiang
> >>>
> >>>
> >>> ------------------------------------------------------------------
> >>> From:Till Rohrmann <[hidden email]>
> >>> Send Time: Thursday, 2 July 2020, 16:46
> >>> To:dev <[hidden email]>
> >>> Cc:Zhijiang <[hidden email]>
> >>> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
> >>>
> >>> I agree with Robert.
> >>>
> >>> @Chesnay: The problem has probably already existed in Flink 1.10 and
> >>> before because we cannot run jobs with eager execution calls from the
> web
> >>> ui. I agree with Robert that we can/should improve our documentation in
> >>> this regard, though.
> >>>
> >>> @Thomas:
> >>> 1. I will update the release notes to add a short section describing
> that
> >>> one needs to configure the JobManager memory.
> >>> 2. Concerning the performance regression we should look into it. I
> >> believe
> >>> Zhijiang is very eager to learn more about your exact setup to further
> >>> debug it. Again I agree with Robert to not block the release on it at
> the
> >>> moment.
> >>>
> >>> @Jark: How much of a problem is FLINK-18461? Will it make the CDC
> feature
> >>> completely unusable or will only make a subset of the use cases to not
> >>> work? If it is the latter, then I believe that we can document the
> >>> limitations and try to fix it asap. Depending on the remaining testing
> >> the
> >>> fix might make it into the 1.11.0 or the 1.11.1 release.
> >>>
> >>> Cheers,
> >>> Till
> >>> On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <[hidden email]>
> >>> wrote:
> >>> Thanks a lot for the thorough testing Thomas! This is really helpful!
> >>>
> >>>   @Chesnay: I would not block the release on this. The web submission
> does
> >>>   not seem to be the documented / preferred way of job submission. It
> is
> >>>   unlikely to harm the beginner's experience (and they would anyways
> not
> >>> read
> >>>   the release notes). I mention the beginner experience, because they
> are
> >>> the
> >>>   primary audience of the examples.
> >>>
> >>>   Regarding FLINK-18461 / Jark's issue: I would not block the release
> on
> >>>   that, but still try to get it fixed asap. It is likely that this RC
> >>> doesn't
> >>>   go through (given the rate at which we are finding issues), and even
> if
> >> it
> >>>   goes through, we can document it as a known issue in the release
> >>>   announcement and immediately release 1.11.1.
> >>>   Blocking the release on this causes quite a bit of work for the
> release
> >>>   managers for rolling a new RC. Until we have understood the
> performance
> >>>   regression Thomas is reporting, I would keep this RC open, and keep
> >>> testing.
> >>>
> >>>
> >>>   On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:
> >>>
> >>>   > Hi,
> >>>   >
> >>>   > I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in
> >>>   > the new changelog source (CDC) feature.
> >>>   > Because of this bug, the results of queries on a changelog source
> >>>   > can’t be inserted into an upsert sink (e.g. ES, JDBC, HBase),
> >>>   > which is a common case in production. CDC is one of the important
> >>>   > features of Table/SQL in this release,
> >>>   > so from my side, I hope we can have this fix in 1.11.0; otherwise,
> >>>   > this is a broken feature...
> >>>   >
> >>>   > Again, I am terribly sorry for delaying the release...
> >>>   >
> >>>   > Best,
> >>>   > Jark
> >>>   >
> >>>   > [1]: https://issues.apache.org/jira/browse/FLINK-18461
> >>>   >
> >>>   > On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]
> >>> .invalid>
> >>>   > wrote:
> >>>   >
> >>>   > > Hi Thomas,
> >>>   > >
> >>>   > > Thanks for the efficient feedback.
> >>>   > >
> >>>   > > Regarding the suggestion of adding the release notes document, I
> >> agree
> >>>   > > with your point. Maybe we should adjust the vote template
> >> accordingly
> >>> in
> >>>   > > the respective wiki to guide the following release processes.
> >>>   > >
> >>>   > > Regarding the performance regression, could you provide some more
> >>>   > > details so that we can measure or reproduce it on our side?
> >>>   > > E.g. I guess the topology only includes two vertices, source and sink?
> >>>   > > What is the parallelism of each vertex?
> >>>   > > Does the upstream shuffle data to the downstream via the rebalance
> >>>   > > partitioner or another one?
> >>>   > > Is the checkpoint mode exactly-once with the RocksDB state backend?
> >>>   > > Did backpressure occur in this case?
> >>>   > > How large is the regression, as a percentage?
> >>>   > >
> >>>   > > Best,
> >>>   > > Zhijiang
> >>>   > >
> >>>   > >
> >>>   > >
> >>>   > >
> ------------------------------------------------------------------
> >>>   > > From:Thomas Weise <[hidden email]>
> >>>   > > Send Time: Thursday, 2 July 2020, 09:54
> >>>   > > To:dev <[hidden email]>
> >>>   > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
> >>>   > >
> >>>   > > Hi Till,
> >>>   > >
> >>>   > > Yes, we don't have the setting in flink-conf.yaml.
> >>>   > >
> >>>   > > Generally, we carry forward the existing configuration and any
> >> change
> >>> to
> >>>   > > default configuration values would impact the upgrade.
> >>>   > >
> >>>   > > Yes, since it is an incompatible change I would state it in the
> >>> release
> >>>   > > notes.
> >>>   > >
> >>>   > > Thanks,
> >>>   > > Thomas
> >>>   > >
> >>>   > > BTW I found a performance regression while trying to upgrade
> another
> >>>   > > pipeline with this RC. It is a simple Kinesis to Kinesis job.
> Wasn't
> >>> able
> >>>   > > to pin it down yet, symptoms include increased checkpoint
> alignment
> >>> time.
> >>>   > >
> >>>   > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <
> [hidden email]
> >>>
> >>>   > > wrote:
> >>>   > >
> >>>   > > > Hi Thomas,
> >>>   > > >
> >>>   > > > just to confirm: When starting the image in local mode, then
> you
> >>> don't
> >>>   > > have
> >>>   > > > any of the JobManager memory configuration settings configured
> in
> >>> the
> >>>   > > > effective flink-conf.yaml, right? Does this mean that you have
> >>>   > explicitly
> >>>   > > > removed `jobmanager.heap.size: 1024m` from the default
> >>> configuration?
> >>>   > If
> >>>   > > > this is the case, then I believe it was more of an
> unintentional
> >>>   > artifact
> >>>   > > > that it worked before and it has been corrected now so that one
> >>> needs
> >>>   > to
> >>>   > > > specify the memory of the JM process explicitly. Do you think
> it
> >>> would
> >>>   > > help
> >>>   > > > to explicitly state this in the release notes?
> >>>   > > >
> >>>   > > > Cheers,
> >>>   > > > Till
> >>>   > > >
> >>>   > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]>
> >> wrote:
> >>>   > > >
> >>>   > > > > Thanks for preparing another RC!
> >>>   > > > >
> >>>   > > > > As mentioned in the previous RC thread, it would be super
> >> helpful
> >>> if
> >>>   > > the
> >>>   > > > > release notes that are part of the documentation can be
> included
> >>> [1].
> >>>   > > > It's
> >>>   > > > > a significant time-saver to have read those first.
> >>>   > > > >
> >>>   > > > > I found one more non-backward compatible change that would be
> >>> worth
> >>>   > > > > addressing/mentioning:
> >>>   > > > >
> >>>   > > > > It is now necessary to configure the jobmanager heap size in
> >>>   > > > > flink-conf.yaml (with either jobmanager.heap.size
> >>>   > > > > or jobmanager.memory.heap.size). Why would I not want to do
> that
> >>>   > > anyways?
> >>>   > > > > Well, we set it dynamically for a cluster deployment via the
> >>>   > > > > flinkk8soperator, but the container image can also be used
> for
> >>>   > testing
> >>>   > > > with
> >>>   > > > > local mode (./bin/jobmanager.sh start-foreground local). That
> >> will
> >>>   > fail
> >>>   > > > if
> >>>   > > > > the heap wasn't configured and that's how I noticed it.
> >>>   > > > >
> >>>   > > > > Thanks,
> >>>   > > > > Thomas
> >>>   > > > >
> >>>   > > > > [1]
> >>>   > > > >
> >>>   > > > >
> >>>   > > >
> >>>   > >
> >>>   >
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
> >>>   > > > >
> >>>   > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
> >>> [hidden email]
> >>>   > > > > .invalid>
> >>>   > > > > wrote:
> >>>   > > > >
> >>>   > > > > > Hi everyone,
> >>>   > > > > >
> >>>   > > > > > Please review and vote on the release candidate #4 for the
> >>> version
> >>>   > > > > 1.11.0,
> >>>   > > > > > as follows:
> >>>   > > > > > [ ] +1, Approve the release
> >>>   > > > > > [ ] -1, Do not approve the release (please provide specific
> >>>   > comments)
> >>>   > > > > >
> >>>   > > > > > The complete staging area is available for your review,
> which
> >>>   > > includes:
> >>>   > > > > > * JIRA release notes [1],
> >>>   > > > > > * the official Apache source release and binary convenience
> >>>   > releases
> >>>   > > to
> >>>   > > > > be
> >>>   > > > > > deployed to dist.apache.org [2], which are signed with the
> >> key
> >>>   > with
> >>>   > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
> >>>   > > > > > * all artifacts to be deployed to the Maven Central
> Repository
> >>> [4],
> >>>   > > > > > * source code tag "release-1.11.0-rc4" [5],
> >>>   > > > > > * website pull request listing the new release and adding
> >>>   > > announcement
> >>>   > > > > > blog post [6].
> >>>   > > > > >
> >>>   > > > > > The vote will be open for at least 72 hours. It is adopted
> by
> >>>   > > majority
> >>>   > > > > > approval, with at least 3 PMC affirmative votes.
> >>>   > > > > >
> >>>   > > > > > Thanks,
> >>>   > > > > > Release Manager
> >>>   > > > > >
> >>>   > > > > > [1]
> >>>   > > > > >
> >>>   > > > >
> >>>   > > >
> >>>   > >
> >>>   >
> >>>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
> >>>   > > > > > [2]
> >>> https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
> >>>   > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >>>   > > > > > [4]
> >>>   > > > > >
> >>>   > > >
> >>>   >
> >> https://repository.apache.org/content/repositories/orgapacheflink-1377/
> >>>   > > > > > [5]
> >>>   > https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
> >>>   > > > > > [6] https://github.com/apache/flink-web/pull/352
> >>>   > > > > >
> >>>   > > > > >
> >>>   > > > >
> >>>   > > >
> >>>   > >
> >>>   > >
> >>>   >
> >>>
> >>>
>
>

Re: [VOTE] Release 1.11.0, release candidate #4

Till Rohrmann
I've opened a PR for fixing the NOTICE file problems [1].

[1] https://github.com/apache/flink/pull/12811

Cheers,
Till

On Thu, Jul 2, 2020 at 6:23 PM Till Rohrmann <[hidden email]> wrote:

> - verified checksums and signature
> - built Flink from source release with Scala 2.12
> - executed some example jobs successfully
> - verified license and notice files
>
> I found the following issues with some NOTICE files:
>
> * flink-connector-hive: org.apache.parquet:parquet-format:1.10.0 ->
> org.apache.parquet:parquet-format:2.4.0
> * flink-connector-kinesis:
>   com.amazonaws:aws-java-sdk-dynamodb:jar:1.11.754 ->
> com.amazonaws:aws-java-sdk-dynamodb:jar:1.11.603
>   com.amazonaws:aws-java-sdk-s3:jar:1.11.754 ->
> com.amazonaws:aws-java-sdk-s3:jar:1.11.603
>   com.amazonaws:aws-java-sdk-kms:jar:1.11.754 ->
> com.amazonaws:aws-java-sdk-kms:jar:1.11.603
> * flink-sql-parquet: org.apache.commons:commons-compress:1.20 not used
>
> So the NOTICE files of these three modules are inaccurate: two list wrong
> dependency versions and one lists a dependency that is not used. I would
> argue that this is not a big problem since the license did not change and
> we are not required to list ASL 2.0 dependencies. Hence, I suggest
> continuing with the release voting. I will open a PR to fix these problems
> soon.
>
> Given that this is not a blocking problem, and provided that we don't find
> a problem in the network stack, +1 for this release candidate.
>
> Cheers,
> Till
>
> On Thu, Jul 2, 2020 at 5:29 PM Chesnay Schepler <[hidden email]>
> wrote:
>
>> Listing more than we need to (especially if it is Apache licensed) isn't
>> a big problem, since nothing changes from a user's perspective with regard
>> to licensing.
>>
>> On 02/07/2020 17:08, Robert Metzger wrote:
>> > Issues found:
>> > -
>> >
>> https://repository.apache.org/content/repositories/orgapacheflink-1377/org/apache/flink/flink-runtime_2.12/1.11.0/flink-runtime_2.12-1.11.0.jar
>> > ./META-INF/NOTICE lists "org.uncommons.maths:uncommons-maths:1.2.2a" as a
>> > bundled dependency. However, it seems it is not bundled. I'm waiting
>> > with my vote until we've discussed this issue. I'm leaning towards
>> > continuing the release vote (
>> > https://issues.apache.org/jira/browse/FLINK-18471).
>> >
>> > Checks:
>> > - source archive compiles
>> > - checked artifacts in staging repo
>> >    - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE
>> file
>> >    - versions in pom seem correct
>> >    - checked some other jars
>> > - ... I will continue later ...
>> >
>> > On Thu, Jul 2, 2020 at 3:47 PM Stephan Ewen <[hidden email]> wrote:
>> >
>> >> +1 (binding) from my side
>> >>
>> >>    - legal files (license, notice) look correct
>> >>    - no binaries in the release
>> >>    - ran examples from command line
>> >>    - ran some examples from web ui
>> >>    - log files look sane
>> >>    - RocksDB, incremental checkpoints, savepoints, moving savepoints
>> >> all work as expected.
>> >>
>> >> There are some friction points, which have also been mentioned.
>> However, I
>> >> am not sure they need to block the release.
>> >>    - Some batch examples in the web UI have not been working in 1.10.
>> We
>> >> should fix that asap, because it impacts the "getting started"
>> experience,
>> >> but I personally don't vote against the release based on that
>> >>    - Same for the CDC bug. It is unfortunate, but I would not hold the
>> >> release at such a late stage for one special issue in a new connector.
>> >> Let's work on a timely 1.11.1.
>> >>
>> >>
>> >> I would withdraw my vote, if we find a fundamental issue in the network
>> >> system causing the increased checkpoint delays, causing the job
>> regression
>> >> Thomas mentioned.
>> >> Such a core bug would be a deal-breaker for a large fraction of users.
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Jul 2, 2020 at 11:35 AM Zhijiang <[hidden email]
>> >> .invalid>
>> >> wrote:
>> >>
>> >>> I also agree with Till's and Robert's proposals.
>> >>>
>> >>> In general I think we should not block the release based on the current
>> >>> assessment. Otherwise, if we keep postponing the release, new blocker
>> >>> bugs will probably show up, and we might get stuck in a cycle that
>> >>> keeps us from delivering a final release to users in time. That does
>> >>> not mean RC4 will be the final one; we can reevaluate as issues
>> >>> accumulate.
>> >>>
>> >>> Regarding the performance regression, if we can reproduce it based on
>> >>> Thomas's feedback, we can analyze the cause and then evaluate its
>> >>> impact.
>> >>>
>> >>> Regarding FLINK-18461, after syncing with Jark offline: the bug affects
>> >>> one of the three scenarios for using the CDC feature, and the affected
>> >>> scenario is actually the one most commonly used.
>> >>> My suggestion is to merge the fix into release-1.11 now, since the PR is
>> >>> already open for review, and to finalize the conclusion later. If this
>> >>> issue is the only one left once RC4 goes through, then another option
>> >>> is to cover it in the next release, 1.11.1, as Robert suggested, since
>> >>> we can prepare the next minor release soon. If other blocker issues
>> >>> come up during voting and need to be resolved soon, then without doubt
>> >>> we cover all of them in the next RC, RC5.
>> >>>
>> >>> Best,
>> >>> Zhijiang
>> >>>
>> >>>
>> >>> ------------------------------------------------------------------
>> >>> From:Till Rohrmann <[hidden email]>
>> >>> Send Time: Thursday, 2 July 2020, 16:46
>> >>> To:dev <[hidden email]>
>> >>> Cc:Zhijiang <[hidden email]>
>> >>> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>> >>>
>> >>> I agree with Robert.
>> >>>
>> >>> @Chesnay: The problem has probably already existed in Flink 1.10 and
>> >>> before because we cannot run jobs with eager execution calls from the
>> web
>> >>> ui. I agree with Robert that we can/should improve our documentation
>> in
>> >>> this regard, though.
>> >>>
>> >>> @Thomas:
>> >>> 1. I will update the release notes to add a short section describing
>> that
>> >>> one needs to configure the JobManager memory.
>> >>> 2. Concerning the performance regression we should look into it. I
>> >> believe
>> >>> Zhijiang is very eager to learn more about your exact setup to further
>> >>> debug it. Again I agree with Robert to not block the release on it at
>> the
>> >>> moment.
>> >>>
>> >>> @Jark: How much of a problem is FLINK-18461? Will it make the CDC
>> feature
>> >>> completely unusable or will only make a subset of the use cases to not
>> >>> work? If it is the latter, then I believe that we can document the
>> >>> limitations and try to fix it asap. Depending on the remaining testing
>> >> the
>> >>> fix might make it into the 1.11.0 or the 1.11.1 release.
>> >>>
>> >>> Cheers,
>> >>> Till
>> >>> On Thu, Jul 2, 2020 at 10:33 AM Robert Metzger <[hidden email]>
>> >>> wrote:
>> >>> Thanks a lot for the thorough testing Thomas! This is really helpful!
>> >>>
>> >>>   @Chesnay: I would not block the release on this. The web submission
>> does
>> >>>   not seem to be the documented / preferred way of job submission. It
>> is
>> >>>   unlikely to harm the beginner's experience (and they would anyways
>> not
>> >>> read
>> >>>   the release notes). I mention the beginner experience, because they
>> are
>> >>> the
>> >>>   primary audience of the examples.
>> >>>
>> >>>   Regarding FLINK-18461 / Jark's issue: I would not block the release
>> on
>> >>>   that, but still try to get it fixed asap. It is likely that this RC
>> >>> doesn't
>> >>>   go through (given the rate at which we are finding issues), and
>> even if
>> >> it
>> >>>   goes through, we can document it as a known issue in the release
>> >>>   announcement and immediately release 1.11.1.
>> >>>   Blocking the release on this causes quite a bit of work for the
>> release
>> >>>   managers for rolling a new RC. Until we have understood the
>> performance
>> >>>   regression Thomas is reporting, I would keep this RC open, and keep
>> >>> testing.
>> >>>
>> >>>
>> >>>   On Thu, Jul 2, 2020 at 8:34 AM Jark Wu <[hidden email]> wrote:
>> >>>
>> >>>   > Hi,
>> >>>   >
>> >>>   > I'm very sorry, but we just found a blocker issue, FLINK-18461 [1], in
>> >>>   > the new changelog source (CDC) feature.
>> >>>   > Because of this bug, the results of queries on a changelog source
>> >>>   > can’t be inserted into an upsert sink (e.g. ES, JDBC, HBase),
>> >>>   > which is a common case in production. CDC is one of the important
>> >>>   > features of Table/SQL in this release,
>> >>>   > so from my side, I hope we can have this fix in 1.11.0; otherwise,
>> >>>   > this is a broken feature...
>> >>>   >
>> >>>   > Again, I am terribly sorry for delaying the release...
>> >>>   >
>> >>>   > Best,
>> >>>   > Jark
>> >>>   >
>> >>>   > [1]: https://issues.apache.org/jira/browse/FLINK-18461
>> >>>   >
>> >>>   > On Thu, 2 Jul 2020 at 12:02, Zhijiang <[hidden email]
>> >>> .invalid>
>> >>>   > wrote:
>> >>>   >
>> >>>   > > Hi Thomas,
>> >>>   > >
>> >>>   > > Thanks for the efficient feedback.
>> >>>   > >
>> >>>   > > Regarding the suggestion of adding the release notes document, I
>> >> agree
>> >>>   > > with your point. Maybe we should adjust the vote template
>> >> accordingly
>> >>> in
>> >>>   > > the respective wiki to guide the following release processes.
>> >>>   > >
>> >>>   > > Regarding the performance regression, could you provide some more
>> >>>   > > details so that we can measure or reproduce it on our side?
>> >>>   > > E.g. I guess the topology only includes two vertices, source and sink?
>> >>>   > > What is the parallelism of each vertex?
>> >>>   > > Does the upstream shuffle data to the downstream via the rebalance
>> >>>   > > partitioner or another one?
>> >>>   > > Is the checkpoint mode exactly-once with the RocksDB state backend?
>> >>>   > > Did backpressure occur in this case?
>> >>>   > > How large is the regression, as a percentage?
>> >>>   > >
>> >>>   > > Best,
>> >>>   > > Zhijiang
>> >>>   > >
>> >>>   > >
>> >>>   > >
>> >>>   > >
>> ------------------------------------------------------------------
>> >>>   > > From:Thomas Weise <[hidden email]>
>> >>>   > > Send Time: Thursday, 2 July 2020, 09:54
>> >>>   > > To:dev <[hidden email]>
>> >>>   > > Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>> >>>   > >
>> >>>   > > Hi Till,
>> >>>   > >
>> >>>   > > Yes, we don't have the setting in flink-conf.yaml.
>> >>>   > >
>> >>>   > > Generally, we carry forward the existing configuration and any
>> >> change
>> >>> to
>> >>>   > > default configuration values would impact the upgrade.
>> >>>   > >
>> >>>   > > Yes, since it is an incompatible change I would state it in the
>> >>> release
>> >>>   > > notes.
>> >>>   > >
>> >>>   > > Thanks,
>> >>>   > > Thomas
>> >>>   > >
>> >>>   > > BTW I found a performance regression while trying to upgrade
>> another
>> >>>   > > pipeline with this RC. It is a simple Kinesis to Kinesis job.
>> Wasn't
>> >>> able
>> >>>   > > to pin it down yet, symptoms include increased checkpoint
>> alignment
>> >>> time.
>> >>>   > >
>> >>>   > > On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <
>> [hidden email]
>> >>>
>> >>>   > > wrote:
>> >>>   > >
>> >>>   > > > Hi Thomas,
>> >>>   > > >
>> >>>   > > > just to confirm: When starting the image in local mode, then
>> you
>> >>> don't
>> >>>   > > have
>> >>>   > > > any of the JobManager memory configuration settings
>> configured in
>> >>> the
>> >>>   > > > effective flink-conf.yaml, right? Does this mean that you have
>> >>>   > explicitly
>> >>>   > > > removed `jobmanager.heap.size: 1024m` from the default
>> >>> configuration?
>> >>>   > If
>> >>>   > > > this is the case, then I believe it was more of an
>> unintentional
>> >>>   > artifact
>> >>>   > > > that it worked before and it has been corrected now so that
>> one
>> >>> needs
>> >>>   > to
>> >>>   > > > specify the memory of the JM process explicitly. Do you think
>> it
>> >>> would
>> >>>   > > help
>> >>>   > > > to explicitly state this in the release notes?
>> >>>   > > >
>> >>>   > > > Cheers,
>> >>>   > > > Till
>> >>>   > > >
>> >>>   > > > On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]>
>> >> wrote:
>> >>>   > > >
>> >>>   > > > > Thanks for preparing another RC!
>> >>>   > > > >
>> >>>   > > > > As mentioned in the previous RC thread, it would be super
>> >> helpful
>> >>> if
>> >>>   > > the
>> >>>   > > > > release notes that are part of the documentation can be
>> included
>> >>> [1].
>> >>>   > > > It's
>> >>>   > > > > a significant time-saver to have read those first.
>> >>>   > > > >
>> >>>   > > > > I found one more non-backward compatible change that would
>> be
>> >>> worth
>> >>>   > > > > addressing/mentioning:
>> >>>   > > > >
>> >>>   > > > > It is now necessary to configure the jobmanager heap size in
>> >>>   > > > > flink-conf.yaml (with either jobmanager.heap.size
>> >>>   > > > > or jobmanager.memory.heap.size). Why would I not want to do
>> that
>> >>>   > > anyways?
>> >>>   > > > > Well, we set it dynamically for a cluster deployment via the
>> >>>   > > > > flinkk8soperator, but the container image can also be used
>> for
>> >>>   > testing
>> >>>   > > > with
>> >>>   > > > > local mode (./bin/jobmanager.sh start-foreground local).
>> That
>> >> will
>> >>>   > fail
>> >>>   > > > if
>> >>>   > > > > the heap wasn't configured and that's how I noticed it.
>> >>>   > > > >
>> >>>   > > > > Thanks,
>> >>>   > > > > Thomas
>> >>>   > > > >
>> >>>   > > > > [1]
>> >>>   > > > >
>> >>>   > > > >
>> >>>   > > >
>> >>>   > >
>> >>>   >
>> >>>
>> >>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>> >>>   > > > >
>> >>>   > > > > On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <
>> >>> [hidden email]
>> >>>   > > > > .invalid>
>> >>>   > > > > wrote:
>> >>>   > > > >
>> >>>   > > > > > Hi everyone,
>> >>>   > > > > >
>> >>>   > > > > > Please review and vote on the release candidate #4 for the
>> >>> version
>> >>>   > > > > 1.11.0,
>> >>>   > > > > > as follows:
>> >>>   > > > > > [ ] +1, Approve the release
>> >>>   > > > > > [ ] -1, Do not approve the release (please provide
>> specific
>> >>>   > comments)
>> >>>   > > > > >
>> >>>   > > > > > The complete staging area is available for your review,
>> which
>> >>>   > > includes:
>> >>>   > > > > > * JIRA release notes [1],
>> >>>   > > > > > * the official Apache source release and binary
>> convenience
>> >>>   > releases
>> >>>   > > to
>> >>>   > > > > be
>> >>>   > > > > > deployed to dist.apache.org [2], which are signed with
>> the
>> >> key
>> >>>   > with
>> >>>   > > > > > fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
>> >>>   > > > > > * all artifacts to be deployed to the Maven Central
>> Repository
>> >>> [4],
>> >>>   > > > > > * source code tag "release-1.11.0-rc4" [5],
>> >>>   > > > > > * website pull request listing the new release and adding
>> >>>   > > announcement
>> >>>   > > > > > blog post [6].
>> >>>   > > > > >
>> >>>   > > > > > The vote will be open for at least 72 hours. It is
>> adopted by
>> >>>   > > majority
>> >>>   > > > > > approval, with at least 3 PMC affirmative votes.
>> >>>   > > > > >
>> >>>   > > > > > Thanks,
>> >>>   > > > > > Release Manager
>> >>>   > > > > >
>> >>>   > > > > > [1]
>> >>>   > > > > >
>> >>>   > > > >
>> >>>   > > >
>> >>>   > >
>> >>>   >
>> >>>
>> >>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>> >>>   > > > > > [2]
>> >>> https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
>> >>>   > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >>>   > > > > > [4]
>> >>>   > > > > >
>> >>>   > > >
>> >>>   >
>> >>
>> https://repository.apache.org/content/repositories/orgapacheflink-1377/
>> >>>   > > > > > [5]
>> >>>   > https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
>> >>>   > > > > > [6] https://github.com/apache/flink-web/pull/352
>> >>>   > > > > >
>> >>>   > > > > >
>> >>>   > > > >
>> >>>   > > >
>> >>>   > >
>> >>>   > >
>> >>>   >
>> >>>
>> >>>
>>
>>

Re: [VOTE] Release 1.11.0, release candidate #4

Thomas Weise
In reply to this post by Zhijiang(wangzhijiang999)
Hi Zhijiang,

The performance degradation manifests as backpressure, which leads to a
growing backlog in the source. I switched a few times between 1.10 and 1.11
and the behavior is consistent.

The DAG is:

KinesisConsumer -> (Flat Map, Flat Map, Flat Map) -- forward --> KinesisProducer

Parallelism: 160
No shuffle/rebalance.

Checkpointing config:

Checkpointing Mode: Exactly Once
Interval: 10s
Timeout: 10m 0s
Minimum Pause Between Checkpoints: 10s
Maximum Concurrent Checkpoints: 1
Persist Checkpoints Externally: Enabled (delete on cancellation)

State backend: rocksdb (filesystem leads to the same symptoms)
Checkpoint size is tiny (500 KB)
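
For anyone trying to reproduce this, the settings listed above could be approximated in flink-conf.yaml roughly as follows. This is only a sketch: it assumes the `execution.checkpointing.*`, `state.backend` and `parallelism.default` options as documented for Flink 1.11, and the actual job may well set these values programmatically instead. The checkpoint directory is not given in this thread, so it is left out here.

```yaml
# Sketch only: approximates the reported settings, not the job's actual configuration.
# Assumption: parallelism is applied as the cluster default rather than per job.
parallelism.default: 160

# "filesystem" reportedly shows the same symptoms.
state.backend: rocksdb

execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.interval: 10s
execution.checkpointing.timeout: 10min
execution.checkpointing.min-pause: 10s
execution.checkpointing.max-concurrent-checkpoints: 1
execution.checkpointing.externalized-checkpoint-retention: DELETE_ON_CANCELLATION
```

Whether these keys are picked up from flink-conf.yaml may depend on how the job is submitted, so it is worth confirming the effective values in the checkpoint configuration view of the web UI.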

An interesting difference from another job that I had upgraded successfully
is the short checkpointing interval.

Thanks,
Thomas


On Wed, Jul 1, 2020 at 9:02 PM Zhijiang <[hidden email]> wrote:

> Hi Thomas,
>
> Thanks for the efficient feedback.
>
> Regarding the suggestion of adding the release notes document, I agree
> with your point. Maybe we should adjust the vote template accordingly in
> the respective wiki to guide the following release processes.
>
> Regarding the performance regression, could you provide some more details
> so that we can measure or reproduce it on our side?
> E.g. I guess the topology only includes two vertices, source and sink?
> What is the parallelism of each vertex?
> Does the upstream shuffle data to the downstream via a rebalance partitioner
> or something else?
> Is the checkpoint mode exactly-once with the RocksDB state backend?
> Did backpressure occur in this case?
> What is the percentage regression in this case?
>
> Best,
> Zhijiang

Re: [VOTE] Release 1.11.0, release candidate #4

Robert Metzger
+1 (binding)

Checks:
- source archive compiles
- checked artifacts in staging repo
  - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
  - versions in pom seem correct
  - checked some other jars
- deployed Flink on YARN on Azure HDInsight (which uses Hadoop 3.1.1)
  - Reported a minor log sanity issue: https://issues.apache.org/jira/browse/FLINK-18474
  - Wordcount against HDFS works


On Thu, Jul 2, 2020 at 7:07 PM Thomas Weise <[hidden email]> wrote:

> Hi Zhijiang,
>
> The performance degradation manifests as backpressure, which leads to a
> growing backlog in the source. I switched a few times between 1.10 and 1.11
> and the behavior is consistent.
>
> The DAG is:
>
> KinesisConsumer -> (Flat Map, Flat Map, Flat Map) -- forward --> KinesisProducer
>
> Parallelism: 160
> No shuffle/rebalance.
>
> Checkpointing config:
>
> Checkpointing Mode: Exactly Once
> Interval: 10s
> Timeout: 10m 0s
> Minimum Pause Between Checkpoints: 10s
> Maximum Concurrent Checkpoints: 1
> Persist Checkpoints Externally: Enabled (delete on cancellation)
>
> State backend: rocksdb (filesystem leads to the same symptoms)
> Checkpoint size is tiny (500 KB)
>
> An interesting difference from another job that I had upgraded successfully
> is the short checkpointing interval.
>
> Thanks,
> Thomas

Re: [VOTE] Release 1.11.0, release candidate #4

Kostas Kloudas-4
Hi all,

Regarding the issue Chesnay mentioned, which leads to a "Caused by:
org.apache.flink.api.common.InvalidProgramException:" for DataSet
examples that use print(), collect(), or count() as a sink: this was a
semi-intentional side effect of the application mode. Before, in these
cases, the output was simply ignored. Now we have the same behavior as
in "detached" mode. I have already opened a PR for the release notes
(sorry for not doing it earlier, although this was a known change in
behavior, as mentioned in the PR here
https://github.com/apache/flink/pull/11460 ) and I will merge it
today.

Cheers,
Kostas

On Thu, Jul 2, 2020 at 8:07 PM Robert Metzger <[hidden email]> wrote:

>
> +1 (binding)
>
> Checks:
> - source archive compiles
> - checked artifacts in staging repo
>   - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
>   - versions in pom seem correct
>   - checked some other jars
> - deployed Flink on YARN on Azure HDInsight (which uses Hadoop 3.1.1)
>   - Reported a minor log sanity issue: https://issues.apache.org/jira/browse/FLINK-18474
>   - Wordcount against HDFS works

Re: [VOTE] Release 1.11.0, release candidate #4

Aljoscha Krettek-2
+1

  - verified hash of source release
  - verified signature of source release
  - source release compiles (with Scala 2.11)
  - examples run without spurious log output (errors, exceptions)

I can confirm that log scrolling doesn't work on Firefox, though it
never has.

I would also feel better if we can find the source of the performance
regression that Thomas mentioned. It might be that we have to solve that
in a .1 patch release.

Best,
Aljoscha

On 02.07.20 20:37, Kostas Kloudas wrote:

> Hi all,
>
> Regarding the issue Chesnay mentioned, which leads to a "Caused by:
> org.apache.flink.api.common.InvalidProgramException:" for DataSet
> examples that use print(), collect(), or count() as a sink: this was a
> semi-intentional side effect of the application mode. Before, in these
> cases, the output was simply ignored. Now we have the same behavior as
> in "detached" mode. I have already opened a PR for the release notes
> (sorry for not doing it earlier, although this was a known change in
> behavior, as mentioned in the PR here
> https://github.com/apache/flink/pull/11460 ) and I will merge it
> today.
>
> Cheers,
> Kostas
>
> On Thu, Jul 2, 2020 at 8:07 PM Robert Metzger <[hidden email]> wrote:
>>
>> +1 (binding)
>>
>> Checks:
>> - source archive compiles
>> - checked artifacts in staging repo
>>    - flink-azure-fs-hadoop-1.11.0.jar seems to have a correct NOTICE file
>>    - versions in pom seem correct
>>    - checked some other jars
>> - deployed Flink on YARN on Azure HDInsight (which uses Hadoop 3.1.1)
>>    - Reported some tiny log sanity issue:
>> https://issues.apache.org/jira/browse/FLINK-18474
>>    - Wordcount against HDFS works
>>
>>
>> On Thu, Jul 2, 2020 at 7:07 PM Thomas Weise <[hidden email]> wrote:
>>
>>> Hi Zhijiang,
>>>
>>> The performance degradation manifests in backpressure which leads to
>>> growing backlog in the source. I switched a few times between 1.10 and 1.11
>>> and the behavior is consistent.
>>>
>>> The DAG is:
>>>
>>> KinesisConsumer -> (Flat Map, Flat Map, Flat Map)   -------- forward
>>> ---------> KinesisProducer
>>>
>>> Parallelism: 160
>>> No shuffle/rebalance.
>>>
>>> Checkpointing config:
>>>
>>> Checkpointing Mode Exactly Once
>>> Interval 10s
>>> Timeout 10m 0s
>>> Minimum Pause Between Checkpoints 10s
>>> Maximum Concurrent Checkpoints 1
>>> Persist Checkpoints Externally Enabled (delete on cancellation)
>>>
>>> State backend: rocksdb  (filesystem leads to same symptoms)
>>> Checkpoint size is tiny (500KB)
>>>
>>> An interesting difference to another job that I had upgraded successfully
>>> is the low checkpointing interval.
>>>
>>> Thanks,
>>> Thomas
>>>
>>>
>>> On Wed, Jul 1, 2020 at 9:02 PM Zhijiang <[hidden email]
>>> .invalid>
>>> wrote:
>>>
>>>> Hi Thomas,
>>>>
>>>> Thanks for the efficient feedback.
>>>>
>>>> Regarding the suggestion of adding the release notes document, I agree
>>>> with your point. Maybe we should adjust the vote template accordingly in
>>>> the respective wiki to guide the following release processes.
>>>>
>>>> Regarding the performance regression, could you provide some more details
>>>> for our better measurement or reproducing on our sides?
>>>> E.g. I guess the topology only includes two vertexes source and sink?
>>>> What is the parallelism for every vertex?
>>>> The upstream shuffles data to the downstream via rebalance partitioner or
>>>> other?
>>>> The checkpoint mode is exactly-once with rocksDB state backend?
>>>> The backpressure happened in this case?
>>>> How much percentage regression in this case?
>>>>
>>>> Best,
>>>> Zhijiang
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------
>>>> From:Thomas Weise <[hidden email]>
>>>> Send Time:2020年7月2日(星期四) 09:54
>>>> To:dev <[hidden email]>
>>>> Subject:Re: [VOTE] Release 1.11.0, release candidate #4
>>>>
>>>> Hi Till,
>>>>
>>>> Yes, we don't have the setting in flink-conf.yaml.
>>>>
>>>> Generally, we carry forward the existing configuration and any change to
>>>> default configuration values would impact the upgrade.
>>>>
>>>> Yes, since it is an incompatible change I would state it in the release
>>>> notes.
>>>>
>>>> Thanks,
>>>> Thomas
>>>>
>>>> BTW I found a performance regression while trying to upgrade another
>>>> pipeline with this RC. It is a simple Kinesis to Kinesis job. Wasn't able
>>>> to pin it down yet, symptoms include increased checkpoint alignment time.
>>>>
>>>> On Wed, Jul 1, 2020 at 12:04 AM Till Rohrmann <[hidden email]>
>>>> wrote:
>>>>
>>>>> Hi Thomas,
>>>>>
>>>>> just to confirm: When starting the image in local mode, then you don't
>>>> have
>>>>> any of the JobManager memory configuration settings configured in the
>>>>> effective flink-conf.yaml, right? Does this mean that you have
>>> explicitly
>>>>> removed `jobmanager.heap.size: 1024m` from the default configuration?
>>> If
>>>>> this is the case, then I believe it was more of an unintentional
>>> artifact
>>>>> that it worked before and it has been corrected now so that one needs
>>> to
>>>>> specify the memory of the JM process explicitly. Do you think it would
>>>> help
>>>>> to explicitly state this in the release notes?
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Wed, Jul 1, 2020 at 7:01 AM Thomas Weise <[hidden email]> wrote:
>>>>>
>>>>>> Thanks for preparing another RC!
>>>>>>
>>>>>> As mentioned in the previous RC thread, it would be super helpful if
>>>> the
>>>>>> release notes that are part of the documentation can be included [1].
>>>>> It's
>>>>>> a significant time-saver to have read those first.
>>>>>>
>>>>>> I found one more non-backward compatible change that would be worth
>>>>>> addressing/mentioning:
>>>>>>
>>>>>> It is now necessary to configure the jobmanager heap size in
>>>>>> flink-conf.yaml (with either jobmanager.heap.size
>>>>>> or jobmanager.memory.heap.size). Why would I not want to do that anyways?
>>>>>> Well, we set it dynamically for a cluster deployment via the
>>>>>> flinkk8soperator, but the container image can also be used for testing with
>>>>>> local mode (./bin/jobmanager.sh start-foreground local). That will fail if
>>>>>> the heap wasn't configured and that's how I noticed it.
>>>>>>
>>>>>> Thanks,
>>>>>> Thomas
>>>>>>
>>>>>> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html
>>>>>>
>>>>>> On Tue, Jun 30, 2020 at 3:18 AM Zhijiang <[hidden email].invalid> wrote:
>>>>>>
>>>>>>> Hi everyone,
>>>>>>>
>>>>>>> Please review and vote on the release candidate #4 for the version 1.11.0,
>>>>>>> as follows:
>>>>>>> [ ] +1, Approve the release
>>>>>>> [ ] -1, Do not approve the release (please provide specific comments)
>>>>>>>
>>>>>>> The complete staging area is available for your review, which includes:
>>>>>>> * JIRA release notes [1],
>>>>>>> * the official Apache source release and binary convenience releases to be
>>>>>>> deployed to dist.apache.org [2], which are signed with the key with
>>>>>>> fingerprint 2DA85B93244FDFA19A6244500653C0A2CEA00D0E [3],
>>>>>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>>>>>> * source code tag "release-1.11.0-rc4" [5],
>>>>>>> * website pull request listing the new release and adding announcement
>>>>>>> blog post [6].
>>>>>>>
>>>>>>> The vote will be open for at least 72 hours. It is adopted by majority
>>>>>>> approval, with at least 3 PMC affirmative votes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Release Manager
>>>>>>>
>>>>>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346364
>>>>>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.11.0-rc4/
>>>>>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>>>>>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1377/
>>>>>>> [5] https://github.com/apache/flink/releases/tag/release-1.11.0-rc4
>>>>>>> [6] https://github.com/apache/flink-web/pull/352
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
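The heap-size requirement discussed in the quoted messages (local mode fails unless the JobManager heap is configured in flink-conf.yaml) could be caught before starting the container with a small pre-flight check. The following is only a minimal sketch: the two keys are the ones named in the thread, while the helper names, the assumption that the file consists of flat `key: value` lines with `#` comments, and the idea that either key being present is sufficient are all assumptions for illustration, not Flink's own validation logic.

```python
# Keys named in the thread. Treating "either one present" as sufficient is an
# assumption of this sketch, not Flink's actual validation logic.
JM_HEAP_KEYS = ("jobmanager.heap.size", "jobmanager.memory.heap.size")


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        # Drop everything after a '#' and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def configured_jm_heap_keys(text):
    """Return the JobManager heap keys that have a non-empty value."""
    settings = parse_flat_conf(text)
    return [key for key in JM_HEAP_KEYS if settings.get(key)]


if __name__ == "__main__":
    sample = "jobmanager.rpc.address: localhost\n# jobmanager.heap.size: 1024m\n"
    found = configured_jm_heap_keys(sample)
    if not found:
        print("No JobManager heap setting found; local mode may fail to start.")
```

An image entrypoint could run such a check against the effective configuration and either abort with a clear message or append a default value before invoking `./bin/jobmanager.sh start-foreground local`.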
