Dear Flink community,
Please vote on releasing the following candidate as Apache Flink version 1.1.0.

I've CC'd [hidden email] as users are encouraged to help testing Flink 1.1.0 for their specific use cases. Please feel free to report issues and successful tests on [hidden email].

The commit to be voted on:
3a18463 (http://git-wip-us.apache.org/repos/asf/flink/commit/3a18463)

Branch:
release-1.1.0-rc1
(https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.1.0-rc1)

The release artifacts to be voted on can be found at:
http://people.apache.org/~uce/flink-1.1.0-rc1/

The release artifacts are signed with the key with fingerprint 9D403309:
http://www.apache.org/dist/flink/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapacheflink-1098

There is also a Google doc to coordinate the testing efforts. This is a copy of the release document found in our Wiki:
https://docs.google.com/document/d/1cDZGtnGJKLU1fLw8AE_FzkoDLOR8amYT2oc3mD0_lw4/edit?usp=sharing

-------------------------------------------------------------

Thanks to everyone who contributed to this release candidate.

The vote is open for the next 3 days (not counting the weekend) and passes if a majority of at least three +1 PMC votes are cast.

The vote ends on Monday, August 1st, 2016.

[ ] +1 Release this package as Apache Flink 1.1.0
[ ] -1 Do not release this package, because ...
When running "mvn clean verify" with Hadoop version 2.6.1 the
Zookeeper/Leader Election tests fail with this: java.lang.NoSuchMethodError: org.apache.curator.utils.PathUtils.validatePath(Ljava/lang/String;)Ljava/lang/String; at org.apache.curator.framework.imps.NamespaceImpl.<init>(NamespaceImpl.java:37) at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:113) at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:124) at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:101) at org.apache.flink.runtime.util.ZooKeeperUtils.createLeaderRetrievalService(ZooKeeperUtils.java:143) at org.apache.flink.runtime.util.LeaderRetrievalUtils.createLeaderRetrievalService(LeaderRetrievalUtils.java:70) at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderRetrievalTest.testTimeoutOfFindConnectingAddress(ZooKeeperLeaderRetrievalTest.java:187) I'll continue testing other parts and other Hadoop versions. On Wed, 27 Jul 2016 at 11:51 Ufuk Celebi <[hidden email]> wrote: > Dear Flink community, > > Please vote on releasing the following candidate as Apache Flink version > 1.1.0. > > I've CC'd [hidden email] as users are encouraged to help > testing Flink 1.1.0 for their specific use cases. Please feel free to > report issues and successful tests on [hidden email]. > > The commit to be voted on: > 3a18463 (http://git-wip-us.apache.org/repos/asf/flink/commit/3a18463) > > Branch: > release-1.1.0-rc1 > ( > https://git1-us-west.apache.org/repos/asf/flink/repo?p=flink.git;a=shortlog;h=refs/heads/release-1.1.0-rc1 > ) > > The release artifacts to be voted on can be found at: > http://people.apache.org/~uce/flink-1.1.0-rc1/ > > The release artifacts are signed with the key with fingerprint 9D403309: > http://www.apache.org/dist/flink/KEYS > > The staging repository for this release can be found at: > https://repository.apache.org/content/repositories/orgapacheflink-1098 > > There is also a Google doc to coordinate the testing efforts. This is > a copy of the release document found in our Wiki: > > https://docs.google.com/document/d/1cDZGtnGJKLU1fLw8AE_FzkoDLOR8amYT2oc3mD0_lw4/edit?usp=sharing > > ------------------------------------------------------------- > > Thanks to everyone who contributed to this release candidate. > > The vote is open for the next 3 days (not counting the weekend) and > passes if a majority of at least three +1 PMC votes are cast. > > The vote ends on Monday August 1st, 2016. > > [ ] +1 Release this package as Apache Flink 1.1.0 > [ ] -1 Do not release this package, because ... > |
Probably related to shading :( What's strange is that Travis builds for Hadoop 2.6.3 with the release-1.1 branch do succeed (sometimes... Travis is super flakey at the moment because of some corrupted cached dependencies): https://travis-ci.org/apache/flink/jobs/148348699

On Fri, Jul 29, 2016 at 4:19 PM, Aljoscha Krettek <[hidden email]> wrote:
> ...
Just tried to reproduce the error reported by Aljoscha, but could not. I used a clean checkout of the RC1 code and cleaned all local Maven caches before testing.

@Aljoscha: Can you reproduce this on your machine? Can you try and clean the Maven caches?

On Sun, Jul 31, 2016 at 7:31 PM, Ufuk Celebi <[hidden email]> wrote:
> ...
Thanks for the new release candidate, Ufuk!

Found two issues during testing:

1) Scheduling: The Flink scheduler accepts jobs with a parallelism greater than the total number of task slots (it shouldn't), schedules tasks in all available task slots, and leaves the remaining tasks lingering forever. Haven't had time to investigate much, but there are some more details here:
=> JIRA: https://issues.apache.org/jira/browse/FLINK-4296

2) YARN encoding issues with special characters in the automatically determined location of the fat jar
=> JIRA: https://issues.apache.org/jira/browse/FLINK-4297
=> Fix: https://github.com/apache/flink/pull/2320

Otherwise, looks pretty good so far :)

On Mon, Aug 1, 2016 at 10:27 AM, Stephan Ewen <[hidden email]> wrote:
> ...
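For anyone trying to reproduce the scheduling issue in 1) above, a minimal sketch along these lines should trigger it (the class name, job name, and parallelism value are made up for illustration; it assumes a standalone cluster that exposes fewer than 64 task slots):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotOvercommitRepro {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Ask for more parallelism than the cluster has task slots.
        // Expected: the job is rejected up front. Observed (FLINK-4296):
        // tasks fill all available slots and the rest linger forever.
        env.setParallelism(64);

        env.fromElements(1L, 2L, 3L)
                .map(new MapFunction<Long, Long>() {
                    @Override
                    public Long map(Long value) {
                        return value + 1;
                    }
                })
                .print();

        env.execute("slot-overcommit-repro");
    }
}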
This is also a major issue for batch jobs with off-heap memory and memory preallocation turned off: https://issues.apache.org/jira/browse/FLINK-4094

Not hard to fix, though, as we simply need to reliably clear the direct memory instead of relying on garbage collection. Another possible fix is to maintain memory pools independently of the preallocation mode. I think this is fine because preallocation:false suggests that no memory will be preallocated, not that memory will be freed once acquired.
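A sketch of what "reliably clearing the direct memory" could look like on the JDK 6/7/8 JVMs of that era: the native memory behind a direct ByteBuffer can be released eagerly by invoking the buffer's internal cleaner via reflection. This only illustrates the technique, not the actual FLINK-4094 patch; the class name is made up:

import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

public final class DirectBufferReleaser {

    /**
     * Eagerly frees the native memory backing a direct ByteBuffer instead
     * of waiting for the garbage collector to collect the buffer object.
     */
    public static void free(ByteBuffer buffer) {
        if (buffer == null || !buffer.isDirect()) {
            return;
        }
        try {
            // java.nio.DirectByteBuffer holds a private 'cleaner' field
            // (sun.misc.Cleaner) on JDK 6-8.
            Field cleanerField = buffer.getClass().getDeclaredField("cleaner");
            cleanerField.setAccessible(true);
            Object cleaner = cleanerField.get(buffer);
            if (cleaner != null) {
                Method clean = cleaner.getClass().getMethod("clean");
                clean.setAccessible(true);
                clean.invoke(cleaner);
            }
        } catch (Exception e) {
            throw new RuntimeException("Could not free direct buffer", e);
        }
    }
}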
I tried it again now. I did:
rm -r .m2/repository
mvn clean verify -Dhadoop.version=2.6.0

It failed again. Also with versions 2.6.1 and 2.6.3.

On Mon, 1 Aug 2016 at 08:23 Maximilian Michels <[hidden email]> wrote:
> ...
I think that FLINK-4094 is nice to fix but not a release blocker, since we know how to prevent this situation (setting preallocation to true, i.e. taskmanager.memory.preallocate: true in flink-conf.yaml).

On Mon, Aug 1, 2016 at 11:56 PM, Aljoscha Krettek <[hidden email]> wrote:
> ...
Which Maven version are you using?
On Mon, Aug 1, 2016 at 5:56 PM, Aljoscha Krettek <[hidden email]> wrote:
> ...
@Aljoscha: Have you made sure you have a clean Maven cache (remove the .m2/repository/org/apache/flink folder)?

On Mon, Aug 1, 2016 at 5:56 PM, Aljoscha Krettek <[hidden email]> wrote:
> ...
@Ufuk: 3.3.9. That's probably it, because that messes with the shading, right?

@Stephan: Yes, I even did a "rm -r .m2/repository". But the Maven version is most likely the reason.

On Mon, 1 Aug 2016 at 10:59 Stephan Ewen <[hidden email]> wrote:
> ...
I can confirm Aljoscha's findings concerning building Flink with Hadoop version 2.6.0 using Maven 3.3.9. Aljoscha is right that it is indeed a Maven 3.3 issue: if you build flink-runtime twice, everything goes through, because the shaded Curator Flink dependency is installed during the first run.

On Tue, Aug 2, 2016 at 5:09 AM, Aljoscha Krettek <[hidden email]> wrote:
> ...
Dear community,
I would like to vote +1, but during testing I've noticed that we should not have reverted FLINK-4154 (the correction of the murmur hash) for this release.

We had a wrong murmur hash implementation in 1.0, which was fixed for 1.1. We reverted that fix because we thought it broke savepoint compatibility between 1.0 and 1.1; that revert is part of RC1. It turns out, though, that there are other problems with savepoint compatibility which are independent of the hash function. Therefore I would like to restore the fix (reverting the revert), create a new RC with only this extra commit, and extend the vote for one day.

Would you be OK with this? Most testing results should be applicable to RC2, too.

I ran the following tests:

+ Checked checksums and signatures
+ Verified no binaries in source release
+ Built (clean verify) with default Hadoop version
+ Built (clean verify) with Hadoop 2.6.1
+ Checked build for Scala 2.11
+ Checked all POMs
+ Read README.md
+ Examined OUT and LOG files
+ Checked paths with spaces (found a non-blocking issue with the YARN CLI)
+ Checked local mode, cluster mode, and a multi-node cluster
+ Tested HDFS split assignment
+ Tested the bin/flink command line
+ Tested recovery (master and worker failure) in standalone mode with RocksDB and HDFS
+ Tested the Scala/SBT giter8 template
+ Tested metrics (user-defined metrics, multiple JMX reporters, JM metrics, user-defined reporter)

– Ufuk

On Tue, Aug 2, 2016 at 10:13 AM, Till Rohrmann <[hidden email]> wrote:
> ...
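For intuition on why the hash function matters for savepoints at all: keyed state is routed to parallel subtasks based on a hash of the key, roughly along these lines (a simplified, made-up sketch, not the actual Flink routing code):

import java.util.function.IntUnaryOperator;

public final class KeyRoutingSketch {

    /** Which parallel subtask owns the state for a given key. */
    static int targetSubtask(Object key, int parallelism, IntUnaryOperator hash) {
        // Mask the sign bit so the modulo result is non-negative.
        return (hash.applyAsInt(key.hashCode()) & Integer.MAX_VALUE) % parallelism;
    }
}

A 1.0 savepoint effectively stores each key's state under targetSubtask(key, p, oldHash); if the hash implementation changes, the same key can map to a different subtask on restore, which is why the corrected hash was reverted for RC1 in the first place. Since savepoint compatibility turns out to be broken for other reasons anyway, the revert buys nothing.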
+1 from my side
Create a new RC that differs only in the hash function commit. I would support carrying the vote thread forward (extending it for one additional day), because virtually all test results should apply to the new RC as well.

We certainly need to redo:
- signature validation
- build & integration tests (these should catch any potential error caused by a change of hash function)

That is pretty lightweight; it should be doable within a day.

On Tue, Aug 2, 2016 at 10:43 AM, Ufuk Celebi <[hidden email]> wrote:
> ...
I agree with Ufuk and Stephan that we could carry forward most of the testing if we only included the hash function fix in the new RC. There are some other minor issues we could merge as well, but they are involved enough that they would set us back to redoing the testing. So +1 for a new RC with the hash function fix.

On Tue, Aug 2, 2016 at 12:35 PM, Stephan Ewen <[hidden email]> wrote:
> ...
I just saw that we changed the behaviour of ListState and FoldingState. They used to return the default value given to the state descriptor, but have been changed to return null now (in [1]). Furthermore, ValueState still returns the default value instead of null. Gyula noticed another inconsistency for GenericListState and GenericFoldingState in [2].

The state interfaces are annotated with @PublicEvolving, so technically it should be OK to change this, but I wanted to double check that everyone is aware of this. Do we want to keep it like it is, or should we revert this?

– Ufuk

[1] https://github.com/apache/flink/commit/12bf7c1a0b81d199085fe874c64763c51a93b3bf#diff-2c622001cff86abb3e36e6621d6f73ad
[2] https://issues.apache.org/jira/browse/FLINK-4275

On Tue, Aug 2, 2016 at 1:37 PM, Maximilian Michels <[hidden email]> wrote:
> ...
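To make the inconsistency concrete, here is a hedged sketch of what user code now sees for a key that has no state yet; only the ValueState.value() and ListState.get() calls are from the real API, while the surrounding class, method, and variable names are made up for illustration:

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ValueState;

public final class EmptyStateBehaviorSketch {

    /** Illustrates the asymmetry for a key that has no state yet. */
    static void illustrate(ValueState<Long> counter, ListState<Long> history) throws Exception {
        // ValueState: still returns the default value from its descriptor.
        Long c = counter.value();

        // ListState: used to return the descriptor's default in 1.0.x,
        // but in this RC returns null when the state is empty.
        Iterable<Long> h = history.get();
        if (h == null) {
            // Callers now have to null-check where they previously did not.
        }
    }
}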
@Ufuk - I agree, this looks quite dubious.
Need to resolve that before proceeding with the release...

On Tue, Aug 2, 2016 at 1:45 PM, Ufuk Celebi <[hidden email]> wrote:
> ...