Hi
everyone, Please
review and vote on the release candidate #2 for the version
1.13.0, as follows: [ ]
+1, Approve the release [ ]
-1, Do not approve the release (please provide specific
comments) The
complete staging area is available for your review, which
includes: *
JIRA release notes [1], *
the official Apache source release and binary convenience
releases to be deployed to dist.apache.org [2], which are signed
with the key with fingerprint
*
all artifacts to be deployed to the Maven Central Repository
[4], *
source code tag "release-1.13.0-rc2" [5], *
website pull request listing the new release and adding
announcement blog post [6]. The
vote will be open for at least 72 hours. It is adopted by
majority approval, with at least 3 PMC affirmative votes. Thanks, Dawid
Wysakowicz OpenPGP_signature (855 bytes) Download Attachment |
Hi Dawid,
I have just merged the doc change for hive dialect [1], which talks about how to use hive dialect to run hive queries. Do you think we can have another RC to include it? Sorry for the trouble. [1] https://issues.apache.org/jira/browse/FLINK-22119 On Sat, Apr 24, 2021 at 1:36 AM Dawid Wysakowicz <[hidden email]> wrote: > Hi everyone, > Please review and vote on the release candidate #2 for the version 1.13.0, > as follows: > [ ] +1, Approve the release > [ ] -1, Do not approve the release (please provide specific comments) > > > The complete staging area is available for your review, which includes: > * JIRA release notes [1], > * the official Apache source release and binary convenience releases to be > deployed to dist.apache.org [2], which are signed with the key with > fingerprint 31D2DD10BFC15A2D [3], > * all artifacts to be deployed to the Maven Central Repository [4], > * source code tag "release-1.13.0-rc2" [5], > * website pull request listing the new release and adding announcement > blog post [6]. > > The vote will be open for at least 72 hours. It is adopted by majority > approval, with at least 3 PMC affirmative votes. > > Thanks, > Dawid Wysakowicz > > [1] > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349287 > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc2/ > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > [4] > https://repository.apache.org/content/repositories/orgapacheflink-1420/ > [5] https://github.com/apache/flink/tree/release-1.13.0-rc2 > [6] https://github.com/apache/flink-web/pull/436 > -- Best regards! Rui Li |
Hi Rui,
Documentation is not a part of the release package. The changes to the documentation will take effect in 24 hours and can be accessed on the web [1] then. Docs can even be updated after 1.13 is released. So we don't need to cancel the RC for this. Best, Jark [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/table/hive/overview/ On Sun, 25 Apr 2021 at 13:47, Rui Li <[hidden email]> wrote: > Hi Dawid, > > I have just merged the doc change for hive dialect [1], which talks about > how to use hive dialect to run hive queries. Do you think we can have > another RC to include it? Sorry for the trouble. > > [1] https://issues.apache.org/jira/browse/FLINK-22119 > > On Sat, Apr 24, 2021 at 1:36 AM Dawid Wysakowicz <[hidden email]> > wrote: > > > Hi everyone, > > Please review and vote on the release candidate #2 for the version > 1.13.0, > > as follows: > > [ ] +1, Approve the release > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > The complete staging area is available for your review, which includes: > > * JIRA release notes [1], > > * the official Apache source release and binary convenience releases to > be > > deployed to dist.apache.org [2], which are signed with the key with > > fingerprint 31D2DD10BFC15A2D [3], > > * all artifacts to be deployed to the Maven Central Repository [4], > > * source code tag "release-1.13.0-rc2" [5], > > * website pull request listing the new release and adding announcement > > blog post [6]. > > > > The vote will be open for at least 72 hours. It is adopted by majority > > approval, with at least 3 PMC affirmative votes. > > > > Thanks, > > Dawid Wysakowicz > > > > [1] > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349287 > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc2/ > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > [4] > > https://repository.apache.org/content/repositories/orgapacheflink-1420/ > > [5] https://github.com/apache/flink/tree/release-1.13.0-rc2 > > [6] https://github.com/apache/flink-web/pull/436 > > > > > -- > Best regards! > Rui Li > |
I see. Thanks Jark for the clarification.
On Sun, Apr 25, 2021 at 10:36 PM Jark Wu <[hidden email]> wrote: > Hi Rui, > > Documentation is not a part of the release package. > The changes to the documentation will take effect in 24 hours > and can be accessed on the web [1] then. > Docs can even be updated after 1.13 is released. > So we don't need to cancel the RC for this. > > Best, > Jark > > [1]: > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/table/hive/overview/ > > On Sun, 25 Apr 2021 at 13:47, Rui Li <[hidden email]> wrote: > > > Hi Dawid, > > > > I have just merged the doc change for hive dialect [1], which talks about > > how to use hive dialect to run hive queries. Do you think we can have > > another RC to include it? Sorry for the trouble. > > > > [1] https://issues.apache.org/jira/browse/FLINK-22119 > > > > On Sat, Apr 24, 2021 at 1:36 AM Dawid Wysakowicz <[hidden email] > > > > wrote: > > > > > Hi everyone, > > > Please review and vote on the release candidate #2 for the version > > 1.13.0, > > > as follows: > > > [ ] +1, Approve the release > > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > > > > The complete staging area is available for your review, which includes: > > > * JIRA release notes [1], > > > * the official Apache source release and binary convenience releases to > > be > > > deployed to dist.apache.org [2], which are signed with the key with > > > fingerprint 31D2DD10BFC15A2D [3], > > > * all artifacts to be deployed to the Maven Central Repository [4], > > > * source code tag "release-1.13.0-rc2" [5], > > > * website pull request listing the new release and adding announcement > > > blog post [6]. > > > > > > The vote will be open for at least 72 hours. It is adopted by majority > > > approval, with at least 3 PMC affirmative votes. > > > > > > Thanks, > > > Dawid Wysakowicz > > > > > > [1] > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349287 > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc2/ > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > [4] > > > > https://repository.apache.org/content/repositories/orgapacheflink-1420/ > > > [5] https://github.com/apache/flink/tree/release-1.13.0-rc2 > > > [6] https://github.com/apache/flink-web/pull/436 > > > > > > > > > -- > > Best regards! > > Rui Li > > > -- Best regards! Rui Li |
+1 (binding)
I'm not aware of any release blockers. Me and my colleagues have checked quite extensively this release for correctness of the unaligned checkpoints, finally nailing down the two remaining known bugs. I have manually checked the WebUI with it's new back-pressure monitoring tool. One bug was discovered in this area recently, which has already pending PR, but as it's not critical and very old, I wouldn't cancel the RC vote because of that. Piotrek pon., 26 kwi 2021 o 07:53 Rui Li <[hidden email]> napisał(a): > I see. Thanks Jark for the clarification. > > On Sun, Apr 25, 2021 at 10:36 PM Jark Wu <[hidden email]> wrote: > > > Hi Rui, > > > > Documentation is not a part of the release package. > > The changes to the documentation will take effect in 24 hours > > and can be accessed on the web [1] then. > > Docs can even be updated after 1.13 is released. > > So we don't need to cancel the RC for this. > > > > Best, > > Jark > > > > [1]: > > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/table/hive/overview/ > > > > On Sun, 25 Apr 2021 at 13:47, Rui Li <[hidden email]> wrote: > > > > > Hi Dawid, > > > > > > I have just merged the doc change for hive dialect [1], which talks > about > > > how to use hive dialect to run hive queries. Do you think we can have > > > another RC to include it? Sorry for the trouble. > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22119 > > > > > > On Sat, Apr 24, 2021 at 1:36 AM Dawid Wysakowicz < > [hidden email] > > > > > > wrote: > > > > > > > Hi everyone, > > > > Please review and vote on the release candidate #2 for the version > > > 1.13.0, > > > > as follows: > > > > [ ] +1, Approve the release > > > > [ ] -1, Do not approve the release (please provide specific comments) > > > > > > > > > > > > The complete staging area is available for your review, which > includes: > > > > * JIRA release notes [1], > > > > * the official Apache source release and binary convenience releases > to > > > be > > > > deployed to dist.apache.org [2], which are signed with the key with > > > > fingerprint 31D2DD10BFC15A2D [3], > > > > * all artifacts to be deployed to the Maven Central Repository [4], > > > > * source code tag "release-1.13.0-rc2" [5], > > > > * website pull request listing the new release and adding > announcement > > > > blog post [6]. > > > > > > > > The vote will be open for at least 72 hours. It is adopted by > majority > > > > approval, with at least 3 PMC affirmative votes. > > > > > > > > Thanks, > > > > Dawid Wysakowicz > > > > > > > > [1] > > > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349287 > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc2/ > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > > [4] > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1420/ > > > > [5] https://github.com/apache/flink/tree/release-1.13.0-rc2 > > > > [6] https://github.com/apache/flink-web/pull/436 > > > > > > > > > > > > > -- > > > Best regards! > > > Rui Li > > > > > > > > -- > Best regards! > Rui Li > |
+1 (non-binding)
- verified checksum and signature - built from source - executed example jobs with standalone / native kubernetes deployments, nothing unexpected - reviewed release announcement pull request Thank you~ Xintong Song On Tue, Apr 27, 2021 at 10:50 PM Piotr Nowojski <[hidden email]> wrote: > +1 (binding) > > I'm not aware of any release blockers. Me and my colleagues have checked > quite extensively this release for correctness of the unaligned > checkpoints, finally nailing down the two remaining known bugs. > I have manually checked the WebUI with it's new back-pressure > monitoring tool. One bug was discovered in this area recently, which has > already pending PR, but as it's not critical and very old, I wouldn't > cancel the RC vote because of that. > > Piotrek > > pon., 26 kwi 2021 o 07:53 Rui Li <[hidden email]> napisał(a): > > > I see. Thanks Jark for the clarification. > > > > On Sun, Apr 25, 2021 at 10:36 PM Jark Wu <[hidden email]> wrote: > > > > > Hi Rui, > > > > > > Documentation is not a part of the release package. > > > The changes to the documentation will take effect in 24 hours > > > and can be accessed on the web [1] then. > > > Docs can even be updated after 1.13 is released. > > > So we don't need to cancel the RC for this. > > > > > > Best, > > > Jark > > > > > > [1]: > > > > > > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/table/hive/overview/ > > > > > > On Sun, 25 Apr 2021 at 13:47, Rui Li <[hidden email]> wrote: > > > > > > > Hi Dawid, > > > > > > > > I have just merged the doc change for hive dialect [1], which talks > > about > > > > how to use hive dialect to run hive queries. Do you think we can have > > > > another RC to include it? Sorry for the trouble. > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-22119 > > > > > > > > On Sat, Apr 24, 2021 at 1:36 AM Dawid Wysakowicz < > > [hidden email] > > > > > > > > wrote: > > > > > > > > > Hi everyone, > > > > > Please review and vote on the release candidate #2 for the version > > > > 1.13.0, > > > > > as follows: > > > > > [ ] +1, Approve the release > > > > > [ ] -1, Do not approve the release (please provide specific > comments) > > > > > > > > > > > > > > > The complete staging area is available for your review, which > > includes: > > > > > * JIRA release notes [1], > > > > > * the official Apache source release and binary convenience > releases > > to > > > > be > > > > > deployed to dist.apache.org [2], which are signed with the key > with > > > > > fingerprint 31D2DD10BFC15A2D [3], > > > > > * all artifacts to be deployed to the Maven Central Repository [4], > > > > > * source code tag "release-1.13.0-rc2" [5], > > > > > * website pull request listing the new release and adding > > announcement > > > > > blog post [6]. > > > > > > > > > > The vote will be open for at least 72 hours. It is adopted by > > majority > > > > > approval, with at least 3 PMC affirmative votes. > > > > > > > > > > Thanks, > > > > > Dawid Wysakowicz > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12349287 > > > > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc2/ > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > > > [4] > > > > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1420/ > > > > > [5] https://github.com/apache/flink/tree/release-1.13.0-rc2 > > > > > [6] https://github.com/apache/flink-web/pull/436 > > > > > > > > > > > > > > > > > -- > > > > Best regards! > > > > Rui Li > > > > > > > > > > > > > -- > > Best regards! > > Rui Li > > > |
-1
We're testing this version on batch jobs with large (600~1000) parallelisms and the following exception messages appear with high frequency: 2021-04-27 21:27:26 org.apache.flink.util.FlinkException: An OperatorEvent from an OperatorCoordinator to a task was lost. Triggering task failover to ensure consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - execution #0 at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Becket Qin is investigating this issue. |
@Caizhi and @Becket - let me reach out to you to jointly debug this issue.
I am wondering if there is some incorrect reporting of failed events? On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> wrote: > -1 > > We're testing this version on batch jobs with large (600~1000) parallelisms > and the following exception messages appear with high frequency: > > 2021-04-27 21:27:26 > org.apache.flink.util.FlinkException: An OperatorEvent from an > OperatorCoordinator to a task was lost. Triggering task failover to ensure > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > execution #0 > at > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > at > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > at > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > at > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > at > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > at > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > Becket Qin is investigating this issue. > |
After the investigation we found that this issue is caused by the
implementation of connector, not by the Flink framework. Sorry for the false alarm. Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > @Caizhi and @Becket - let me reach out to you to jointly debug this issue. > > I am wondering if there is some incorrect reporting of failed events? > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> wrote: > > > -1 > > > > We're testing this version on batch jobs with large (600~1000) > parallelisms > > and the following exception messages appear with high frequency: > > > > 2021-04-27 21:27:26 > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > OperatorCoordinator to a task was lost. Triggering task failover to > ensure > > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > > execution #0 > > at > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > at > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > at > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > at > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > at > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > at > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > at > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > at > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > Becket Qin is investigating this issue. > > > |
Glad to hear that outcome. And no worries about the false alarm.
Thank you for doing thorough testing, this is very helpful! On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> wrote: > After the investigation we found that this issue is caused by the > implementation of connector, not by the Flink framework. > > Sorry for the false alarm. > > Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > > > @Caizhi and @Becket - let me reach out to you to jointly debug this > issue. > > > > I am wondering if there is some incorrect reporting of failed events? > > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > wrote: > > > > > -1 > > > > > > We're testing this version on batch jobs with large (600~1000) > > parallelisms > > > and the following exception messages appear with high frequency: > > > > > > 2021-04-27 21:27:26 > > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > > OperatorCoordinator to a task was lost. Triggering task failover to > > ensure > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > > > execution #0 > > > at > > > > > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > > at > > > > > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > > at > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > > at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > at > > > > > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > > at > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > at > > > > > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > Becket Qin is investigating this issue. > > > > > > |
Thanks Dawid and Guowei for managing this release.
- downloaded the sources and binaries and checked the checksums - built Flink from the downloaded sources - executed example jobs with standalone deployments - I didn't find anything suspicious in the logs - reviewed release announcement pull request - I did a pass over dependency updates: git diff release-1.12.2 release-1.13.0-rc2 */*.xml There's one thing someone should double-check whether that's suppose to be like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a dependency but I don't see it being reflected in the NOTICE file of the flink-python module. Or is this automatically added later on? +1 (non-binding; please see remark on dependency above) Matthias On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: > Glad to hear that outcome. And no worries about the false alarm. > Thank you for doing thorough testing, this is very helpful! > > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> wrote: > > > After the investigation we found that this issue is caused by the > > implementation of connector, not by the Flink framework. > > > > Sorry for the false alarm. > > > > Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > > > > > @Caizhi and @Becket - let me reach out to you to jointly debug this > > issue. > > > > > > I am wondering if there is some incorrect reporting of failed events? > > > > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > > wrote: > > > > > > > -1 > > > > > > > > We're testing this version on batch jobs with large (600~1000) > > > parallelisms > > > > and the following exception messages appear with high frequency: > > > > > > > > 2021-04-27 21:27:26 > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > > > OperatorCoordinator to a task was lost. Triggering task failover to > > > ensure > > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > > > > execution #0 > > > > at > > > > > > > > > > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > > > at > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > > > at > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > > > at > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > > > at > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > > > at > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > > > at > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > > > at > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > > > at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > > > at akka.japi.pf > .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > > > at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > > > at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > at > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > > > at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > at > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > > > at > > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > at > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > > > Becket Qin is investigating this issue. > > > > > > > > > |
Hi, Matthias
Thank you very much for your careful inspection. I check the flink-python_2.11-1.13.0.jar and we do not bundle org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. So I think we may not need to add this to the NOTICE file. (BTW The jar's scope is runtime) Best, Guowei On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> wrote: > Thanks Dawid and Guowei for managing this release. > > - downloaded the sources and binaries and checked the checksums > - built Flink from the downloaded sources > - executed example jobs with standalone deployments - I didn't find > anything suspicious in the logs > - reviewed release announcement pull request > > - I did a pass over dependency updates: git diff release-1.12.2 > release-1.13.0-rc2 */*.xml > There's one thing someone should double-check whether that's suppose to be > like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a > dependency but I don't see it being reflected in the NOTICE file of the > flink-python module. Or is this automatically added later on? > > +1 (non-binding; please see remark on dependency above) > > Matthias > > On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: > > > Glad to hear that outcome. And no worries about the false alarm. > > Thank you for doing thorough testing, this is very helpful! > > > > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> > wrote: > > > > > After the investigation we found that this issue is caused by the > > > implementation of connector, not by the Flink framework. > > > > > > Sorry for the false alarm. > > > > > > Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > > > > > > > @Caizhi and @Becket - let me reach out to you to jointly debug this > > > issue. > > > > > > > > I am wondering if there is some incorrect reporting of failed events? > > > > > > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > > > wrote: > > > > > > > > > -1 > > > > > > > > > > We're testing this version on batch jobs with large (600~1000) > > > > parallelisms > > > > > and the following exception messages appear with high frequency: > > > > > > > > > > 2021-04-27 21:27:26 > > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > > > > OperatorCoordinator to a task was lost. Triggering task failover to > > > > ensure > > > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > > > > > execution #0 > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > > > > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > > > > at akka.japi.pf > > .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > > > > at > akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > > at > > > > > > > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > > > > at > > > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > > at > > > > > > > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > > > > > Becket Qin is investigating this issue. > > > > > > > > > > > > > |
In reply to this post by Matthias
+1 (binding)
- checked signatures and checksums - built from source - ran example jobs with parallelism=8000 on a YARN cluster. checked output and logs. - the website PR looks good Thanks, Zhu Matthias Pohl <[hidden email]> 于2021年4月29日周四 上午2:34写道: > Thanks Dawid and Guowei for managing this release. > > - downloaded the sources and binaries and checked the checksums > - built Flink from the downloaded sources > - executed example jobs with standalone deployments - I didn't find > anything suspicious in the logs > - reviewed release announcement pull request > > - I did a pass over dependency updates: git diff release-1.12.2 > release-1.13.0-rc2 */*.xml > There's one thing someone should double-check whether that's suppose to be > like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a > dependency but I don't see it being reflected in the NOTICE file of the > flink-python module. Or is this automatically added later on? > > +1 (non-binding; please see remark on dependency above) > > Matthias > > On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: > > > Glad to hear that outcome. And no worries about the false alarm. > > Thank you for doing thorough testing, this is very helpful! > > > > On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> > wrote: > > > > > After the investigation we found that this issue is caused by the > > > implementation of connector, not by the Flink framework. > > > > > > Sorry for the false alarm. > > > > > > Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > > > > > > > @Caizhi and @Becket - let me reach out to you to jointly debug this > > > issue. > > > > > > > > I am wondering if there is some incorrect reporting of failed events? > > > > > > > > On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > > > wrote: > > > > > > > > > -1 > > > > > > > > > > We're testing this version on batch jobs with large (600~1000) > > > > parallelisms > > > > > and the following exception messages appear with high frequency: > > > > > > > > > > 2021-04-27 21:27:26 > > > > > org.apache.flink.util.FlinkException: An OperatorEvent from an > > > > > OperatorCoordinator to a task was lost. Triggering task failover to > > > > ensure > > > > > consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > > > > > execution #0 > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > > > > > at > > > > > > > > > > > > > > > > > > > > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > > > > > at > > > > > > > > > > > > > > > > > > > > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > > > > > at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > > > > > at > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > > > > > at akka.japi.pf > > .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > > at > > scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > > > > > at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > > > > > at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > > > > > at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > > > > > at akka.actor.ActorCell.invoke(ActorCell.scala:561) > > > > > at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > > > > > at akka.dispatch.Mailbox.run(Mailbox.scala:225) > > > > > at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > > > > > at > akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > > > > > at > > > > > > > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > > > > > at > > > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > > > > > at > > > > > > > > > > > > > > > > > > > > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > > > > > > > > > > Becket Qin is investigating this issue. > > > > > > > > > > > > > |
In reply to this post by Guowei Ma
Hey Matthias,
I'd like to double confirm what Guowei said. The dependency is Apache 2 licensed and we do not bundle it in our jar (as it is in the runtime scope) thus we do not need to mention it in the NOTICE file (btw, the best way to check what is bundled is to check the output of maven shade plugin). Thanks for checking it! Best, Dawid On 29/04/2021 05:25, Guowei Ma wrote: > Hi, Matthias > > Thank you very much for your careful inspection. > I check the flink-python_2.11-1.13.0.jar and we do not bundle > org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. > So I think we may not need to add this to the NOTICE file. (BTW The jar's > scope is runtime) > > Best, > Guowei > > > On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> > wrote: > >> Thanks Dawid and Guowei for managing this release. >> >> - downloaded the sources and binaries and checked the checksums >> - built Flink from the downloaded sources >> - executed example jobs with standalone deployments - I didn't find >> anything suspicious in the logs >> - reviewed release announcement pull request >> >> - I did a pass over dependency updates: git diff release-1.12.2 >> release-1.13.0-rc2 */*.xml >> There's one thing someone should double-check whether that's suppose to be >> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a >> dependency but I don't see it being reflected in the NOTICE file of the >> flink-python module. Or is this automatically added later on? >> >> +1 (non-binding; please see remark on dependency above) >> >> Matthias >> >> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: >> >>> Glad to hear that outcome. And no worries about the false alarm. >>> Thank you for doing thorough testing, this is very helpful! >>> >>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> >> wrote: >>>> After the investigation we found that this issue is caused by the >>>> implementation of connector, not by the Flink framework. >>>> >>>> Sorry for the false alarm. >>>> >>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: >>>> >>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this >>>> issue. >>>>> I am wondering if there is some incorrect reporting of failed events? >>>>> >>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> >>>> wrote: >>>>>> -1 >>>>>> >>>>>> We're testing this version on batch jobs with large (600~1000) >>>>> parallelisms >>>>>> and the following exception messages appear with high frequency: >>>>>> >>>>>> 2021-04-27 21:27:26 >>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an >>>>>> OperatorCoordinator to a task was lost. Triggering task failover to >>>>> ensure >>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - >>>>>> execution #0 >>>>>> at >>>>>> >>>>>> >> org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) >>>>>> at >>>>>> >>>>>> >> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) >>>>>> at >>>>>> >>>>>> >> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) >>>>>> at >>>>>> >>>>>> >> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) >>>>>> at >>>>>> >>>>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) >>>>>> at >>>>>> >>>>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) >>>>>> at >>>>>> >>>>>> >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) >>>>>> at >>>>>> >>>>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) >>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>>>>> at >> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>>>> at akka.japi.pf >>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>>>>> at >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>>>>> at >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>> at >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>>>>> at >> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>>> at >>>>>> >>>>>> >> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>>> at >>>> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>>> at >>>>>> >>>>>> >> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>>> Becket Qin is investigating this issue. >>>>>> OpenPGP_signature (855 bytes) Download Attachment |
+1 (binding)
- checked/verified signatures and hashes - started cluster and run some e2e sql queries using SQL Client, results are as expect: * read from kafka source, window aggregate, lookup mysql database, write into elasticsearch * window aggregate using legacy window syntax and new window TVF * verified web ui and log output - reviewed the release PR I found the log contains some verbose information when using window aggregate, but I think this doesn't block the release, I created FLINK-22522 to fix it. Best, Jark On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> wrote: > Hey Matthias, > > I'd like to double confirm what Guowei said. The dependency is Apache 2 > licensed and we do not bundle it in our jar (as it is in the runtime > scope) thus we do not need to mention it in the NOTICE file (btw, the > best way to check what is bundled is to check the output of maven shade > plugin). Thanks for checking it! > > Best, > > Dawid > > On 29/04/2021 05:25, Guowei Ma wrote: > > Hi, Matthias > > > > Thank you very much for your careful inspection. > > I check the flink-python_2.11-1.13.0.jar and we do not bundle > > org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. > > So I think we may not need to add this to the NOTICE file. (BTW The jar's > > scope is runtime) > > > > Best, > > Guowei > > > > > > On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> > > wrote: > > > >> Thanks Dawid and Guowei for managing this release. > >> > >> - downloaded the sources and binaries and checked the checksums > >> - built Flink from the downloaded sources > >> - executed example jobs with standalone deployments - I didn't find > >> anything suspicious in the logs > >> - reviewed release announcement pull request > >> > >> - I did a pass over dependency updates: git diff release-1.12.2 > >> release-1.13.0-rc2 */*.xml > >> There's one thing someone should double-check whether that's suppose to > be > >> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a > >> dependency but I don't see it being reflected in the NOTICE file of the > >> flink-python module. Or is this automatically added later on? > >> > >> +1 (non-binding; please see remark on dependency above) > >> > >> Matthias > >> > >> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: > >> > >>> Glad to hear that outcome. And no worries about the false alarm. > >>> Thank you for doing thorough testing, this is very helpful! > >>> > >>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> > >> wrote: > >>>> After the investigation we found that this issue is caused by the > >>>> implementation of connector, not by the Flink framework. > >>>> > >>>> Sorry for the false alarm. > >>>> > >>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > >>>> > >>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this > >>>> issue. > >>>>> I am wondering if there is some incorrect reporting of failed events? > >>>>> > >>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > >>>> wrote: > >>>>>> -1 > >>>>>> > >>>>>> We're testing this version on batch jobs with large (600~1000) > >>>>> parallelisms > >>>>>> and the following exception messages appear with high frequency: > >>>>>> > >>>>>> 2021-04-27 21:27:26 > >>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an > >>>>>> OperatorCoordinator to a task was lost. Triggering task failover to > >>>>> ensure > >>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - > >>>>>> execution #0 > >>>>>> at > >>>>>> > >>>>>> > >> > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > >>>>>> at > >>>>>> > >>>>>> > >> > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > >>>>>> at > >>>>>> > >>>>>> > >> > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > >>>>>> at > >>>>>> > >>>>>> > >> > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > >>>>>> at > >>>>>> > >>>>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > >>>>>> at > >>>>>> > >>>>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > >>>>>> at > >>>>>> > >>>>>> > >> > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > >>>>>> at > >>>>>> > >>>>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > >>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > >>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > >>>>>> at > >> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > >>>>>> at akka.japi.pf > >>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > >>>>>> at > >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > >>>>>> at > >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > >>>>>> at > >>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > >>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > >>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > >>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > >>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) > >>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > >>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) > >>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > >>>>>> at > >> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > >>>>>> at > >>>>>> > >>>>>> > >> > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > >>>>>> at > >>>> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > >>>>>> at > >>>>>> > >>>>>> > >> > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > >>>>>> Becket Qin is investigating this issue. > >>>>>> > > |
+1 (binding)
- Verified the signature and checksum - Installed PyFlink successfully using the source package - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream API with state access, Python DataStream API with batch execution mode - Reviewed the website PR Regards, Dian > 2021年4月29日 下午3:11,Jark Wu <[hidden email]> 写道: > > +1 (binding) > > - checked/verified signatures and hashes > - started cluster and run some e2e sql queries using SQL Client, results > are as expect: > * read from kafka source, window aggregate, lookup mysql database, write > into elasticsearch > * window aggregate using legacy window syntax and new window TVF > * verified web ui and log output > - reviewed the release PR > > I found the log contains some verbose information when using window > aggregate, > but I think this doesn't block the release, I created FLINK-22522 to fix > it. > > Best, > Jark > > > On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> > wrote: > >> Hey Matthias, >> >> I'd like to double confirm what Guowei said. The dependency is Apache 2 >> licensed and we do not bundle it in our jar (as it is in the runtime >> scope) thus we do not need to mention it in the NOTICE file (btw, the >> best way to check what is bundled is to check the output of maven shade >> plugin). Thanks for checking it! >> >> Best, >> >> Dawid >> >> On 29/04/2021 05:25, Guowei Ma wrote: >>> Hi, Matthias >>> >>> Thank you very much for your careful inspection. >>> I check the flink-python_2.11-1.13.0.jar and we do not bundle >>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. >>> So I think we may not need to add this to the NOTICE file. (BTW The jar's >>> scope is runtime) >>> >>> Best, >>> Guowei >>> >>> >>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> >>> wrote: >>> >>>> Thanks Dawid and Guowei for managing this release. >>>> >>>> - downloaded the sources and binaries and checked the checksums >>>> - built Flink from the downloaded sources >>>> - executed example jobs with standalone deployments - I didn't find >>>> anything suspicious in the logs >>>> - reviewed release announcement pull request >>>> >>>> - I did a pass over dependency updates: git diff release-1.12.2 >>>> release-1.13.0-rc2 */*.xml >>>> There's one thing someone should double-check whether that's suppose to >> be >>>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a >>>> dependency but I don't see it being reflected in the NOTICE file of the >>>> flink-python module. Or is this automatically added later on? >>>> >>>> +1 (non-binding; please see remark on dependency above) >>>> >>>> Matthias >>>> >>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote: >>>> >>>>> Glad to hear that outcome. And no worries about the false alarm. >>>>> Thank you for doing thorough testing, this is very helpful! >>>>> >>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> >>>> wrote: >>>>>> After the investigation we found that this issue is caused by the >>>>>> implementation of connector, not by the Flink framework. >>>>>> >>>>>> Sorry for the false alarm. >>>>>> >>>>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: >>>>>> >>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this >>>>>> issue. >>>>>>> I am wondering if there is some incorrect reporting of failed events? >>>>>>> >>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> >>>>>> wrote: >>>>>>>> -1 >>>>>>>> >>>>>>>> We're testing this version on batch jobs with large (600~1000) >>>>>>> parallelisms >>>>>>>> and the following exception messages appear with high frequency: >>>>>>>> >>>>>>>> 2021-04-27 21:27:26 >>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an >>>>>>>> OperatorCoordinator to a task was lost. Triggering task failover to >>>>>>> ensure >>>>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> - >>>>>>>> execution #0 >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) >>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>>>>>>> at >>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>>>>>> at akka.japi.pf >>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>>>>>>> at >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>>>>>>> at >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>> at >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>>>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>>>>>>> at >>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>>>>> at >>>>>> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>>>>> at >>>>>>>> >>>>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>>>>> Becket Qin is investigating this issue. >>>>>>>> >> >> |
+1 (non-binding)
- verified checksum and signature - test upload `apache-flink` and `apache-flink-libraries` to test.pypi - pip install `apache-flink-libraries` and `apache-flink` in mac os - started cluster and run row-based operation test - started cluster and test python general group window agg Best, Xingbo Dian Fu <[hidden email]> 于2021年4月29日周四 下午4:05写道: > +1 (binding) > > - Verified the signature and checksum > - Installed PyFlink successfully using the source package > - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream > API with state access, Python DataStream API with batch execution mode > - Reviewed the website PR > > Regards, > Dian > > > 2021年4月29日 下午3:11,Jark Wu <[hidden email]> 写道: > > > > +1 (binding) > > > > - checked/verified signatures and hashes > > - started cluster and run some e2e sql queries using SQL Client, results > > are as expect: > > * read from kafka source, window aggregate, lookup mysql database, write > > into elasticsearch > > * window aggregate using legacy window syntax and new window TVF > > * verified web ui and log output > > - reviewed the release PR > > > > I found the log contains some verbose information when using window > > aggregate, > > but I think this doesn't block the release, I created FLINK-22522 to fix > > it. > > > > Best, > > Jark > > > > > > On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> > > wrote: > > > >> Hey Matthias, > >> > >> I'd like to double confirm what Guowei said. The dependency is Apache 2 > >> licensed and we do not bundle it in our jar (as it is in the runtime > >> scope) thus we do not need to mention it in the NOTICE file (btw, the > >> best way to check what is bundled is to check the output of maven shade > >> plugin). Thanks for checking it! > >> > >> Best, > >> > >> Dawid > >> > >> On 29/04/2021 05:25, Guowei Ma wrote: > >>> Hi, Matthias > >>> > >>> Thank you very much for your careful inspection. > >>> I check the flink-python_2.11-1.13.0.jar and we do not bundle > >>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. > >>> So I think we may not need to add this to the NOTICE file. (BTW The > jar's > >>> scope is runtime) > >>> > >>> Best, > >>> Guowei > >>> > >>> > >>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> > >>> wrote: > >>> > >>>> Thanks Dawid and Guowei for managing this release. > >>>> > >>>> - downloaded the sources and binaries and checked the checksums > >>>> - built Flink from the downloaded sources > >>>> - executed example jobs with standalone deployments - I didn't find > >>>> anything suspicious in the logs > >>>> - reviewed release announcement pull request > >>>> > >>>> - I did a pass over dependency updates: git diff release-1.12.2 > >>>> release-1.13.0-rc2 */*.xml > >>>> There's one thing someone should double-check whether that's suppose > to > >> be > >>>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a > >>>> dependency but I don't see it being reflected in the NOTICE file of > the > >>>> flink-python module. Or is this automatically added later on? > >>>> > >>>> +1 (non-binding; please see remark on dependency above) > >>>> > >>>> Matthias > >>>> > >>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> > wrote: > >>>> > >>>>> Glad to hear that outcome. And no worries about the false alarm. > >>>>> Thank you for doing thorough testing, this is very helpful! > >>>>> > >>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> > >>>> wrote: > >>>>>> After the investigation we found that this issue is caused by the > >>>>>> implementation of connector, not by the Flink framework. > >>>>>> > >>>>>> Sorry for the false alarm. > >>>>>> > >>>>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: > >>>>>> > >>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this > >>>>>> issue. > >>>>>>> I am wondering if there is some incorrect reporting of failed > events? > >>>>>>> > >>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> > >>>>>> wrote: > >>>>>>>> -1 > >>>>>>>> > >>>>>>>> We're testing this version on batch jobs with large (600~1000) > >>>>>>> parallelisms > >>>>>>>> and the following exception messages appear with high frequency: > >>>>>>>> > >>>>>>>> 2021-04-27 21:27:26 > >>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an > >>>>>>>> OperatorCoordinator to a task was lost. Triggering task failover > to > >>>>>>> ensure > >>>>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> > - > >>>>>>>> execution #0 > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) > >>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) > >>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) > >>>>>>>> at > >>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) > >>>>>>>> at akka.japi.pf > >>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) > >>>>>>>> at > >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) > >>>>>>>> at > >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > >>>>>>>> at > >>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) > >>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) > >>>>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) > >>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) > >>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) > >>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) > >>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) > >>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) > >>>>>>>> at > >>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > >>>>>>>> at > >>>>>> > akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > >>>>>>>> at > >>>>>>>> > >>>>>>>> > >>>> > >> > akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > >>>>>>>> Becket Qin is investigating this issue. > >>>>>>>> > >> > >> > > |
+1 (non-binding)
- verified signatures and hashes - built from source code with scala 2.11 succeeded - started a cluster, WebUI was accessible, ran some simple SQL jobs, no suspicious log output - tested time functions and time zone usage in SQL Client, the query result is as expected - the web PR looks good - found one minor exception message typo, will improve it later Best, Leonard Xu > 在 2021年4月29日,16:11,Xingbo Huang <[hidden email]> 写道: > > +1 (non-binding) > > - verified checksum and signature > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi > - pip install `apache-flink-libraries` and `apache-flink` in mac os > - started cluster and run row-based operation test > - started cluster and test python general group window agg > > Best, > Xingbo > > Dian Fu <[hidden email]> 于2021年4月29日周四 下午4:05写道: > >> +1 (binding) >> >> - Verified the signature and checksum >> - Installed PyFlink successfully using the source package >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream >> API with state access, Python DataStream API with batch execution mode >> - Reviewed the website PR >> >> Regards, >> Dian >> >>> 2021年4月29日 下午3:11,Jark Wu <[hidden email]> 写道: >>> >>> +1 (binding) >>> >>> - checked/verified signatures and hashes >>> - started cluster and run some e2e sql queries using SQL Client, results >>> are as expect: >>> * read from kafka source, window aggregate, lookup mysql database, write >>> into elasticsearch >>> * window aggregate using legacy window syntax and new window TVF >>> * verified web ui and log output >>> - reviewed the release PR >>> >>> I found the log contains some verbose information when using window >>> aggregate, >>> but I think this doesn't block the release, I created FLINK-22522 to fix >>> it. >>> >>> Best, >>> Jark >>> >>> >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> >>> wrote: >>> >>>> Hey Matthias, >>>> >>>> I'd like to double confirm what Guowei said. The dependency is Apache 2 >>>> licensed and we do not bundle it in our jar (as it is in the runtime >>>> scope) thus we do not need to mention it in the NOTICE file (btw, the >>>> best way to check what is bundled is to check the output of maven shade >>>> plugin). Thanks for checking it! >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 29/04/2021 05:25, Guowei Ma wrote: >>>>> Hi, Matthias >>>>> >>>>> Thank you very much for your careful inspection. >>>>> I check the flink-python_2.11-1.13.0.jar and we do not bundle >>>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. >>>>> So I think we may not need to add this to the NOTICE file. (BTW The >> jar's >>>>> scope is runtime) >>>>> >>>>> Best, >>>>> Guowei >>>>> >>>>> >>>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> >>>>> wrote: >>>>> >>>>>> Thanks Dawid and Guowei for managing this release. >>>>>> >>>>>> - downloaded the sources and binaries and checked the checksums >>>>>> - built Flink from the downloaded sources >>>>>> - executed example jobs with standalone deployments - I didn't find >>>>>> anything suspicious in the logs >>>>>> - reviewed release announcement pull request >>>>>> >>>>>> - I did a pass over dependency updates: git diff release-1.12.2 >>>>>> release-1.13.0-rc2 */*.xml >>>>>> There's one thing someone should double-check whether that's suppose >> to >>>> be >>>>>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a >>>>>> dependency but I don't see it being reflected in the NOTICE file of >> the >>>>>> flink-python module. Or is this automatically added later on? >>>>>> >>>>>> +1 (non-binding; please see remark on dependency above) >>>>>> >>>>>> Matthias >>>>>> >>>>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> >> wrote: >>>>>> >>>>>>> Glad to hear that outcome. And no worries about the false alarm. >>>>>>> Thank you for doing thorough testing, this is very helpful! >>>>>>> >>>>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> >>>>>> wrote: >>>>>>>> After the investigation we found that this issue is caused by the >>>>>>>> implementation of connector, not by the Flink framework. >>>>>>>> >>>>>>>> Sorry for the false alarm. >>>>>>>> >>>>>>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: >>>>>>>> >>>>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this >>>>>>>> issue. >>>>>>>>> I am wondering if there is some incorrect reporting of failed >> events? >>>>>>>>> >>>>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> >>>>>>>> wrote: >>>>>>>>>> -1 >>>>>>>>>> >>>>>>>>>> We're testing this version on batch jobs with large (600~1000) >>>>>>>>> parallelisms >>>>>>>>>> and the following exception messages appear with high frequency: >>>>>>>>>> >>>>>>>>>> 2021-04-27 21:27:26 >>>>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an >>>>>>>>>> OperatorCoordinator to a task was lost. Triggering task failover >> to >>>>>>>>> ensure >>>>>>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> >> - >>>>>>>>>> execution #0 >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>>>>>>>>> at >>>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>>>>>>>> at akka.japi.pf >>>>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>>>>>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>>>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>>>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>>>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>>>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>>>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>>>>>>>>> at >>>>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>>>>>>> at >>>>>>>> >> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>>>>>>> Becket Qin is investigating this issue. >>>>>>>>>> >>>> >>>> >> >> |
+1 (non-binding)
- Build the Pravega Flink connector with RC artifacts and all tests pass - Start a cluster, Run Pravega reader and writer application on it successfully Thanks, Brian -----Original Message----- From: Leonard Xu <[hidden email]> Sent: Thursday, April 29, 2021 16:53 To: dev Subject: Re: [VOTE] Release 1.13.0, release candidate #2 [EXTERNAL EMAIL] +1 (non-binding) - verified signatures and hashes - built from source code with scala 2.11 succeeded - started a cluster, WebUI was accessible, ran some simple SQL jobs, no suspicious log output - tested time functions and time zone usage in SQL Client, the query result is as expected - the web PR looks good - found one minor exception message typo, will improve it later Best, Leonard Xu > 在 2021年4月29日,16:11,Xingbo Huang <[hidden email]> 写道: > > +1 (non-binding) > > - verified checksum and signature > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi > - pip install `apache-flink-libraries` and `apache-flink` in mac os > - started cluster and run row-based operation test > - started cluster and test python general group window agg > > Best, > Xingbo > > Dian Fu <[hidden email]> 于2021年4月29日周四 下午4:05写道: > >> +1 (binding) >> >> - Verified the signature and checksum >> - Installed PyFlink successfully using the source package >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python >> DataStream API with state access, Python DataStream API with batch >> execution mode >> - Reviewed the website PR >> >> Regards, >> Dian >> >>> 2021年4月29日 下午3:11,Jark Wu <[hidden email]> 写道: >>> >>> +1 (binding) >>> >>> - checked/verified signatures and hashes >>> - started cluster and run some e2e sql queries using SQL Client, >>> results are as expect: >>> * read from kafka source, window aggregate, lookup mysql database, >>> write into elasticsearch >>> * window aggregate using legacy window syntax and new window TVF >>> * verified web ui and log output >>> - reviewed the release PR >>> >>> I found the log contains some verbose information when using window >>> aggregate, but I think this doesn't block the release, I created >>> FLINK-22522 to fix it. >>> >>> Best, >>> Jark >>> >>> >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz >>> <[hidden email]> >>> wrote: >>> >>>> Hey Matthias, >>>> >>>> I'd like to double confirm what Guowei said. The dependency is >>>> Apache 2 licensed and we do not bundle it in our jar (as it is in >>>> the runtime >>>> scope) thus we do not need to mention it in the NOTICE file (btw, >>>> the best way to check what is bundled is to check the output of >>>> maven shade plugin). Thanks for checking it! >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 29/04/2021 05:25, Guowei Ma wrote: >>>>> Hi, Matthias >>>>> >>>>> Thank you very much for your careful inspection. >>>>> I check the flink-python_2.11-1.13.0.jar and we do not bundle >>>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. >>>>> So I think we may not need to add this to the NOTICE file. (BTW >>>>> The >> jar's >>>>> scope is runtime) >>>>> >>>>> Best, >>>>> Guowei >>>>> >>>>> >>>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl >>>>> <[hidden email]> >>>>> wrote: >>>>> >>>>>> Thanks Dawid and Guowei for managing this release. >>>>>> >>>>>> - downloaded the sources and binaries and checked the checksums >>>>>> - built Flink from the downloaded sources >>>>>> - executed example jobs with standalone deployments - I didn't >>>>>> find anything suspicious in the logs >>>>>> - reviewed release announcement pull request >>>>>> >>>>>> - I did a pass over dependency updates: git diff release-1.12.2 >>>>>> release-1.13.0-rc2 */*.xml >>>>>> There's one thing someone should double-check whether that's >>>>>> suppose >> to >>>> be >>>>>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as >>>>>> a dependency but I don't see it being reflected in the NOTICE >>>>>> file of >> the >>>>>> flink-python module. Or is this automatically added later on? >>>>>> >>>>>> +1 (non-binding; please see remark on dependency above) >>>>>> >>>>>> Matthias >>>>>> >>>>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> >> wrote: >>>>>> >>>>>>> Glad to hear that outcome. And no worries about the false alarm. >>>>>>> Thank you for doing thorough testing, this is very helpful! >>>>>>> >>>>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng >>>>>>> <[hidden email]> >>>>>> wrote: >>>>>>>> After the investigation we found that this issue is caused by >>>>>>>> the implementation of connector, not by the Flink framework. >>>>>>>> >>>>>>>> Sorry for the false alarm. >>>>>>>> >>>>>>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: >>>>>>>> >>>>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug >>>>>>>>> this >>>>>>>> issue. >>>>>>>>> I am wondering if there is some incorrect reporting of failed >> events? >>>>>>>>> >>>>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng >>>>>>>>> <[hidden email]> >>>>>>>> wrote: >>>>>>>>>> -1 >>>>>>>>>> >>>>>>>>>> We're testing this version on batch jobs with large >>>>>>>>>> (600~1000) >>>>>>>>> parallelisms >>>>>>>>>> and the following exception messages appear with high frequency: >>>>>>>>>> >>>>>>>>>> 2021-04-27 21:27:26 >>>>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from >>>>>>>>>> an OperatorCoordinator to a task was lost. Triggering task >>>>>>>>>> failover >> to >>>>>>>>> ensure >>>>>>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task >>>>>>>>>> name> >> - >>>>>>>>>> execution #0 >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.la >> mbda$sendEvent$0(SubtaskGatewayImpl.java:81) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.ja >> va:822) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableF >> uture.java:797) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$Completion.run(CompletableFutu >> re.java:442) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpc >> Actor.java:440) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaR >> pcActor.java:208) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage >> (FencedAkkaRpcActor.java:77) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcA >> ctor.java:158) >>>>>>>>>> at >>>>>>>>>> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>>>>>>>>> at >>>>>>>>>> akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>>>>>>>>> at >>>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123 >>>>>> ) >>>>>>>>>> at akka.japi.pf >>>>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:1 >>>>>>> 70) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:1 >>>>>>> 71) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:1 >>>>>>> 71) >>>>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>>>>>>>>> at >>>>>>>>>> akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:22 >>>>>>>>>> 5) at >>>>>>>>>> akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>>>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>>>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>>>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>>>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>>>>>>>>> at >>>>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.ja >> va:1339) >>>>>>>>>> at >>>>>>>> >> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread. >> java:107) >>>>>>>>>> Becket Qin is investigating this issue. >>>>>>>>>> >>>> >>>> >> >> |
In reply to this post by Leonard Xu
+1 (non-binding)
- built from source code with scala 2.11 succeeded - submit state machine example and it runed well with expected commit id shown in UI. - enable state latency tracking with slf4j metrics reporter and all behaves as expected. - Click 'FlameGraph' but found the we UI did not give friendly hint to tell me enable it via setting rest.flamegraph.enabled: true, will create issue later. Best Yun Tang ________________________________ From: Leonard Xu <[hidden email]> Sent: Thursday, April 29, 2021 16:52 To: dev <[hidden email]> Subject: Re: [VOTE] Release 1.13.0, release candidate #2 +1 (non-binding) - verified signatures and hashes - built from source code with scala 2.11 succeeded - started a cluster, WebUI was accessible, ran some simple SQL jobs, no suspicious log output - tested time functions and time zone usage in SQL Client, the query result is as expected - the web PR looks good - found one minor exception message typo, will improve it later Best, Leonard Xu > 在 2021年4月29日,16:11,Xingbo Huang <[hidden email]> 写道: > > +1 (non-binding) > > - verified checksum and signature > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi > - pip install `apache-flink-libraries` and `apache-flink` in mac os > - started cluster and run row-based operation test > - started cluster and test python general group window agg > > Best, > Xingbo > > Dian Fu <[hidden email]> 于2021年4月29日周四 下午4:05写道: > >> +1 (binding) >> >> - Verified the signature and checksum >> - Installed PyFlink successfully using the source package >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream >> API with state access, Python DataStream API with batch execution mode >> - Reviewed the website PR >> >> Regards, >> Dian >> >>> 2021年4月29日 下午3:11,Jark Wu <[hidden email]> 写道: >>> >>> +1 (binding) >>> >>> - checked/verified signatures and hashes >>> - started cluster and run some e2e sql queries using SQL Client, results >>> are as expect: >>> * read from kafka source, window aggregate, lookup mysql database, write >>> into elasticsearch >>> * window aggregate using legacy window syntax and new window TVF >>> * verified web ui and log output >>> - reviewed the release PR >>> >>> I found the log contains some verbose information when using window >>> aggregate, >>> but I think this doesn't block the release, I created FLINK-22522 to fix >>> it. >>> >>> Best, >>> Jark >>> >>> >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> >>> wrote: >>> >>>> Hey Matthias, >>>> >>>> I'd like to double confirm what Guowei said. The dependency is Apache 2 >>>> licensed and we do not bundle it in our jar (as it is in the runtime >>>> scope) thus we do not need to mention it in the NOTICE file (btw, the >>>> best way to check what is bundled is to check the output of maven shade >>>> plugin). Thanks for checking it! >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 29/04/2021 05:25, Guowei Ma wrote: >>>>> Hi, Matthias >>>>> >>>>> Thank you very much for your careful inspection. >>>>> I check the flink-python_2.11-1.13.0.jar and we do not bundle >>>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 to it. >>>>> So I think we may not need to add this to the NOTICE file. (BTW The >> jar's >>>>> scope is runtime) >>>>> >>>>> Best, >>>>> Guowei >>>>> >>>>> >>>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> >>>>> wrote: >>>>> >>>>>> Thanks Dawid and Guowei for managing this release. >>>>>> >>>>>> - downloaded the sources and binaries and checked the checksums >>>>>> - built Flink from the downloaded sources >>>>>> - executed example jobs with standalone deployments - I didn't find >>>>>> anything suspicious in the logs >>>>>> - reviewed release announcement pull request >>>>>> >>>>>> - I did a pass over dependency updates: git diff release-1.12.2 >>>>>> release-1.13.0-rc2 */*.xml >>>>>> There's one thing someone should double-check whether that's suppose >> to >>>> be >>>>>> like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1 as a >>>>>> dependency but I don't see it being reflected in the NOTICE file of >> the >>>>>> flink-python module. Or is this automatically added later on? >>>>>> >>>>>> +1 (non-binding; please see remark on dependency above) >>>>>> >>>>>> Matthias >>>>>> >>>>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> >> wrote: >>>>>> >>>>>>> Glad to hear that outcome. And no worries about the false alarm. >>>>>>> Thank you for doing thorough testing, this is very helpful! >>>>>>> >>>>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> >>>>>> wrote: >>>>>>>> After the investigation we found that this issue is caused by the >>>>>>>> implementation of connector, not by the Flink framework. >>>>>>>> >>>>>>>> Sorry for the false alarm. >>>>>>>> >>>>>>>> Stephan Ewen <[hidden email]> 于2021年4月28日周三 下午3:23写道: >>>>>>>> >>>>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this >>>>>>>> issue. >>>>>>>>> I am wondering if there is some incorrect reporting of failed >> events? >>>>>>>>> >>>>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> >>>>>>>> wrote: >>>>>>>>>> -1 >>>>>>>>>> >>>>>>>>>> We're testing this version on batch jobs with large (600~1000) >>>>>>>>> parallelisms >>>>>>>>>> and the following exception messages appear with high frequency: >>>>>>>>>> >>>>>>>>>> 2021-04-27 21:27:26 >>>>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an >>>>>>>>>> OperatorCoordinator to a task was lost. Triggering task failover >> to >>>>>>>>> ensure >>>>>>>>>> consistency. Event: '[NoMoreSplitEvent]', targetTask: <task name> >> - >>>>>>>>>> execution #0 >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) >>>>>>>>>> at >>>>>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) >>>>>>>>>> at akka.japi.pf >>>>>>> .UnitCaseStatement.applyOrElse(CaseStatements.scala:21) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>>>> at >>>>>>> scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) >>>>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517) >>>>>>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) >>>>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) >>>>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561) >>>>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) >>>>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225) >>>>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235) >>>>>>>>>> at >>>>>> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) >>>>>>>>>> at >>>>>>>> >> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) >>>>>>>>>> at >>>>>>>>>> >>>>>>>>>> >>>>>> >>>> >> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) >>>>>>>>>> Becket Qin is investigating this issue. >>>>>>>>>> >>>> >>>> >> >> |
Free forum by Nabble | Edit this page |