[VOTE] Release 1.13.0, release candidate #2

23 messages
Re: [VOTE] Release 1.13.0, release candidate #2

Robert Metzger
Thanks for creating the RC and managing the release process so far, Guowei
and Dawid!

+1 (binding)

Checks:
- I deployed the RC on AWS EMR (session cluster and per-job cluster). I
confirmed a minor issue Arvid Heise told me offline about:
https://issues.apache.org/jira/browse/FLINK-22509. I believe we can ignore
this issue.
- I tested reactive mode extensively on Kubernetes, letting it scale up and
down for a very long time (multiple weeks)
- Checked the changes to the pom files: all dependency changes seem to be
reflected properly in the NOTICE files
  - netty bump to 4.1.46
  - Elasticsearch to 1.15.1
  - hbase dependency changes
  - the new aws glue schema doesn't deploy anything foreign to maven
  - flink-sql-connector-hbase-1.4 excludes fewer hbase classes, but seems
fine
- the license checker has not been changed in this release cycle (there are
two exclusion lists in there)
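The pom-vs-NOTICE cross-check above can be sketched as a small script. This is a hedged illustration only: the NOTICE snippet and the made-up coordinate below are fabricated for the example, not taken from the actual Flink release artifacts.

```python
# Hedged sketch: flag bumped dependency coordinates that a NOTICE file never
# mentions. The NOTICE text and coordinates are illustrative stand-ins.
notice_text = """\
flink-dist
Copyright 2014-2021 The Apache Software Foundation

This project bundles the following dependencies under the Apache Software License 2.0:

- io.netty:netty-all:4.1.46.Final
"""

def missing_from_notice(coordinates, notice):
    """Return the dependency coordinates that the NOTICE text never mentions."""
    return [c for c in coordinates if c not in notice]

bumped = ["io.netty:netty-all:4.1.46.Final", "org.example:made-up:1.0"]
print(missing_from_notice(bumped, notice_text))
```

A plain substring check like this is deliberately crude; it only narrows down which NOTICE entries deserve a manual look.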





On Thu, Apr 29, 2021 at 12:05 PM Yun Tang <[hidden email]> wrote:

> +1 (non-binding)
>
> - built from source code with scala 2.11 succeeded
> - submitted the state machine example; it ran well, with the expected
> commit id shown in the UI.
> - enabled state latency tracking with the slf4j metrics reporter;
> everything behaves as expected.
> - clicked 'FlameGraph' but found that the web UI does not give a friendly
> hint to enable it via setting rest.flamegraph.enabled: true; will create
> an issue later.
>
> Best
> Yun Tang
> ________________________________
> From: Leonard Xu <[hidden email]>
> Sent: Thursday, April 29, 2021 16:52
> To: dev <[hidden email]>
> Subject: Re: [VOTE] Release 1.13.0, release candidate #2
>
> +1 (non-binding)
>
> - verified signatures and hashes
> - built from source code with scala 2.11 succeeded
> - started a cluster, WebUI was accessible, ran some simple SQL jobs, no
> suspicious log output
> - tested time functions and time zone usage in SQL Client, the query
> result is as expected
> - the web PR looks good
> - found one minor exception message typo, will improve it later
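The "verified signatures and hashes" step above can be sketched as a digest comparison. A hedged stand-in: the artifact bytes below are fabricated, and a real check would read the released tarball and its accompanying .sha512 file.

```python
# Hedged sketch of hash verification: recompute a SHA-512 digest and compare
# it to the expected hex string (which would come from the .sha512 file).
import hashlib

def sha512_matches(data: bytes, expected_hex: str) -> bool:
    return hashlib.sha512(data).hexdigest() == expected_hex

artifact = b"flink-1.13.0 release tarball bytes (stand-in)"
expected = hashlib.sha512(artifact).hexdigest()  # stand-in for the .sha512 file

print(sha512_matches(artifact, expected))           # matching bytes pass
print(sha512_matches(b"tampered bytes", expected))  # altered bytes fail
```

Signature (GPG) verification is a separate step against the release manager's public key and is not covered by this sketch.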
>
> Best,
> Leonard Xu
>
> > > On Apr 29, 2021, at 16:11, Xingbo Huang <[hidden email]> wrote:
> >
> > +1 (non-binding)
> >
> > - verified checksum and signature
> > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> > > - pip installed `apache-flink-libraries` and `apache-flink` on macOS
> > > - started cluster and ran the row-based operation test
> > > - started cluster and tested Python general group window agg
> >
> > Best,
> > Xingbo
> >
> > Dian Fu <[hidden email]> wrote on Thu, Apr 29, 2021 at 4:05 PM:
> >
> >> +1 (binding)
> >>
> >> - Verified the signature and checksum
> >> - Installed PyFlink successfully using the source package
> >> - Ran a few PyFlink examples: Python UDF, Pandas UDF, Python DataStream
> >> API with state access, Python DataStream API with batch execution mode
> >> - Reviewed the website PR
> >>
> >> Regards,
> >> Dian
> >>
> >>> On Apr 29, 2021, at 3:11 PM, Jark Wu <[hidden email]> wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> - checked/verified signatures and hashes
> >>> - started cluster and ran some e2e SQL queries using the SQL Client;
> >>> results are as expected:
> >>> * read from kafka source, window aggregate, lookup mysql database,
> >>> write into elasticsearch
> >>> * window aggregate using legacy window syntax and new window TVF
> >>> * verified web ui and log output
> >>> - reviewed the release PR
> >>>
> >>> I found the log contains some verbose information when using window
> >>> aggregate, but I think this doesn't block the release; I created
> >>> FLINK-22522 to fix it.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <[hidden email]> wrote:
> >>>
> >>>> Hey Matthias,
> >>>>
> >>>> I'd like to double confirm what Guowei said. The dependency is Apache 2
> >>>> licensed and we do not bundle it in our jar (it is in the runtime
> >>>> scope), thus we do not need to mention it in the NOTICE file (btw, the
> >>>> best way to check what is bundled is to check the output of the maven
> >>>> shade plugin). Thanks for checking it!
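Dawid's suggestion to check what is actually bundled can also be sketched directly against a jar's entry list. A hedged illustration: the in-memory jar below is a stand-in, not the real flink-python artifact.

```python
# Hedged sketch: decide whether a jar bundles classes from a given package by
# listing its entries. A real check would open flink-python_2.11-1.13.0.jar.
import io
import zipfile

def bundles_package(jar_bytes: bytes, package_prefix: str) -> bool:
    path_prefix = package_prefix.replace(".", "/")
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return any(name.startswith(path_prefix) for name in jar.namelist())

# Build a stand-in jar containing only a Flink class, no conscrypt classes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("org/apache/flink/python/Dummy.class", b"")
jar_bytes = buf.getvalue()

print(bundles_package(jar_bytes, "org.conscrypt"))     # runtime scope: not bundled
print(bundles_package(jar_bytes, "org.apache.flink"))  # bundled
```

If no classes from the dependency appear in the jar, the dependency need not be listed in that module's NOTICE, which is the conclusion reached in this thread.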
> >>>>
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>> On 29/04/2021 05:25, Guowei Ma wrote:
> >>>>> Hi, Matthias
> >>>>>
> >>>>> Thank you very much for your careful inspection.
> >>>>> I checked the flink-python_2.11-1.13.0.jar and we do not bundle
> >>>>> org.conscrypt:conscrypt-openjdk-uber:2.5.1 into it.
> >>>>> So I think we do not need to add this to the NOTICE file. (BTW, the
> >>>>> jar's scope is runtime.)
> >>>>>
> >>>>> Best,
> >>>>> Guowei
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 29, 2021 at 2:33 AM Matthias Pohl <[hidden email]> wrote:
> >>>>>
> >>>>>> Thanks Dawid and Guowei for managing this release.
> >>>>>>
> >>>>>> - downloaded the sources and binaries and checked the checksums
> >>>>>> - built Flink from the downloaded sources
> >>>>>> - executed example jobs with standalone deployments - I didn't find
> >>>>>> anything suspicious in the logs
> >>>>>> - reviewed release announcement pull request
> >>>>>>
> >>>>>> - I did a pass over dependency updates: git diff release-1.12.2
> >>>>>> release-1.13.0-rc2 */*.xml
> >>>>>> There's one thing someone should double-check, whether it's supposed
> >>>>>> to be like that: We added org.conscrypt:conscrypt-openjdk-uber:2.5.1
> >>>>>> as a dependency but I don't see it reflected in the NOTICE file of
> >>>>>> the flink-python module. Or is this automatically added later on?
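The dependency pass described above (`git diff release-1.12.2 release-1.13.0-rc2 */*.xml`) can be partly automated by scanning the diff for added dependency lines. A hedged sketch: the diff text below is fabricated for illustration, and the naive regex only catches the common `<artifactId>` layout.

```python
# Hedged sketch: collect artifactIds that appear on added (+) lines of a
# unified diff over pom.xml files. Real input would come from `git diff`.
import re

diff_text = """\
+    <dependency>
+      <groupId>org.conscrypt</groupId>
+      <artifactId>conscrypt-openjdk-uber</artifactId>
+      <version>2.5.1</version>
+    </dependency>
"""

def added_artifacts(diff: str):
    """Return artifactIds introduced on added lines of the diff."""
    return re.findall(r"^\+\s*<artifactId>([^<]+)</artifactId>", diff, re.MULTILINE)

print(added_artifacts(diff_text))
```

Each flagged artifact still needs the manual judgment shown in this thread: is it bundled (shaded) into an artifact, and if so, is it reflected in that module's NOTICE file?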
> >>>>>>
> >>>>>> +1 (non-binding; please see remark on dependency above)
> >>>>>>
> >>>>>> Matthias
> >>>>>>
> >>>>>> On Wed, Apr 28, 2021 at 1:52 PM Stephan Ewen <[hidden email]> wrote:
> >>>>>>
> >>>>>>> Glad to hear that outcome. And no worries about the false alarm.
> >>>>>>> Thank you for doing thorough testing, this is very helpful!
> >>>>>>>
> >>>>>>> On Wed, Apr 28, 2021 at 1:04 PM Caizhi Weng <[hidden email]> wrote:
> >>>>>>>> After the investigation we found that this issue is caused by the
> >>>>>>>> implementation of the connector, not by the Flink framework.
> >>>>>>>>
> >>>>>>>> Sorry for the false alarm.
> >>>>>>>>
> >>>>>>>> Stephan Ewen <[hidden email]> wrote on Wed, Apr 28, 2021 at 3:23 PM:
> >>>>>>>>
> >>>>>>>>> @Caizhi and @Becket - let me reach out to you to jointly debug this
> >>>>>>>>> issue. I am wondering if there is some incorrect reporting of
> >>>>>>>>> failed events?
> >>>>>>>>>
> >>>>>>>>> On Wed, Apr 28, 2021 at 8:53 AM Caizhi Weng <[hidden email]> wrote:
> >>>>>>>>>> -1
> >>>>>>>>>>
> >>>>>>>>>> We're testing this version on batch jobs with large (600~1000)
> >>>>>>>>>> parallelisms and the following exception messages appear with
> >>>>>>>>>> high frequency:
> >>>>>>>>>>
> >>>>>>>>>> 2021-04-27 21:27:26
> >>>>>>>>>> org.apache.flink.util.FlinkException: An OperatorEvent from an
> >>>>>>>>>> OperatorCoordinator to a task was lost. Triggering task failover
> >>>>>>>>>> to ensure consistency. Event: '[NoMoreSplitEvent]', targetTask:
> >>>>>>>>>> <task name> - execution #0
> >>>>>>>>>> at org.apache.flink.runtime.operators.coordination.SubtaskGatewayImpl.lambda$sendEvent$0(SubtaskGatewayImpl.java:81)
> >>>>>>>>>> at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
> >>>>>>>>>> at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
> >>>>>>>>>> at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> >>>>>>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440)
> >>>>>>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208)
> >>>>>>>>>> at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> >>>>>>>>>> at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> >>>>>>>>>> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
> >>>>>>>>>> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> >>>>>>>>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
> >>>>>>>>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> >>>>>>>>>> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> >>>>>>>>>> at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
> >>>>>>>>>> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> >>>>>>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> >>>>>>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> >>>>>>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> >>>>>>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> >>>>>>>>>> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> >>>>>>>>>> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>>>>>>>> at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>>>>>>>>> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>>>>>>>> at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>>>>>>>> Becket Qin is investigating this issue.
> >>>>>>>>>>
> >>>>
> >>>>
> >>
> >>
>
>
Re: [VOTE] Release 1.13.0, release candidate #2

dwysakowicz
Thank you all for helping to verify the release. Really appreciated! I
will conclude the vote in a separate thread.

Best,

Dawid


Re: [VOTE] Release 1.13.0, release candidate #2

David Anderson-3
In reply to this post by Robert Metzger
+1 (non-binding)

Checks:
- I built from source, successfully.
- I tested the new backpressure metrics and UI. I found one non-critical
bug that's been around for years, and for which a fix has already been
merged for 1.13.1 (https://issues.apache.org/jira/browse/FLINK-22489).
- I tested flame graphs.
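Flame graphs are gated behind a config flag; Yun Tang noted earlier in the thread that the web UI gives no hint about rest.flamegraph.enabled. The hint a UI could show can be sketched as below. A hedged illustration: the key:value parsing is deliberately naive, not Flink's actual configuration loader.

```python
# Hedged sketch: parse a flink-conf.yaml-style snippet and produce the hint
# Yun Tang wished the web UI would show when flame graphs are disabled.
def flamegraph_hint(conf_text):
    conf = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            conf[key.strip()] = value.strip()
    if conf.get("rest.flamegraph.enabled", "false").lower() != "true":
        return "FlameGraph is disabled; set rest.flamegraph.enabled: true"
    return None  # enabled, no hint needed

print(flamegraph_hint("jobmanager.rpc.address: localhost"))
print(flamegraph_hint("rest.flamegraph.enabled: true"))
```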
