Robert Metzger created FLINK-20005:
--------------------------------------

             Summary: "Kerberized YARN application" test unstable
                 Key: FLINK-20005
                 URL: https://issues.apache.org/jira/browse/FLINK-20005
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN, Runtime / Coordination
    Affects Versions: 1.12.0
            Reporter: Robert Metzger

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=9066&view=logs&j=c88eea3b-64a0-564d-0031-9fdcd7b8abee&t=ff888d9b-cd34-53cc-d90f-3e446d355529

The {{Running Kerberized YARN application on Docker test (default input)}} is failing. These are some exceptions spotted in the logs:

{code}
2020-11-05T14:22:29.3315695Z Nov 05 14:22:29 2020-11-05 14:21:52,696 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Flat Map (2/3) (7806b7a7074425c5ff0906befd94e122) switched from SCHEDULED to FAILED on not deployed.
2020-11-05T14:22:29.3318307Z Nov 05 14:22:29 java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
2020-11-05T14:22:29.3320512Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_272]
2020-11-05T14:22:29.3322173Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_272]
2020-11-05T14:22:29.3323809Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607) ~[?:1.8.0_272]
2020-11-05T14:22:29.3325448Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591) ~[?:1.8.0_272]
2020-11-05T14:22:29.3331094Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_272]
2020-11-05T14:22:29.3332769Z Nov 05 14:22:29 at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_272]
2020-11-05T14:22:29.3335736Z Nov 05 14:22:29 at org.apache.flink.runtime.scheduler.SharedSlot.cancelLogicalSlotRequest(SharedSlot.java:195) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3342621Z Nov 05 14:22:29 at org.apache.flink.runtime.scheduler.SlotSharingExecutionSlotAllocator.cancelLogicalSlotRequest(SlotSharingExecutionSlotAllocator.java:147) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3348463Z Nov 05 14:22:29 at org.apache.flink.runtime.scheduler.SharingPhysicalSlotRequestBulk.cancel(SharingPhysicalSlotRequestBulk.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3353749Z Nov 05 14:22:29 at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkWithTimestamp.cancel(PhysicalSlotRequestBulkWithTimestamp.java:66) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3362495Z Nov 05 14:22:29 at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:87) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3366937Z Nov 05 14:22:29 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_272]
2020-11-05T14:22:29.3370686Z Nov 05 14:22:29 at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_272]
2020-11-05T14:22:29.3380715Z Nov 05 14:22:29 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:404) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3384436Z Nov 05 14:22:29 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:197) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3387431Z Nov 05 14:22:29 at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3390333Z Nov 05 14:22:29 at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3392937Z Nov 05 14:22:29 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3395430Z Nov 05 14:22:29 at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3397949Z Nov 05 14:22:29 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3401799Z Nov 05 14:22:29 at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3449637Z Nov 05 14:22:29 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3452289Z Nov 05 14:22:29 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3454833Z Nov 05 14:22:29 at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3458801Z Nov 05 14:22:29 at akka.actor.Actor$class.aroundReceive(Actor.scala:517) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3469564Z Nov 05 14:22:29 at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3472736Z Nov 05 14:22:29 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3475094Z Nov 05 14:22:29 at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3478753Z Nov 05 14:22:29 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3497848Z Nov 05 14:22:29 at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3516200Z Nov 05 14:22:29 at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3519594Z Nov 05 14:22:29 at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3522331Z Nov 05 14:22:29 at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3524990Z Nov 05 14:22:29 at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3528102Z Nov 05 14:22:29 at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3530334Z Nov 05 14:22:29 Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Slot request bulk is not fulfillable! Could not allocate the required slot within slot request timeout
2020-11-05T14:22:29.3534080Z Nov 05 14:22:29 at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3536451Z Nov 05 14:22:29 ... 24 more
2020-11-05T14:22:29.3537535Z Nov 05 14:22:29 Caused by: java.util.concurrent.TimeoutException: Timeout has occurred: 120000 ms
2020-11-05T14:22:29.3540969Z Nov 05 14:22:29 at org.apache.flink.runtime.jobmaster.slotpool.PhysicalSlotRequestBulkCheckerImpl.lambda$schedulePendingRequestBulkWithTimestampCheck$0(PhysicalSlotRequestBulkCheckerImpl.java:84) ~[flink-dist_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
2020-11-05T14:22:29.3542868Z Nov 05 14:22:29 ... 24 more
{code}

{code}
2020-11-05T14:22:14.3964651Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: application_1604585664395_0001 State change from RUNNING to FINAL_SAVING on event=ATTEMPT_FAILED
2020-11-05T14:22:14.3966539Z Nov 05 14:22:13 20/11/05 14:21:55 INFO recovery.RMStateStore: Updating info for app: application_1604585664395_0001
2020-11-05T14:22:14.3968255Z Nov 05 14:22:13 20/11/05 14:21:55 INFO capacity.CapacityScheduler: Application Attempt appattempt_1604585664395_0001_000001 is done. finalState=FAILED
2020-11-05T14:22:14.3970618Z Nov 05 14:22:13 20/11/05 14:21:55 INFO rmapp.RMAppImpl: Application application_1604585664395_0001 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1604585664395_0001_000001 exited with exitCode: 2
2020-11-05T14:22:14.3973331Z Nov 05 14:22:13 Failing this attempt.Diagnostics: Exception from container-launch.
2020-11-05T14:22:14.3974475Z Nov 05 14:22:13 Container id: container_1604585664395_0001_01_000001
2020-11-05T14:22:14.3975384Z Nov 05 14:22:13 Exit code: 2
2020-11-05T14:22:14.3976946Z Nov 05 14:22:13 Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
2020-11-05T14:22:14.3979115Z Nov 05 14:22:13 at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:112)
2020-11-05T14:22:14.3981642Z Nov 05 14:22:13 at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:130)
2020-11-05T14:22:14.3983756Z Nov 05 14:22:13 at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:395)
2020-11-05T14:22:14.3985627Z Nov 05 14:22:13 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
2020-11-05T14:22:14.3987444Z Nov 05 14:22:13 at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
2020-11-05T14:22:14.3989017Z Nov 05 14:22:13 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-11-05T14:22:14.3990393Z Nov 05 14:22:13 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2020-11-05T14:22:14.3991866Z Nov 05 14:22:13 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2020-11-05T14:22:14.3993133Z Nov 05 14:22:13 at java.lang.Thread.run(Thread.java:748)
2020-11-05T14:22:14.3993947Z Nov 05 14:22:13
2020-11-05T14:22:14.3994706Z Nov 05 14:22:13 Shell output: main : command provided 1
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)