Arvid Heise created FLINK-19391:
-----------------------------------

             Summary: Deadlock during partition update
                 Key: FLINK-19391
                 URL: https://issues.apache.org/jira/browse/FLINK-19391
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.12.0
            Reporter: Arvid Heise
            Assignee: Arvid Heise


Master cron job is currently failing because of a deadlock introduced in FLINK-19026.

{noformat}
2020-09-23T21:50:39.2444176Z Found one Java-level deadlock:
2020-09-23T21:50:39.2444633Z =============================
2020-09-23T21:50:39.2445001Z "Temp writer":
2020-09-23T21:50:39.2445484Z   waiting to lock monitor 0x00007f4e14004ca8 (object 0x0000000086501948, a java.lang.Object),
2020-09-23T21:50:39.2446418Z   which is held by "flink-akka.actor.default-dispatcher-2"
2020-09-23T21:50:39.2447193Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2447903Z   waiting to lock monitor 0x00007f4e14004bf8 (object 0x0000000086501930, a org.apache.flink.runtime.io.network.partition.PrioritizedDeque),
2020-09-23T21:50:39.2448703Z   which is held by "Temp writer"
2020-09-23T21:50:39.2448965Z
2020-09-23T21:50:39.2449384Z Java stack information for the threads listed above:
2020-09-23T21:50:39.2449900Z ===================================================
2020-09-23T21:50:39.2450325Z "Temp writer":
2020-09-23T21:50:39.2451050Z     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.checkAndWaitForSubpartitionView(LocalInputChannel.java:244)
2020-09-23T21:50:39.2452264Z     - waiting to lock <0x0000000086501948> (a java.lang.Object)
2020-09-23T21:50:39.2453183Z     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:205)
2020-09-23T21:50:39.2454173Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:642)
2020-09-23T21:50:39.2455422Z     - locked <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2456310Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:619)
2020-09-23T21:50:39.2457311Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNext(SingleInputGate.java:602)
2020-09-23T21:50:39.2458205Z     at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.getNext(InputGateWithMetrics.java:105)
2020-09-23T21:50:39.2459258Z     at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:100)
2020-09-23T21:50:39.2460465Z     at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
2020-09-23T21:50:39.2461344Z     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
2020-09-23T21:50:39.2462164Z     at org.apache.flink.runtime.operators.TempBarrier$TempWritingThread.run(TempBarrier.java:178)
2020-09-23T21:50:39.2463418Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2464109Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.queueChannel(SingleInputGate.java:825)
2020-09-23T21:50:39.2465336Z     - waiting to lock <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2466228Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.notifyChannelNonEmpty(SingleInputGate.java:791)
2020-09-23T21:50:39.2467222Z     at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.notifyChannelNonEmpty(InputChannel.java:154)
2020-09-23T21:50:39.2468212Z     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.notifyDataAvailable(LocalInputChannel.java:236)
2020-09-23T21:50:39.2469577Z     at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:76)
2020-09-23T21:50:39.2470607Z     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:133)
2020-09-23T21:50:39.2471765Z     - locked <0x0000000086501948> (a java.lang.Object)
2020-09-23T21:50:39.2472685Z     at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.updateInputChannel(SingleInputGate.java:489)
2020-09-23T21:50:39.2473727Z     - locked <0x0000000086532500> (a java.lang.Object)
2020-09-23T21:50:39.2474449Z     at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.updatePartitionInfo(NettyShuffleEnvironment.java:279)
2020-09-23T21:50:39.2475394Z     at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$updatePartitions$12(TaskExecutor.java:758)
2020-09-23T21:50:39.2476235Z     at org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$406/1860601696.run(Unknown Source)
2020-09-23T21:50:39.2476973Z     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
2020-09-23T21:50:39.2477714Z     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
2020-09-23T21:50:39.2478698Z     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
2020-09-23T21:50:39.2479506Z     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
2020-09-23T21:50:39.2480263Z     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
2020-09-23T21:50:39.2481018Z     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
2020-09-23T21:50:39.2481727Z     at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-09-23T21:50:39.2482192Z
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
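Read together, the two stacks show a lock-order inversion.

- "Temp writer" holds the gate's `PrioritizedDeque` monitor, which it locked in `SingleInputGate.waitAndGetNextData`. It then waits for the channel's lock object in `LocalInputChannel.checkAndWaitForSubpartitionView`.
- "flink-akka.actor.default-dispatcher-2" holds that same lock object, which it locked in `LocalInputChannel.requestSubpartition`. Through `notifyDataAvailable` -> `notifyChannelNonEmpty` -> `queueChannel`, it then waits for the `PrioritizedDeque` monitor.

The Java sketch below is a hypothetical, simplified model of that cycle, not Flink code. A few things to keep in mind:

- `LockOrderDemo`, `queueLock`, `requestLock`, `runInverted` and `runFixed` are made-up names.
- Latches force the interleaving seen in the dump.
- `ReentrantLock.tryLock` with a timeout stands in for the blocking monitors so the demo terminates instead of hanging.
- The "fixed" variant releases the channel lock before notifying the gate. That is one common way to break such a cycle, assumed here for illustration. It is not necessarily the change made for this ticket.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not Flink code: queueLock stands in for the gate's PrioritizedDeque
// monitor, requestLock for the LocalInputChannel lock object (0x...501948) in the dump.
class LockOrderDemo {
    /** Each thread holds its first lock, then tries the other; true if either got both. */
    static boolean runInverted() {
        ReentrantLock queueLock = new ReentrantLock(), requestLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2), bothTried = new CountDownLatch(2);
        AtomicBoolean writerOk = new AtomicBoolean(), dispatcherOk = new AtomicBoolean();
        // "Temp writer": queue -> request. Dispatcher: request -> queue (the opposite order).
        Thread writer = new Thread(() -> attempt(queueLock, requestLock, bothHoldFirst, bothTried, writerOk));
        Thread dispatcher = new Thread(() -> attempt(requestLock, queueLock, bothHoldFirst, bothTried, dispatcherOk));
        startAndJoin(writer, dispatcher);
        return writerOk.get() || dispatcherOk.get();
    }

    static void attempt(ReentrantLock first, ReentrantLock second, CountDownLatch bothHoldFirst,
                        CountDownLatch bothTried, AtomicBoolean ok) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await(); // force the interleaving seen in the thread dump
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) { // a plain lock() would block forever
                ok.set(true);
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep holding `first` until the peer has finished trying
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    /** Writer keeps its order, but the dispatcher drops requestLock before notifying the gate. */
    static boolean runFixed() {
        ReentrantLock queueLock = new ReentrantLock(), requestLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        AtomicBoolean writerOk = new AtomicBoolean(), dispatcherOk = new AtomicBoolean();
        Thread writer = new Thread(() -> {
            queueLock.lock();
            try {
                bothHoldFirst.countDown();
                bothHoldFirst.await();
                if (requestLock.tryLock(2, TimeUnit.SECONDS)) { // succeeds once the dispatcher releases it
                    writerOk.set(true);
                    requestLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queueLock.unlock();
            }
        });
        Thread dispatcher = new Thread(() -> {
            try {
                requestLock.lock();
                try {
                    bothHoldFirst.countDown();
                    bothHoldFirst.await();
                    // ... set up the subpartition view while holding requestLock ...
                } finally {
                    requestLock.unlock(); // released BEFORE touching queueLock
                }
                // Notify the gate with no other lock held, so no wait-for cycle can form.
                if (queueLock.tryLock(2, TimeUnit.SECONDS)) {
                    dispatcherOk.set(true);
                    queueLock.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        startAndJoin(writer, dispatcher);
        return writerOk.get() && dispatcherOk.get();
    }

    static void startAndJoin(Thread a, Thread b) {
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inverted order, any thread progressed: " + runInverted()); // false
        System.out.println("notify outside lock, both finished: " + runFixed()); // true
    }
}
```

Running `main` should print `false` and then `true`. The latches and `tryLock` timeouts exist only to make the demo deterministic and terminating. Per the thread dump, the real classes use plain `synchronized` monitors, where the inverted order simply blocks forever.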