[jira] [Created] (FLINK-19391) Deadlock during partition update


Shang Yuanchun (Jira)
Arvid Heise created FLINK-19391:
-----------------------------------

             Summary: Deadlock during partition update
                 Key: FLINK-19391
                 URL: https://issues.apache.org/jira/browse/FLINK-19391
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.12.0
            Reporter: Arvid Heise
            Assignee: Arvid Heise


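The cron job on master is currently failing because of a deadlock introduced in FLINK-19026. The thread dump below shows a lock-order inversion: "Temp writer" holds the PrioritizedDeque monitor (locked in SingleInputGate.waitAndGetNextData) and waits for the java.lang.Object monitor in LocalInputChannel.checkAndWaitForSubpartitionView, while "flink-akka.actor.default-dispatcher-2" holds that java.lang.Object monitor (locked in LocalInputChannel.requestSubpartition) and waits for the PrioritizedDeque in SingleInputGate.queueChannel.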
{noformat}
2020-09-23T21:50:39.2444176Z Found one Java-level deadlock:
2020-09-23T21:50:39.2444633Z =============================
2020-09-23T21:50:39.2445001Z "Temp writer":
2020-09-23T21:50:39.2445484Z   waiting to lock monitor 0x00007f4e14004ca8 (object 0x0000000086501948, a java.lang.Object),
2020-09-23T21:50:39.2446418Z   which is held by "flink-akka.actor.default-dispatcher-2"
2020-09-23T21:50:39.2447193Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2447903Z   waiting to lock monitor 0x00007f4e14004bf8 (object 0x0000000086501930, a org.apache.flink.runtime.io.network.partition.PrioritizedDeque),
2020-09-23T21:50:39.2448703Z   which is held by "Temp writer"
2020-09-23T21:50:39.2448965Z
2020-09-23T21:50:39.2449384Z Java stack information for the threads listed above:
2020-09-23T21:50:39.2449900Z ===================================================
2020-09-23T21:50:39.2450325Z "Temp writer":
2020-09-23T21:50:39.2451050Z at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.checkAndWaitForSubpartitionView(LocalInputChannel.java:244)
2020-09-23T21:50:39.2452264Z - waiting to lock <0x0000000086501948> (a java.lang.Object)
2020-09-23T21:50:39.2453183Z at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.getNextBuffer(LocalInputChannel.java:205)
2020-09-23T21:50:39.2454173Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.waitAndGetNextData(SingleInputGate.java:642)
2020-09-23T21:50:39.2455422Z - locked <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2456310Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:619)
2020-09-23T21:50:39.2457311Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNext(SingleInputGate.java:602)
2020-09-23T21:50:39.2458205Z at org.apache.flink.runtime.taskmanager.InputGateWithMetrics.getNext(InputGateWithMetrics.java:105)
2020-09-23T21:50:39.2459258Z at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:100)
2020-09-23T21:50:39.2460465Z at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:47)
2020-09-23T21:50:39.2461344Z at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
2020-09-23T21:50:39.2462164Z at org.apache.flink.runtime.operators.TempBarrier$TempWritingThread.run(TempBarrier.java:178)
2020-09-23T21:50:39.2463418Z "flink-akka.actor.default-dispatcher-2":
2020-09-23T21:50:39.2464109Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.queueChannel(SingleInputGate.java:825)
2020-09-23T21:50:39.2465336Z - waiting to lock <0x0000000086501930> (a org.apache.flink.runtime.io.network.partition.PrioritizedDeque)
2020-09-23T21:50:39.2466228Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.notifyChannelNonEmpty(SingleInputGate.java:791)
2020-09-23T21:50:39.2467222Z at org.apache.flink.runtime.io.network.partition.consumer.InputChannel.notifyChannelNonEmpty(InputChannel.java:154)
2020-09-23T21:50:39.2468212Z at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.notifyDataAvailable(LocalInputChannel.java:236)
2020-09-23T21:50:39.2469577Z at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:76)
2020-09-23T21:50:39.2470607Z at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:133)
2020-09-23T21:50:39.2471765Z - locked <0x0000000086501948> (a java.lang.Object)
2020-09-23T21:50:39.2472685Z at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.updateInputChannel(SingleInputGate.java:489)
2020-09-23T21:50:39.2473727Z - locked <0x0000000086532500> (a java.lang.Object)
2020-09-23T21:50:39.2474449Z at org.apache.flink.runtime.io.network.NettyShuffleEnvironment.updatePartitionInfo(NettyShuffleEnvironment.java:279)
2020-09-23T21:50:39.2475394Z at org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$updatePartitions$12(TaskExecutor.java:758)
2020-09-23T21:50:39.2476235Z at org.apache.flink.runtime.taskexecutor.TaskExecutor$$Lambda$406/1860601696.run(Unknown Source)
2020-09-23T21:50:39.2476973Z at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
2020-09-23T21:50:39.2477714Z at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
2020-09-23T21:50:39.2478698Z at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)
2020-09-23T21:50:39.2479506Z at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
2020-09-23T21:50:39.2480263Z at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
2020-09-23T21:50:39.2481018Z at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
2020-09-23T21:50:39.2481727Z at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2020-09-23T21:50:39.2482192Z
{noformat}
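
For illustration, the lock-order inversion in the dump above can be reproduced in isolation. This is only a minimal sketch, not Flink code: the class name, the method name and the two plain lock objects are hypothetical stand-ins for the java.lang.Object monitor and the PrioritizedDeque monitor, and the latch exists only to force the interleaving that the cron job hit by chance. It uses the standard ThreadMXBean to confirm that the JVM sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /** Returns true once the JVM reports both demo threads as monitor-deadlocked. */
    static boolean reproducesDeadlock() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors named in the thread dump.
        final Object requestLock = new Object();      // the java.lang.Object monitor
        final Object channelsWithData = new Object(); // the PrioritizedDeque monitor
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Mirrors "Temp writer": takes the queue monitor first, then the request lock.
        Thread reader = new Thread(() -> {
            synchronized (channelsWithData) {
                awaitOther(bothHoldFirstLock);
                synchronized (requestLock) { /* never reached */ }
            }
        }, "demo-temp-writer");

        // Mirrors the akka dispatcher thread: takes the same monitors in the opposite order.
        Thread updater = new Thread(() -> {
            synchronized (requestLock) {
                awaitOther(bothHoldFirstLock);
                synchronized (channelsWithData) { /* never reached */ }
            }
        }, "demo-partition-update");

        // Daemon threads, so the JVM can still exit although they never finish.
        reader.setDaemon(true);
        updater.setDaemon(true);
        reader.start();
        updater.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null while no deadlock exists
            if (ids != null && contains(ids, reader.getId()) && contains(ids, updater.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    // Signals that the caller holds its first lock, then waits for the other thread to do the same.
    private static void awaitOther(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock reproduced: " + reproducesDeadlock());
    }
}
```

In general, such a cycle goes away when both paths acquire the two monitors in the same order, or when the notification is issued only after the first monitor has been released. Which of these fits FLINK-19391 is not stated in this report.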
