[jira] [Created] (FLINK-17341) freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-17341) freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException

Shang Yuanchun (Jira)
huweihua created FLINK-17341:
--------------------------------

             Summary: freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException
                 Key: FLINK-17341
                 URL: https://issues.apache.org/jira/browse/FLINK-17341
             Project: Flink
          Issue Type: Bug
            Reporter: huweihua


TaskExecutor may freeSlot when closeJobManagerConnection. freeSlot will modify the TaskSlotTable.slotsPerJob. this modify will cause ConcurrentModificationException.
{code:java}
Iterator<AllocationID> activeSlots = taskSlotTable.getActiveSlots(jobId);

final FlinkException freeingCause = new FlinkException("Slot could not be marked inactive.");

while (activeSlots.hasNext()) {
 AllocationID activeSlot = activeSlots.next();

 try {
 if (!taskSlotTable.markSlotInactive(activeSlot, taskManagerConfiguration.getTimeout())) {
 freeSlotInternal(activeSlot, freeingCause);
 }
 } catch (SlotNotFoundException e) {
 log.debug("Could not mark the slot {} inactive.", jobId, e);
 }
}
{code}
 error log:
{code:java}
2020-04-21 23:37:11,363 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor                - Caught exception while executing runnable in main thread.
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437)
    at java.util.HashMap$KeyIterator.next(HashMap.java:1461)
    at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$TaskSlotIterator.hasNext(TaskSlotTable.java:698)
    at org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable$AllocationIDIterator.hasNext(TaskSlotTable.java:652)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.closeJobManagerConnection(TaskExecutor.java:1314)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor.access$1300(TaskExecutor.java:149)
    at org.apache.flink.runtime.taskexecutor.TaskExecutor$JobLeaderListenerImpl.lambda$jobManagerLostLeadership$1(TaskExecutor.java:1726)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)