[jira] [Created] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-1376:
------------------------------------

             Summary: SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
                 Key: FLINK-1376
                 URL: https://issues.apache.org/jira/browse/FLINK-1376
             Project: Flink
          Issue Type: Bug
            Reporter: Till Rohrmann


If a TaskManager fails fatally and some of the failed node's slots are SharedSlots, the JobManager does not release those slots properly. As a result, the corresponding job is not failed properly, which leaves the system in a corrupted state.

The reason is that the AllocatedSlot is not aware that it is being used as a SharedSlot, and thus it cannot release the associated SubSlots.
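
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not Flink code: the classes reuse the names from this report, but their fields, the methods, the `notify_on_release` flag, and the helpers `fail_task_manager` and `simulate` are hypothetical. The listener-based variant shows one conceivable remedy, not necessarily the fix that should be applied.

```python
# Toy model only: hypothetical classes, not Flink's actual implementation.


class SubSlot:
    """A share of a SharedSlot that is handed to a single task."""

    def __init__(self, task_name):
        self.task_name = task_name
        self.released = False

    def release(self):
        self.released = True


class AllocatedSlot:
    """A slot on a TaskManager; it holds no reference to a SharedSlot."""

    def __init__(self):
        self.released = False
        # Assumed remedy: callbacks that run when the slot is released.
        self.release_listeners = []

    def release(self):
        self.released = True
        for listener in self.release_listeners:
            listener()


class SharedSlot:
    """Wraps an AllocatedSlot and hands out SubSlots to several tasks."""

    def __init__(self, allocated_slot, notify_on_release):
        self.allocated_slot = allocated_slot
        self.sub_slots = []
        if notify_on_release:
            # Make the AllocatedSlot aware of the sharing structure.
            allocated_slot.release_listeners.append(self.release_sub_slots)

    def allocate_sub_slot(self, task_name):
        sub_slot = SubSlot(task_name)
        self.sub_slots.append(sub_slot)
        return sub_slot

    def release_sub_slots(self):
        for sub_slot in self.sub_slots:
            sub_slot.release()


def fail_task_manager(allocated_slots):
    """Model the JobManager releasing every slot of a failed TaskManager."""
    for slot in allocated_slots:
        slot.release()


def simulate(notify_on_release):
    """Return the released flag of each SubSlot after a TaskManager failure."""
    allocated = AllocatedSlot()
    shared = SharedSlot(allocated, notify_on_release)
    sub_slots = [shared.allocate_sub_slot("map"),
                 shared.allocate_sub_slot("reduce")]
    fail_task_manager([allocated])
    return [sub_slot.released for sub_slot in sub_slots]


if __name__ == "__main__":
    print(simulate(False))  # SubSlots stay unreleased, as in this report
    print(simulate(True))   # SubSlots are released along with the slot
```

In the first run the AllocatedSlot is released but its SubSlots are not, so the tasks holding them would never be notified of the failure. In the second run the release propagates to every SubSlot.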



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)