[jira] [Created] (FLINK-17726) Scheduler should take care of tasks directly canceled by TaskManager

Shang Yuanchun (Jira)
Zhu Zhu created FLINK-17726:
-------------------------------

             Summary: Scheduler should take care of tasks directly canceled by TaskManager
                 Key: FLINK-17726
                 URL: https://issues.apache.org/jira/browse/FLINK-17726
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.12.0
            Reporter: Zhu Zhu
             Fix For: 1.12.0


The JobManager does not trigger failure handling when it receives a CANCELED task state update.
This is because CANCELED tasks are usually a consequence of another task having FAILED, and they are restarted by the failover process that the FAILED task triggers.
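
A minimal sketch of the current behavior, assuming a simplified model in which a FAILED update restarts a whole group of related tasks (passed in as `relatedTasks`) and a CANCELED update restarts nothing. The names `UpdateKind`, `CurrentFailoverModel` and `tasksToRestart` are hypothetical and are not Flink APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the JobManager's reaction to task updates.
// These names are illustrative only; they are not part of Flink.
enum UpdateKind { RUNNING, CANCELED, FAILED }

final class CurrentFailoverModel {
    private CurrentFailoverModel() {}

    /**
     * Returns the tasks restarted in reaction to one task state update.
     * FAILED: failover restarts every related task, including peers that
     * were CANCELED as a side effect of the failure.
     * CANCELED (or anything else): ignored, on the assumption that some
     * FAILED task has already triggered (or will trigger) the failover.
     */
    static List<String> tasksToRestart(UpdateKind update, List<String> relatedTasks) {
        if (update == UpdateKind.FAILED) {
            return new ArrayList<>(relatedTasks);
        }
        return new ArrayList<>();
    }
}
```

Under this model, a CANCELED update that is not accompanied by any FAILED update restarts nothing, which is the gap described next.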

However, if the TaskManager cancels a task directly because of the task's own runtime issue, the JobManager (JM) does not recover that task, and the job hangs.
This is a potential issue that we should avoid.

A possible solution is to let the JobManager treat any task that transitions to CANCELED from a state other than CANCELING as a failed task.
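
The proposed rule can be expressed as a predicate over the previous and new task state. This is a hedged sketch, not Flink's implementation: `TaskState` and `CanceledTaskPolicy` are hypothetical names, and the sketch assumes that CANCELING is the state a task is in when the JobManager itself has requested the cancellation:

```java
// Hypothetical sketch of the proposed rule; names are illustrative, not Flink APIs.
enum TaskState { CREATED, SCHEDULED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED }

final class CanceledTaskPolicy {
    private CanceledTaskPolicy() {}

    /** Current behavior: only a FAILED update triggers failure handling. */
    static boolean triggersFailoverToday(TaskState newState) {
        return newState == TaskState.FAILED;
    }

    /**
     * Proposed behavior: a FAILED update still triggers failure handling, and
     * a transition to CANCELED is also treated as a failure unless the task
     * was already CANCELING (assumed to mean the JobManager asked for it).
     */
    static boolean triggersFailoverProposed(TaskState previousState, TaskState newState) {
        if (newState == TaskState.FAILED) {
            return true;
        }
        return newState == TaskState.CANCELED && previousState != TaskState.CANCELING;
    }
}
```

For example, RUNNING to CANCELED is ignored today but triggers failover under the proposed rule, while CANCELING to CANCELED is ignored under both. Keying the decision on the previous state means the JobManager needs no extra signal from the TaskManager; it only uses the state it has already recorded for the task.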



--
This message was sent by Atlassian Jira
(v8.3.4#803005)