[jira] [Created] (FLINK-14331) Reset vertices right after they transition to terminated states

Shang Yuanchun (Jira)
Zhu Zhu created FLINK-14331:
-------------------------------

             Summary: Reset vertices right after they transition to terminated states
                 Key: FLINK-14331
                 URL: https://issues.apache.org/jira/browse/FLINK-14331
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Zhu Zhu
             Fix For: 1.10.0


Currently, in DefaultScheduler, tasks to be restarted remain in a terminal state until they are re-scheduled by the SchedulingStrategy.
This behavior can cause two problems:
1. Failed/canceled tasks may not be restartable under lazy scheduling. For example, suppose the job A1--pipelined-->B1 fails. Only A1 is re-scheduled in restartTasks(), since the inputs of B1 are not ready. B1 should be scheduled later, on the partition consumable event from the restarted A1, but B1's terminal state prevents it from being scheduled.
2. A task can stay in the FAILED/CANCELED state for a long time if its inputs take a long time to become ready again. This is unfriendly to users and may cause confusion.

Therefore, I propose resetting vertices right after they transition to a terminal state.
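
To make the first problem concrete, here is a minimal toy model of the A1--pipelined-->B1 scenario. It is only a sketch and not Flink code: the class ResetSketch, its State enum, and the methods terminate(), restart(), onPartitionConsumable() and finalStateOfB1() are all hypothetical names invented for illustration, and the assumption that a vertex can only be scheduled from a CREATED state is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: the names below are hypothetical and are not Flink classes.
class ResetSketch {
    enum State { CREATED, SCHEDULED, FAILED, CANCELED }

    private final Map<String, State> states = new LinkedHashMap<>();
    private final boolean resetOnTermination;

    ResetSketch(boolean resetOnTermination) {
        this.resetOnTermination = resetOnTermination;
        states.put("A1", State.SCHEDULED);
        states.put("B1", State.SCHEDULED);
    }

    // A vertex reaches a terminal state; with the proposal it is reset at once.
    void terminate(String vertex, State terminalState) {
        states.put(vertex, terminalState);
        if (resetOnTermination) {
            states.put(vertex, State.CREATED);
        }
    }

    // Stand-in for the restartTasks() path: the vertex is reset as part of
    // being re-scheduled.
    void restart(String vertex) {
        states.put(vertex, State.CREATED);
        schedule(vertex);
    }

    // Stand-in for the partition consumable event: the consumer is scheduled
    // only if it is not stuck in a terminal state.
    void onPartitionConsumable(String consumer) {
        schedule(consumer);
    }

    // Assumption of this sketch: only CREATED vertices can be scheduled.
    private void schedule(String vertex) {
        if (states.get(vertex) == State.CREATED) {
            states.put(vertex, State.SCHEDULED);
        }
    }

    // Replays the A1--pipelined-->B1 failure and returns B1's final state.
    static State finalStateOfB1(boolean resetOnTermination) {
        ResetSketch job = new ResetSketch(resetOnTermination);
        job.terminate("A1", State.FAILED);
        job.terminate("B1", State.CANCELED);
        job.restart("A1");               // only A1 has its inputs ready
        job.onPartitionConsumable("B1"); // restarted A1 produces data for B1
        return job.states.get("B1");
    }

    public static void main(String[] args) {
        System.out.println("without reset: B1 is " + finalStateOfB1(false));
        System.out.println("with reset:    B1 is " + finalStateOfB1(true));
    }
}
```

In this model, B1 stays CANCELED when vertices are not reset on termination, and reaches SCHEDULED when they are, which mirrors the behavior described in problem 1.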
