Zhu Zhu created FLINK-14331:
-------------------------------
Summary: Reset vertices right after they transition to terminated states
Key: FLINK-14331
URL: https://issues.apache.org/jira/browse/FLINK-14331
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zhu Zhu
Fix For: 1.10.0
Currently, in DefaultScheduler, tasks that are to be restarted remain in a terminated state until they are re-scheduled by the SchedulingStrategy.
This behavior may cause two problems:
1. Failed/Canceled tasks may not be restartable in lazy scheduling. For example, suppose the job A1--pipelined-->B1 fails. Only A1 is re-scheduled in restartTasks(), since the inputs of B1 are not ready. B1 should be scheduled later, on the partition consumable event from the restarted A1, but the terminal state of B1 prevents it from being scheduled.
2. A task can stay in the FAILED/CANCELED state for a long time if its inputs take a long time to become ready again. This is unfriendly to users and may cause confusion.
That's why I propose resetting vertices right after they transition to terminated states.
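To illustrate problem 1 and the proposed fix, here is a minimal toy model in Java. It is only a sketch and does not use Flink's actual classes: RestartSketch, terminate(), restartNow() and scheduleOnConsumable() are hypothetical names, and the rule that only a CREATED vertex can be scheduled is a simplifying assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of the A1--pipelined-->B1 scenario; not Flink code.
final class RestartSketch {
    enum State { CREATED, SCHEDULED, RUNNING, FAILED, CANCELED }

    private final Map<String, State> states = new HashMap<>();
    private final boolean resetOnTermination;

    RestartSketch(boolean resetOnTermination) {
        this.resetOnTermination = resetOnTermination;
        states.put("A1", State.RUNNING);
        states.put("B1", State.RUNNING);
    }

    // Moves a vertex to FAILED or CANCELED. With the proposed behavior
    // the vertex is reset to CREATED right away.
    void terminate(String vertex, State terminal) {
        states.put(vertex, terminal);
        if (resetOnTermination) {
            states.put(vertex, State.CREATED);
        }
    }

    // Models the restartTasks() path: the vertex is reset (if it has not
    // been reset already) and scheduled immediately.
    void restartNow(String vertex) {
        states.put(vertex, State.CREATED);
        scheduleOnConsumable(vertex);
    }

    // Models scheduling triggered by a partition consumable event.
    // Assumption: only a vertex in CREATED can be scheduled.
    boolean scheduleOnConsumable(String vertex) {
        if (states.get(vertex) != State.CREATED) {
            return false;
        }
        states.put(vertex, State.SCHEDULED);
        return true;
    }

    // Replays the failure scenario and reports whether B1 gets rescheduled.
    static boolean downstreamRestarted(boolean resetOnTermination) {
        RestartSketch s = new RestartSketch(resetOnTermination);
        s.terminate("A1", State.FAILED);
        s.terminate("B1", State.CANCELED);
        s.restartNow("A1");                   // only A1 has its inputs ready
        return s.scheduleOnConsumable("B1");  // later event from restarted A1
    }

    public static void main(String[] args) {
        System.out.println("current behavior, B1 rescheduled: " + downstreamRestarted(false));
        System.out.println("proposed behavior, B1 rescheduled: " + downstreamRestarted(true));
    }
}
```

In this toy model B1 stays CANCELED under the current behavior and is never picked up again, while with the reset on termination it is back in CREATED by the time the consumable event arrives.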
--
This message was sent by Atlassian Jira
(v8.3.4#803005)