Stefan Richter created FLINK-4141:
-------------------------------------
Summary: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
Key: FLINK-4141
URL: https://issues.apache.org/jira/browse/FLINK-4141
Project: Flink
Issue Type: Bug
Affects Versions: 1.0.3
Reporter: Stefan Richter
High availability on Yarn often fails to recover in the following test scenario:
1. Kill the application master process.
2. Then, while the application master is recovering, randomly kill several task managers (with some delay between kills).
After the application master has recovered, not all of the killed task managers are brought back, and no further attempts are made to restart them.
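
For anyone trying to reproduce this, step 2 of the scenario above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper names (`plan_task_manager_kills`, `run_kill_plan`) are hypothetical, the use of SIGKILL is an assumption (the report only says "kill"), and finding the task manager process IDs on the Yarn nodes is left to the tester.

```python
import os
import random
import signal
import time


def plan_task_manager_kills(pids, count, max_delay_s, seed=None):
    """Pick `count` distinct PIDs at random and give each a kill time.

    Returns a list of (offset_seconds, pid) pairs sorted by offset, where
    each offset is drawn uniformly from [0, max_delay_s].
    """
    rng = random.Random(seed)
    victims = rng.sample(list(pids), count)
    offsets = sorted(rng.uniform(0, max_delay_s) for _ in victims)
    return list(zip(offsets, victims))


def run_kill_plan(plan, kill=os.kill, sleep=time.sleep):
    """Execute a plan: wait until each offset, then send SIGKILL (assumed)."""
    elapsed = 0.0
    for offset, pid in plan:
        sleep(offset - elapsed)  # wait for the remaining time to this offset
        elapsed = offset
        kill(pid, signal.SIGKILL)
```

A run would start right after killing the application master, for example `run_kill_plan(plan_task_manager_kills(tm_pids, 3, 60.0))`, where `tm_pids` is a hypothetical list of task manager PIDs collected beforehand. The `kill` and `sleep` parameters can be replaced with stubs for a dry run.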