[jira] [Created] (FLINK-4141) TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn

Shang Yuanchun (Jira)
Stefan Richter created FLINK-4141:
-------------------------------------

             Summary: TaskManager failures not always recover when killed during an ApplicationMaster failure in HA mode on Yarn
                 Key: FLINK-4141
                 URL: https://issues.apache.org/jira/browse/FLINK-4141
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.0.3
            Reporter: Stefan Richter


Recovery in high-availability mode on Yarn often fails in the following test scenario:

1. Kill the application master process.
2. Then, while the application master is recovering, randomly kill several task managers (with some delay).

After the application master has recovered, not all of the killed task managers are brought back, and no further attempts are made to restart them.
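
The scenario above could be scripted roughly as follows. This is only a minimal sketch, not the script used for the original test. The function names `plan_kills` and `run_plan`, the use of SIGKILL, and the uniformly random delay are all assumptions made for illustration. The process IDs of the application master and the task managers have to be looked up separately on the cluster nodes.

```python
import os
import random
import signal
import time


def plan_kills(am_pid, tm_pids, num_tms, max_delay_s, rng):
    """Build a kill plan as a list of (delay_before_kill_seconds, pid).

    The application master is killed first, without delay. After that,
    `num_tms` randomly chosen task managers follow, each preceded by a
    random delay of at most `max_delay_s` seconds (hypothetical choice).
    """
    victims = rng.sample(list(tm_pids), min(num_tms, len(tm_pids)))
    plan = [(0.0, am_pid)]
    for pid in victims:
        plan.append((rng.uniform(0.0, max_delay_s), pid))
    return plan


def run_plan(plan, kill=os.kill, sleep=time.sleep):
    """Execute the plan: wait for each delay, then send SIGKILL (assumed signal)."""
    for delay, pid in plan:
        sleep(delay)
        kill(pid, signal.SIGKILL)
```

For example, `run_plan(plan_kills(am_pid, tm_pids, 3, 10.0, random.Random()))` would kill the application master and then three task managers while it is presumably still recovering. The `kill` and `sleep` parameters exist so that the plan can be dry-run without touching real processes.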



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)