JIN SUN created FLINK-10298:
-------------------------------
Summary: Batch Job Failover Strategy
Key: FLINK-10298
URL: https://issues.apache.org/jira/browse/FLINK-10298
Project: Flink
Issue Type: Sub-task
Components: JobManager
Reporter: JIN SUN
Assignee: JIN SUN
The new failover strategy needs to handle failures according to their failure type. It orchestrates all of the logic described in this [document|https://docs.google.com/document/d/1FdZdcA63tPUEewcCimTFy9Iz2jlVlMRANZkO4RngIuk/edit#]. We can put this logic in the onTaskFailure method of the FailoverStrategy interface, outlined inline as follows:
{code:java}
public void onTaskFailure(Execution taskExecution, Throwable cause) {
    // 1. Get the throwable type.
    // 2. If the type is NonrecoverableType, fail the job.
    // 3. If the type is PartitionDataMissingError, do revocation.
    // 4. If the type is EnvironmentError, check the blacklist.
    // 5. Other failure types are recoverable, but we need to remember the
    //    failure count; if it exceeds the threshold, fail the job.
}{code}
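
As a rough illustration of the five steps above, here is a minimal, self-contained sketch of the dispatch. It does not use any Flink classes: BatchFailoverSketch, the FailureType and Action enums, the three marker exceptions, and the maxFailuresPerTask threshold are all hypothetical names introduced for this example. Counting failures per task (rather than per job) is also an assumption, since the description above does not say which is intended.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Flink.
class BatchFailoverSketch {

    /** Failure categories mirroring steps 2-5 of the outline. */
    enum FailureType { NON_RECOVERABLE, PARTITION_DATA_MISSING, ENVIRONMENT, OTHER }

    /** Decisions the strategy can return; names are illustrative. */
    enum Action { FAIL_JOB, DO_REVOCATION, CHECK_BLACKLIST, RESTART_TASK }

    // Marker exceptions standing in for the real failure causes (assumed names).
    static class NonRecoverableException extends RuntimeException {}
    static class PartitionDataMissingException extends RuntimeException {}
    static class EnvironmentException extends RuntimeException {}

    private final int maxFailuresPerTask;
    // Assumption: recoverable failures are counted per task id.
    private final Map<String, Integer> failureCounts = new HashMap<>();

    BatchFailoverSketch(int maxFailuresPerTask) {
        this.maxFailuresPerTask = maxFailuresPerTask;
    }

    /** Step 1: map the throwable to a failure type. */
    static FailureType classify(Throwable cause) {
        if (cause instanceof NonRecoverableException) {
            return FailureType.NON_RECOVERABLE;
        }
        if (cause instanceof PartitionDataMissingException) {
            return FailureType.PARTITION_DATA_MISSING;
        }
        if (cause instanceof EnvironmentException) {
            return FailureType.ENVIRONMENT;
        }
        return FailureType.OTHER;
    }

    /** Steps 2-5: choose an action based on the failure type. */
    Action onTaskFailure(String taskId, Throwable cause) {
        switch (classify(cause)) {
            case NON_RECOVERABLE:
                return Action.FAIL_JOB;
            case PARTITION_DATA_MISSING:
                return Action.DO_REVOCATION;
            case ENVIRONMENT:
                return Action.CHECK_BLACKLIST;
            default:
                // Recoverable failure: record it, then fail the job only
                // once the count exceeds the threshold.
                int count = failureCounts.merge(taskId, 1, Integer::sum);
                return count > maxFailuresPerTask ? Action.FAIL_JOB : Action.RESTART_TASK;
        }
    }
}
```

Returning an Action instead of acting directly keeps the decision logic testable in isolation; a real FailoverStrategy implementation would presumably act on the execution graph instead.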
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)