[jira] [Created] (FLINK-13055) Leverage JM side partition state to improve region failover experience

Shang Yuanchun (Jira)
Zhu Zhu created FLINK-13055:
-------------------------------

             Summary: Leverage JM side partition state to improve region failover experience
                 Key: FLINK-13055
                 URL: https://issues.apache.org/jira/browse/FLINK-13055
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.8.1
            Reporter: Zhu Zhu
            Assignee: Zhu Zhu


In the current region failover process, the states of most input result partitions are unknown. Even when the failure cause is a PartitionException, only one unhealthy partition can be identified per failure.

This may lead to multiple unsuccessful failovers before all the unhealthy but needed partitions are identified and their producers are included in the failover as well.

Using the partition states tracked on the JM side would let region failover identify unhealthy (missing) partitions earlier, and so avoid these repeated failovers.

The basic idea is to build RestartPipelinedRegionStrategy with a ResultPartitionAvailabilityChecker that can query the partition states tracked on the JM side.
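
A minimal sketch of this idea follows. It is not Flink's implementation: the checker interface here is a simplified stand-in that takes a String partition id, and the Region class, the producerOf map, and the regionsToRestart method are hypothetical names introduced only for illustration. It shows how querying partition availability up front can pull every producer of a missing partition into a single failover. Restarting the downstream consumers of the restarted regions is omitted for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RegionFailoverSketch {

    /** Simplified stand-in for a checker backed by JM side partition tracking. */
    interface ResultPartitionAvailabilityChecker {
        boolean isAvailable(String partitionId);
    }

    /** Hypothetical region model: a name plus the result partitions it consumes. */
    static final class Region {
        final String name;
        final List<String> consumedPartitions;

        Region(String name, List<String> consumedPartitions) {
            this.name = name;
            this.consumedPartitions = consumedPartitions;
        }
    }

    /**
     * Starting from the failed region, walk upstream and add the producer
     * region of every consumed partition that the checker reports as
     * unavailable. All unhealthy partitions are found in one pass instead
     * of one per failed attempt.
     */
    static Set<String> regionsToRestart(
            String failedRegion,
            Map<String, Region> regions,      // region name -> region
            Map<String, String> producerOf,   // partition id -> producing region name
            ResultPartitionAvailabilityChecker checker) {

        Set<String> toRestart = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        toRestart.add(failedRegion);
        queue.add(failedRegion);

        while (!queue.isEmpty()) {
            Region region = regions.get(queue.poll());
            for (String partition : region.consumedPartitions) {
                if (!checker.isAvailable(partition)) {
                    String producer = producerOf.get(partition);
                    // Set.add returns false if the producer is already scheduled.
                    if (toRestart.add(producer)) {
                        queue.add(producer);
                    }
                }
            }
        }
        return toRestart;
    }
}
```

With a checker that reports nothing as missing, only the failed region is returned, which matches the behaviour when all partition states are assumed healthy.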



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)