[jira] [Created] (FLINK-5064) Checkpoint messages are not scoped to the leader session ID

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-5064:
------------------------------------

             Summary: Checkpoint messages are not scoped to the leader session ID
                 Key: FLINK-5064
                 URL: https://issues.apache.org/jira/browse/FLINK-5064
             Project: Flink
          Issue Type: Bug
          Components: State Backends, Checkpointing
    Affects Versions: 1.1.3, 1.2.0
            Reporter: Till Rohrmann
             Fix For: 1.2.0


The checkpoint messages ({{AbstractCheckpointMessage}}) do not implement the {{RequiresLeaderSessionID}} interface. As a result, they are not scoped to the leader session of a {{JobManager}}, so a checkpoint message sent under an old leader session can interfere with a new leader session.
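
To illustrate what scoping to the leader session means, here is a minimal, self-contained sketch of a receiver-side filter. It is a simplified model, not Flink's actual implementation: the {{RequiresLeaderSessionID}} marker is redeclared locally, and {{LeaderSessionMessage}}, {{AcknowledgeCheckpoint}} and {{filter}} are hypothetical names chosen for this example.

```java
import java.util.Optional;
import java.util.UUID;

public class LeaderSessionScopingSketch {

    /** Local stand-in for the marker interface named in the issue (assumption). */
    interface RequiresLeaderSessionID {}

    /** Hypothetical checkpoint message that opts into leader-session scoping. */
    static final class AcknowledgeCheckpoint implements RequiresLeaderSessionID {
        final long checkpointId;

        AcknowledgeCheckpoint(long checkpointId) {
            this.checkpointId = checkpointId;
        }
    }

    /** Hypothetical envelope carrying the leader session ID the sender believed was current. */
    static final class LeaderSessionMessage {
        final UUID leaderSessionId;
        final Object payload;

        LeaderSessionMessage(UUID leaderSessionId, Object payload) {
            this.leaderSessionId = leaderSessionId;
            this.payload = payload;
        }
    }

    /**
     * Returns the payload if the message may be processed under the current
     * leader session, or an empty Optional if it must be dropped.
     */
    static Optional<Object> filter(UUID currentLeaderSessionId, Object incoming) {
        if (incoming instanceof LeaderSessionMessage) {
            LeaderSessionMessage scoped = (LeaderSessionMessage) incoming;
            // Messages tagged with a stale session ID are dropped.
            return scoped.leaderSessionId.equals(currentLeaderSessionId)
                    ? Optional.of(scoped.payload)
                    : Optional.empty();
        }
        if (incoming instanceof RequiresLeaderSessionID) {
            // A message that requires scoping but arrived without an envelope is rejected.
            return Optional.empty();
        }
        // Unscoped message types pass through unchanged.
        return Optional.of(incoming);
    }
}
```

In this model, a checkpoint message that does not implement the marker falls into the last branch and is processed regardless of which leader session it was sent under, which is the behaviour this issue describes.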

The downside of scoping the checkpoint messages to the leader session ID is that messages might get filtered out, which leads to resource leaks because the state handle they contain is never discarded. However, a JM failure can already lead to the same situation if checkpoint messages are in flight when it occurs.

To mitigate the problem, one could change the behaviour so that the {{CheckpointResponder}} waits for a response and discards the state handle if the response is negative or does not arrive before a timeout.
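
The proposed mitigation could look roughly like the following sketch. It is only an illustration under assumptions: {{StateHandle}}, {{awaitAckOrDiscard}} and the {{Boolean}} reply are hypothetical stand-ins, not the existing {{CheckpointResponder}} API, and the reply from the JobManager is modelled as a plain {{CompletableFuture}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AckAwaitingResponderSketch {

    /** Hypothetical stand-in for a state handle whose backing resources must be released. */
    static final class StateHandle {
        private boolean discarded;

        void discardState() {
            discarded = true;
        }

        boolean isDiscarded() {
            return discarded;
        }
    }

    /**
     * Waits for the reply to a checkpoint acknowledgement. Returns true if the
     * reply was positive; otherwise (negative reply, failure, or timeout) the
     * state handle is discarded and false is returned.
     */
    static boolean awaitAckOrDiscard(
            CompletableFuture<Boolean> reply, StateHandle handle, long timeoutMillis) {
        boolean accepted;
        try {
            accepted = Boolean.TRUE.equals(reply.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            // No answer in time, or the request failed: treat as not accepted.
            accepted = false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            accepted = false;
        }
        if (!accepted) {
            // Nobody else will take ownership of the handle, so release it here.
            handle.discardState();
        }
        return accepted;
    }
}
```

Note that a timeout is ambiguous in this sketch: the receiver may have accepted the message while the reply was lost, so a real implementation would have to decide who owns the state handle in that case.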



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)