Roman Khachatryan created FLINK-19113:
-----------------------------------------
Summary: Add support for checkpointing with selectable inputs
Key: FLINK-19113
URL: https://issues.apache.org/jira/browse/FLINK-19113
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Affects Versions: 1.12.0
Reporter: Roman Khachatryan
Currently, there is a validation in StreamingJobGraphGenerator that fails if an operator implements InputSelectable and checkpointing is enabled.
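For illustration, the shape of such a check can be sketched as below. This is a minimal stand-alone model, not the actual code in StreamingJobGraphGenerator: the marker interface, the operator classes, the method names, and the exception message are all hypothetical stand-ins.

```java
/** Toy model of a "reject selectable inputs when checkpointing is on" check. */
public class SelectableInputValidationSketch {

    /** Hypothetical marker interface, standing in for Flink's InputSelectable. */
    interface InputSelectable {}

    /** Hypothetical operator that reads its inputs in any order. */
    static final class PlainOperator {}

    /** Hypothetical operator that chooses which input to read next. */
    static final class SelectingOperator implements InputSelectable {}

    /** Throws if the operator selects its inputs while checkpointing is enabled. */
    static void validate(Object operator, boolean checkpointingEnabled) {
        if (checkpointingEnabled && operator instanceof InputSelectable) {
            throw new UnsupportedOperationException(
                "Checkpointing is not supported for operators with selectable inputs");
        }
    }

    /** Convenience wrapper that reports whether validate() rejected the operator. */
    static boolean isRejected(Object operator, boolean checkpointingEnabled) {
        try {
            validate(operator, checkpointingEnabled);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected(new SelectingOperator(), true));  // prints "true"
        System.out.println(isRejected(new SelectingOperator(), false)); // prints "false"
        System.out.println(isRejected(new PlainOperator(), true));      // prints "false"
    }
}
```

Removing the validation corresponds to dropping the throwing branch, which is only safe once the recovery issue described below is addressed.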
One issue with removing this validation is that, with unaligned checkpoints, recovery would deadlock if there are records spanning multiple buffers.
Consider the following case:
- Two {{InputGate}}s
- Input selection is not ALL (say FIRST initially)
- Unaligned Checkpoints ON
- On recovery, there are "parts" of records in all channels (actually, one channel is enough, I think)
On recovery,
1. {{StreamTask}} initiates recovery and schedules partition requests upon its end
2. All gates and channels will *receive* buffers from {{StateReader}}
3. All channels of a *single* gate will *consume* those state buffers - completing that gate's {{StateConsumedFuture}}
4. {{InputProcessor}} will return {{NOTHING_AVAILABLE}} (see {{StreamTwoInputProcessor.getInputStatus}})
5. {{StreamTask}} will suspend its default action
6. State of the 2nd gate won't be consumed - so its {{StateConsumedFuture}}s won't be completed - so no partitions will be requested
A simple solution is to request partitions for each channel independently.
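The difference between the two strategies can be sketched with a small stand-alone model. This is not Flink code: the Channel class, its fields, and all method names are hypothetical, and the sketch assumes that the current behaviour amounts to waiting for every channel's state-consumed future before requesting any partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of partition requests after recovery; none of these types are Flink's. */
public class PartitionRequestSketch {

    /** Hypothetical stand-in for an input channel holding recovered state. */
    static final class Channel {
        final CompletableFuture<Void> stateConsumed = new CompletableFuture<>();
        boolean partitionRequested = false;
    }

    /** Builds two gates with two channels each. */
    static List<List<Channel>> twoGates() {
        List<List<Channel>> gates = new ArrayList<>();
        for (int g = 0; g < 2; g++) {
            List<Channel> gate = new ArrayList<>();
            gate.add(new Channel());
            gate.add(new Channel());
            gates.add(gate);
        }
        return gates;
    }

    /** Assumed current behaviour: request partitions only after ALL channels are drained. */
    static void requestWhenAllConsumed(List<List<Channel>> gates) {
        List<CompletableFuture<Void>> all = new ArrayList<>();
        for (List<Channel> gate : gates) {
            for (Channel c : gate) {
                all.add(c.stateConsumed);
            }
        }
        CompletableFuture.allOf(all.toArray(new CompletableFuture[0]))
            .thenRun(() -> gates.forEach(g -> g.forEach(c -> c.partitionRequested = true)));
    }

    /** Proposed behaviour: each channel requests its partition once its own state is drained. */
    static void requestPerChannel(List<List<Channel>> gates) {
        for (List<Channel> gate : gates) {
            for (Channel c : gate) {
                c.stateConsumed.thenRun(() -> c.partitionRequested = true);
            }
        }
    }

    /** Models input selection FIRST: only the first gate's state is consumed. */
    static long requestedAfterFirstGateConsumed(boolean perChannel) {
        List<List<Channel>> gates = twoGates();
        if (perChannel) {
            requestPerChannel(gates);
        } else {
            requestWhenAllConsumed(gates);
        }
        gates.get(0).forEach(c -> c.stateConsumed.complete(null));
        return gates.stream()
            .flatMap(List::stream)
            .filter(c -> c.partitionRequested)
            .count();
    }

    public static void main(String[] args) {
        System.out.println("all-or-nothing: " + requestedAfterFirstGateConsumed(false)); // prints "all-or-nothing: 0"
        System.out.println("per-channel: " + requestedAfterFirstGateConsumed(true));     // prints "per-channel: 2"
    }
}
```

In the per-channel variant, the selected gate's channels request their partitions as soon as their own state is consumed, so the remainder of a record spanning multiple buffers can arrive and the task can make progress.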
--
This message was sent by Atlassian Jira
(v8.3.4#803005)