Roman Khachatryan created FLINK-19385:
-----------------------------------------
Summary: Channel recovery may deadlock
Key: FLINK-19385
URL: https://issues.apache.org/jira/browse/FLINK-19385
Project: Flink
Issue Type: Bug
Components: Runtime / Network, Runtime / Task
Affects Versions: 1.11.2, 1.12.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.12.0
Consider the following case:
* Two InputGates
* Input selection is not ALL (say FIRST initially)
* Unaligned Checkpoints ON
* On recovery, there are "parts" of records in all channels (actually, I think one is enough)
What happens:
# StreamTask initiates recovery and schedules partition requests upon its end
# All gates and channels will receive buffers from StateReader
# All channels of a single gate will consume those state buffers - completing that gate's StateConsumedFuture
# InputProcessor will return NOTHING_AVAILABLE (see StreamTwoInputProcessor.getInputStatus)
# StreamTask will suspend its default action
# State of the 2nd gate won't be consumed - so its StateConsumedFutures won't be completed - so no partitions will be requested
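The sequence above can be sketched in a few lines of Java using plain CompletableFutures rather than Flink classes. The class name RecoveryDeadlockSketch, the method partitionsRequested and the two gate futures are hypothetical. Modelling "end of recovery" as allOf over the per-gate futures is an assumption about how the request is chained, not a copy of the actual StreamTask code.

```java
import java.util.concurrent.CompletableFuture;

public class RecoveryDeadlockSketch {

    /**
     * Models a task with two gates and returns whether partition requests
     * were issued. The flag secondGateConsumed is a hypothetical knob that
     * says whether the 2nd gate's state ever gets drained.
     */
    static boolean partitionsRequested(boolean secondGateConsumed) {
        CompletableFuture<Void> gate1StateConsumed = new CompletableFuture<>();
        CompletableFuture<Void> gate2StateConsumed = new CompletableFuture<>();
        boolean[] requested = {false};

        // Step 1: partition requests are scheduled for the end of recovery,
        // modelled here (assumption) as "every gate has consumed its state".
        CompletableFuture.allOf(gate1StateConsumed, gate2StateConsumed)
                .thenRun(() -> requested[0] = true);

        // Step 3: input selection is FIRST, so only gate 1 is drained.
        gate1StateConsumed.complete(null);

        // Steps 4-6: with nothing available the task suspends, so gate 2 is
        // drained only if something else makes progress.
        if (secondGateConsumed) {
            gate2StateConsumed.complete(null);
        }
        return requested[0];
    }

    public static void main(String[] args) {
        System.out.println(partitionsRequested(false)); // prints "false"
        System.out.println(partitionsRequested(true));  // prints "true"
    }
}
```

In the first call, gate 2 is never drained, so the combined future never completes and no partition is requested, which is the deadlock described above.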
Solution: request partitions independently for each channel.
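A rough illustration of the proposed direction, again with plain CompletableFutures instead of Flink classes. Channel, stateConsumed and recover are hypothetical names, and the "request" is just a string added to a list. The sketch only shows that a per-channel callback fires as soon as that channel's state is consumed, regardless of other channels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PerChannelRequestSketch {

    /** Hypothetical stand-in for an input channel holding recovered state. */
    static final class Channel {
        final String name;
        final CompletableFuture<Void> stateConsumed = new CompletableFuture<>();

        Channel(String name) {
            this.name = name;
        }
    }

    /**
     * Registers one partition request per channel, then drains only the
     * channels in {@code drained}. Returns the names of the channels whose
     * partitions were requested.
     */
    static List<String> recover(List<Channel> channels, List<Channel> drained) {
        List<String> requested = new ArrayList<>();
        for (Channel c : channels) {
            // Each channel requests its partition on its own future,
            // without waiting for any other channel or gate.
            c.stateConsumed.thenRun(() -> requested.add(c.name));
        }
        for (Channel c : drained) {
            c.stateConsumed.complete(null);
        }
        return requested;
    }

    public static void main(String[] args) {
        Channel gate1 = new Channel("gate1-ch0");
        Channel gate2 = new Channel("gate2-ch0");
        // Only gate 1 is selected and drained, as in the scenario above.
        System.out.println(recover(Arrays.asList(gate1, gate2), Arrays.asList(gate1)));
    }
}
```

Here the channel of the first gate requests its partition even though the second gate's state is still pending, so new data can presumably arrive on the first gate and the task is not left permanently suspended.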