(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-1489) Failing JobManager due to blocking calls in Execution.scheduleOrUpdateConsumers

Shang Yuanchun (Jira)

Till Rohrmann created FLINK-1489:
------------------------------------

Summary: Failing JobManager due to blocking calls in Execution.scheduleOrUpdateConsumers
Key: FLINK-1489
URL: https://issues.apache.org/jira/browse/FLINK-1489
Project: Flink
Issue Type: Bug
Reporter: Till Rohrmann
Assignee: Till Rohrmann

[~Zentol] reported that the JobManager failed to execute his Python job. The reason is that the JobManager executes blocking calls in the actor thread, in the method {{Execution.sendUpdateTaskRpcCall}}, in response to receiving a {{ScheduleOrUpdateConsumers}} message.

Every TaskManager may send a {{ScheduleOrUpdateConsumers}} message to the JobManager to notify the consumers about available data. The JobManager then sends the respective update call to each TaskManager via {{Execution.sendUpdateTaskRpcCall}}. Because these calls block the actor thread, the update calls are effectively executed sequentially. As the delay accumulates, some of the initial calls on the TaskManager side in {{IntermediateResultPartition.scheduleOrUpdateConsumers}} time out. As a result, the execution of the respective Tasks fails.
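
To illustrate how the delay accumulates, here is a rough back-of-the-envelope sketch. It is not Flink code: the class and method names are made up, and the figures (100 queued messages, 200 ms per blocking call, a 10 s timeout on the sender side) are purely hypothetical. It only models one thread handling queued messages one after another.

```java
import java.util.ArrayList;
import java.util.List;

public class SequentialDelaySketch {

    /**
     * Returns, for each of n queued messages, the time in ms at which its
     * blocking update call finishes when a single thread handles them in order.
     */
    static List<Long> completionTimes(int n, long blockingMillisPerCall) {
        List<Long> times = new ArrayList<>();
        long clock = 0;
        for (int i = 0; i < n; i++) {
            // The thread is occupied for the whole call, so later messages wait.
            clock += blockingMillisPerCall;
            times.add(clock);
        }
        return times;
    }

    /** Counts the messages whose reply arrives after the sender's timeout. */
    static long countTimedOut(List<Long> completionTimes, long timeoutMillis) {
        return completionTimes.stream().filter(t -> t > timeoutMillis).count();
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only for illustration.
        List<Long> times = completionTimes(100, 200);
        System.out.println("last reply after " + times.get(times.size() - 1) + " ms");
        System.out.println("timed out: " + countTimedOut(times, 10_000));
    }
}
```

With these assumed numbers the last reply arrives after 20000 ms and 50 of the 100 senders exceed their timeout, even though each individual call takes only 200 ms.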

A solution would be to make the call non-blocking.
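
A minimal sketch of what "non-blocking" could look like, using plain {{java.util.concurrent}} rather than the actual JobManager actor code. All names here ({{sendUpdate}}, {{handleAll}}, the thread pools) are hypothetical stand-ins. The single-threaded executor plays the role of the actor thread. The acknowledgements are deliberately held back until every message has been handled, so the run could only complete if the handler never waits for a reply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NonBlockingUpdateSketch {

    // Hypothetical stand-in for the remote update call: the acknowledgement
    // only arrives once 'release' has been counted down.
    static CompletableFuture<String> sendUpdate(
            String taskManager, ExecutorService rpcPool, CountDownLatch release) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "ack from " + taskManager;
        }, rpcPool);
    }

    /** Handles all messages on one "actor" thread; returns the number of acks seen. */
    static int handleAll(int messages) throws Exception {
        ExecutorService actorThread = Executors.newSingleThreadExecutor();
        ExecutorService rpcPool = Executors.newFixedThreadPool(messages);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger acks = new AtomicInteger();
        try {
            List<Future<CompletableFuture<Void>>> handled = new ArrayList<>();
            for (int i = 0; i < messages; i++) {
                String taskManager = "tm-" + i;
                // The handler starts the call, registers a callback and returns at once.
                Callable<CompletableFuture<Void>> handler = () ->
                        sendUpdate(taskManager, rpcPool, release)
                                .thenAccept(ack -> acks.incrementAndGet());
                handled.add(actorThread.submit(handler));
            }
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Future<CompletableFuture<Void>> f : handled) {
                // Succeeds only because no handler waited for its acknowledgement.
                pending.add(f.get(5, TimeUnit.SECONDS));
            }
            release.countDown(); // only now do the acknowledgements arrive
            for (CompletableFuture<Void> p : pending) {
                p.get(5, TimeUnit.SECONDS); // the driver waits, not the actor thread
            }
            return acks.get();
        } finally {
            actorThread.shutdown();
            rpcPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("acks received: " + handleAll(20));
    }
}
```

Had the handler called {{get()}} on the future instead of registering a callback, the first message would have stalled the actor thread and the remaining messages would never have been processed in this setup.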

A general caveat for actor programming: we should never block the actor thread. Otherwise we seriously jeopardize the scalability of the system or, even worse, the system simply fails.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)