Stephan Ewen created FLINK-1352:
-----------------------------------
Summary: Buggy registration from TaskManager to JobManager
Key: FLINK-1352
URL:
https://issues.apache.org/jira/browse/FLINK-1352 Project: Flink
Issue Type: Bug
Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Till Rohrmann
Fix For: 0.9
The JobManager's InstanceManager may refuse the registration attempt from a TaskManager, because it has this taskmanager already connected, or,in the future, because the TaskManager has been blacklisted as unreliable.
Unpon refused registration, the instance ID is null, to signal that refused registration. TaskManager reacts incorrectly to such methods, assuming successful registration
Possible solution: JobManager sends back a dedicated "RegistrationRefused" message, if the instance manager returns null as the registration result. If the TastManager receives that before being registered, it knows that the registration response was lost (which should not happen on TCP and it would indicate a corrupt connection)
Followup question: Does it make sense to have the TaskManager trying indefinitely to connect to the JobManager. With increasing interval (from seconds to minutes)?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)