An created FLINK-18367:
--------------------------
Summary: Flink HA Mode in Kubernetes. Fencing token not set
Key: FLINK-18367
URL: https://issues.apache.org/jira/browse/FLINK-18367
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.10.1
Reporter: An
Attachments: taskmanager.log
The issue is similar to https://issues.apache.org/jira/browse/FLINK-12382
I'm testing zetcd + session jobs in k8s, with 2 job managers and 2 task managers. Everything works fine, but after I delete the pod running the job manager leader, the task managers cannot always register themselves at the new leader. The following exception occurs:
´2020-06-18 13:02:43,555 [Thread=flink-akka.actor.default-dispatcher-3] ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager failed due to an error
java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message RemoteFencedMessage(bcb7d4652fe53a2f8997dc8c87d641a7, RemoteRpcInvocation(registerTaskExecutor(TaskExecutorRegistration, Time))) sent to akka.tcp://flink@poc-ha-walle-flink-jobmanager:50010/user/resourcemanager because the fencing token is null.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
´
The task managers receive the notification that the leader has changed, but it seems the RpcEndpoint can't refresh its fencing token for some reason.
The full log from the task manager pod is attached.
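For context on the error message, here is a minimal Java sketch of how a fenced endpoint might decide whether to accept a message. The class FencedEndpointSketch and its methods are hypothetical and are not Flink's actual implementation. The sketch only assumes, based on the log text above, that the receiving side rejects every message while its own fencing token is null.

```java
import java.util.Objects;
import java.util.UUID;

/** Hypothetical, simplified model of a fenced RPC endpoint (not Flink's real classes). */
final class FencedEndpointSketch {

    // Stays null until this endpoint has been granted leadership.
    private UUID fencingToken;

    /** Called when leadership is granted; the token identifies this leader session. */
    void grantLeadership(UUID newToken) {
        this.fencingToken = Objects.requireNonNull(newToken);
    }

    /** Called when leadership is lost; the endpoint then rejects all fenced messages. */
    void revokeLeadership() {
        this.fencingToken = null;
    }

    /** Accepts the payload only if the sender's token matches the endpoint's own token. */
    String handle(UUID senderToken, String payload) {
        if (fencingToken == null) {
            // Corresponds to the "fencing token is null" case in the log above.
            return "rejected: fencing token not set";
        }
        if (!fencingToken.equals(senderToken)) {
            return "rejected: fencing token mismatch";
        }
        return "accepted: " + payload;
    }
}
```

Under this assumption, a task manager that already knows the new leader's token would still be rejected until the new leader has set its own token, which would match the exception shown above.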
--
This message was sent by Atlassian Jira
(v8.3.4#803005)