[jira] [Created] (FLINK-12384) Rolling the etcd servers causes "Connected to an old server; r-o mode will be unavailable"

Shang Yuanchun (Jira)
Henrik created FLINK-12384:
------------------------------

             Summary: Rolling the etcd servers causes "Connected to an old server; r-o mode will be unavailable"
                 Key: FLINK-12384
                 URL: https://issues.apache.org/jira/browse/FLINK-12384
             Project: Flink
          Issue Type: Bug
            Reporter: Henrik


{code:java}
[tm] 2019-05-01 13:30:53,316 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Initiating client connection, connectString=analytics-zetcd:2181 sessionTimeout=60000 watcher=org.apache.flink.shaded.curator.org.apache.curator.ConnectionState@5c8eee0f
[tm] 2019-05-01 13:30:53,384 WARN  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - SASL configuration failed: javax.security.auth.login.LoginException: No JAAS configuration section named 'Client' was found in specified JAAS configuration file: '/tmp/jaas-3674237213070587877.conf'. Will continue connection to Zookeeper server without SASL authentication, if Zookeeper server allows it.
[tm] 2019-05-01 13:30:53,395 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Opening socket connection to server analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181
[tm] 2019-05-01 13:30:53,395 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Using configured hostname/address for TaskManager: 10.1.2.173.
[tm] 2019-05-01 13:30:53,401 ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - Authentication failed
[tm] 2019-05-01 13:30:53,418 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 10.1.2.173:0
[tm] 2019-05-01 13:30:53,420 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Socket connection established to analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181, initiating session
[tm] 2019-05-01 13:30:53,500 WARN  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxnSocket  - Connected to an old server; r-o mode will be unavailable
[tm] 2019-05-01 13:30:53,500 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Session establishment complete on server analytics-zetcd.default.svc.cluster.local/10.108.52.97:2181, sessionid = 0xbf06a739001d446, negotiated timeout = 60000
[tm] 2019-05-01 13:30:53,525 INFO  org.apache.flink.shaded.curator.org.apache.curator.framework.state.ConnectionStateManager  - State change: CONNECTED
{code}
Repro:

Start an etcd cluster with three members (e.g. using etcd-operator) and start zetcd in front of it. Configure the session cluster to use zetcd as its ZooKeeper endpoint.

Ensure the job can start successfully.

Now kill the etcd pods one by one, letting the quorum re-establish after each kill, so that the cluster stays healthy throughout.

Now restart the job/tm pods. You'll end up in this no-man's-land, as seen in the log above.
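
To check whether a restarted TaskManager has hit the same state, the log can be scanned for the two telltale messages. The following is only a rough sketch: the marker strings are copied from the log excerpt in this report, while the function name `scan`, the regular expression, and the assumption that every line follows the `[tm] <timestamp> <LEVEL> <logger> - <message>` layout are illustrative and not part of Flink.

```python
import re

# Marker strings copied verbatim from the log excerpt in this report.
MARKERS = (
    "Authentication failed",
    "Connected to an old server; r-o mode will be unavailable",
)

# Assumed line layout: "[tm] 2019-05-01 13:30:53,500 WARN  <logger>  - <message>"
LINE_RE = re.compile(
    r"^\[(?P<src>\w+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+- (?P<msg>.*)$"
)


def scan(lines):
    """Return (timestamp, level, marker) for every line containing a marker."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not in the assumed layout
        for marker in MARKERS:
            if marker in match["msg"]:
                hits.append((match["ts"], match["level"], marker))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python scan_tm_log.py < taskmanager.log
    for ts, level, marker in scan(sys.stdin):
        print(f"{ts} {level} {marker}")
```

If both markers show up shortly after a restart, the TaskManager has most likely reached the state described here. Note that the warning alone may not be conclusive, since the excerpt above still ends with a CONNECTED state change.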



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)