Piyush Narang created FLINK-17443:
-------------------------------------

             Summary: Flink's ZK in HA mode setup is unable to start up if any of the zk hosts are unreachable
                 Key: FLINK-17443
                 URL: https://issues.apache.org/jira/browse/FLINK-17443
             Project: Flink
          Issue Type: Bug
            Reporter: Piyush Narang


We occasionally hit an issue where our Flink cluster will not start up if any of the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum" config setting are unreachable. This seems to stem from us using an older ZooKeeper dependency version (3.4.10). The problem has been fixed in the 3.4.x branch as part of https://issues.apache.org/jira/browse/ZOOKEEPER-1576 ([https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e]).

A sample of the error we see is shown below.
{code:java}
java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
	at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
	at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
	at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
	at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
	at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
	at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
	at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
	at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
	at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
	at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
	at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
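
For illustration only, the general idea of tolerating unresolvable quorum hosts can be sketched as below. This is not the actual ZooKeeper patch or any Flink code: the class `TolerantQuorumResolver`, the `Resolver` hook, and the method `resolvableHosts` are hypothetical names introduced here. The sketch assumes a comma-separated `host:port` quorum string, skips entries whose lookup throws `UnknownHostException`, and fails only when no entry resolves.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of Flink, Curator, or ZooKeeper. */
final class TolerantQuorumResolver {

    /** Hypothetical lookup hook so the logic can be exercised without real DNS. */
    @FunctionalInterface
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    private TolerantQuorumResolver() {}

    /**
     * Returns the "host:port" entries of the quorum string whose host resolves,
     * in their original order. Throws only if none of them resolve.
     * Simplification: IPv6 literals are not handled.
     */
    static List<String> resolvableHosts(String quorum, Resolver resolver) {
        List<String> resolved = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue;
            }
            int colon = hostPort.lastIndexOf(':');
            String host = colon >= 0 ? hostPort.substring(0, colon) : hostPort;
            try {
                resolver.resolve(host);
                resolved.add(hostPort);
            } catch (UnknownHostException e) {
                // One bad host should not abort the whole startup.
                System.err.println("Skipping unresolvable ZooKeeper host: " + host);
            }
        }
        if (resolved.isEmpty()) {
            throw new IllegalArgumentException(
                    "None of the ZooKeeper hosts could be resolved: " + quorum);
        }
        return resolved;
    }
}
```

With real DNS, the hook would be `InetAddress::getAllByName`, for example `TolerantQuorumResolver.resolvableHosts(quorum, InetAddress::getAllByName)`, where `quorum` is the value of "high-availability.zookeeper.quorum".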