Daebeom Lee created FLINK-14012:
-----------------------------------

             Summary: Failed to start job for consuming Secure Kafka after the job cancel
                 Key: FLINK-14012
                 URL: https://issues.apache.org/jira/browse/FLINK-14012
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Kafka
    Affects Versions: 1.9.0
         Environment: * Kubernetes 1.13.2
* Flink 1.9.0
* Kafka client library 2.2.0
            Reporter: Daebeom Lee


Hello, this is Daebeom Lee.

h2. Background

* I installed Flink 1.9.0 on our Kubernetes cluster. We use a Flink session cluster.
* We build a fat JAR, upload it through the UI, and run several jobs.
* At first the jobs start fine, but after we cancel some of them, restarting a job fails.

This is the error:

{code:java}
java.lang.NoClassDefFoundError: org/apache/kafka/common/security/scram/internals/ScramSaslClient
	at org.apache.kafka.common.security.scram.internals.ScramSaslClient$ScramSaslClientFactory.createSaslClient(ScramSaslClient.java:235)
	at javax.security.sasl.Sasl.createSaslClient(Sasl.java:384)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslClient$0(SaslClientAuthenticator.java:180)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslClient(SaslClientAuthenticator.java:176)
	at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.<init>(SaslClientAuthenticator.java:168)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildClientAuthenticator(SaslChannelBuilder.java:254)
	at org.apache.kafka.common.network.SaslChannelBuilder.lambda$buildChannel$1(SaslChannelBuilder.java:202)
	at org.apache.kafka.common.network.KafkaChannel.<init>(KafkaChannel.java:140)
	at org.apache.kafka.common.network.SaslChannelBuilder.buildChannel(SaslChannelBuilder.java:210)
	at org.apache.kafka.common.network.Selector.buildAndAttachKafkaChannel(Selector.java:334)
	at org.apache.kafka.common.network.Selector.registerChannel(Selector.java:325)
	at org.apache.kafka.common.network.Selector.connect(Selector.java:257)
	at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:920)
	at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.trySend(ConsumerNetworkClient.java:474)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:255)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
	at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
	at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1803)
	at org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1771)
	at org.apache.flink.streaming.connectors.kafka.internal.KafkaPartitionDiscoverer.getAllPartitionsForTopics(KafkaPartitionDiscoverer.java:77)
	at org.apache.flink.streaming.connectors.kafka.internals.AbstractPartitionDiscoverer.discoverPartitions(AbstractPartitionDiscoverer.java:131)
	at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.open(FlinkKafkaConsumerBase.java:508)
	at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:529)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:393)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)
{code}

h2. Our workaround

* I think this is a Flink JVM classloader issue.
* The user-code classloader is unloaded when a job is cancelled, and the Kafka client library was included in the fat JAR.
* So I moved the Kafka client library to /opt/flink/lib:
** /opt/flink/lib/kafka-clients-2.2.0.jar
* That resolved the issue.
* But there are some odd points:
** Two weeks ago, Flink 1.8.1 did not have this problem.
** One week ago I rolled back from 1.9.0 to 1.8.1, and the same errors occurred.
** Maybe the Docker image was changed in the Docker repository ([https://github.com/docker-flink/docker-flink])

h2. Suggestion

* I'd like to know the exact reason this error occurs after upgrading to 1.9.0.
* Does anybody know a better solution in this case?

Thank you in advance.


--
This message was sent by Atlassian Jira
(v8.3.2#803003)
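A note on the workaround described in this report: placing kafka-clients-2.2.0.jar in /opt/flink/lib makes the Kafka classes load through Flink's parent classloader, which is not torn down when a job is cancelled, instead of the per-job user-code classloader. A related knob (a sketch only, not verified against this exact deployment) is Flink's parent-first classloading pattern in flink-conf.yaml:

{code:yaml}
# flink-conf.yaml (sketch): ask Flink to load org.apache.kafka.* classes
# parent-first rather than from the per-job user-code classloader, so they
# are not unloaded on job cancellation. This only helps if the Kafka client
# jar is also on the cluster classpath, e.g. /opt/flink/lib/kafka-clients-2.2.0.jar.
classloader.parent-first-patterns.additional: org.apache.kafka
{code}

Note that shipping kafka-clients both in the fat JAR and in /opt/flink/lib can lead to version conflicts, so with this approach it is safer to mark the dependency as provided in the job build.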