[jira] [Created] (FLINK-10538) standalone-job.sh causes Classpath issues

Razvan created FLINK-10538:
------------------------------

             Summary: standalone-job.sh causes Classpath issues
                 Key: FLINK-10538
                 URL: https://issues.apache.org/jira/browse/FLINK-10538
             Project: Flink
          Issue Type: Bug
          Components: Docker, Job-Submission, Kubernetes
    Affects Versions: 1.6.1, 1.6.0, 1.6.2
            Reporter: Razvan


Launching a job together with the cluster through this script causes dependency (classpath) issues.

 

We have a job that uses AsyncHttpClient, which in turn depends on the Netty library. By default, when building/running a Docker image for Kubernetes with standalone-job.sh, our artifact is copied to a file called "job.jar" in the lib/ folder of the Flink distribution inside the container.
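
For context, the usage pattern is roughly the sketch below (a hedged illustration only: the class name, endpoint and wiring are hypothetical; the real entry point in our code is AsyncHttpClientProvider, as the trace further down shows):

{code:java}
// Hedged sketch only -- not the actual job code. Class name, endpoint and wiring
// are hypothetical; the real entry point is AsyncHttpClientProvider (see trace below).
import java.util.Collections;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.asynchttpclient.AsyncHttpClient;
import org.asynchttpclient.Dsl;

public class HttpEnrichmentFunction extends RichAsyncFunction<String, String> {

    private transient AsyncHttpClient client;

    @Override
    public void open(Configuration parameters) {
        // Dsl.asyncHttpClient() builds a DefaultAsyncHttpClient, whose ChannelManager /
        // DefaultSslEngineFactory is exactly where the NoSuchMethodError below surfaces
        // when a different Netty than the one AsyncHttpClient expects wins on the classpath.
        client = Dsl.asyncHttpClient();
    }

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        client.prepareGet("https://example.com/lookup/" + key) // illustrative endpoint
              .execute()
              .toCompletableFuture()
              .thenAccept(response ->
                  resultFuture.complete(Collections.singleton(response.getResponseBody())));
    }

    @Override
    public void close() throws Exception {
        if (client != null) {
            client.close();
        }
    }
}
{code}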

At runtime we get:

 
{code:java}
2018-10-11 13:44:10.057 [flink-akka.actor.default-dispatcher-15] INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph  - StateProcessFunction -> ToCustomerRatingFlatMap -> async wait operator -> Sink: CollectResultsSink (1/1) (f7fac66a85d41d4eac44ff609c515710) switched from RUNNING to FAILED.
java.lang.NoSuchMethodError: io.netty.handler.ssl.SslContext.newClientContextInternal(Lio/netty/handler/ssl/SslProvider;Ljava/security/Provider;[Ljava/security/cert/X509Certificate;Ljavax/net/ssl/TrustManagerFactory;[Ljava/security/cert/X509Certificate;Ljava/security/PrivateKey;Ljava/lang/String;Ljavax/net/ssl/KeyManagerFactory;Ljava/lang/Iterable;Lio/netty/handler/ssl/CipherSuiteFilter;Lio/netty/handler/ssl/ApplicationProtocolConfig;[Ljava/lang/String;JJZ)Lio/netty/handler/ssl/SslContext;
    at io.netty.handler.ssl.SslContextBuilder.build(SslContextBuilder.java:452)
    at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.buildSslContext(DefaultSslEngineFactory.java:58)
    at org.asynchttpclient.netty.ssl.DefaultSslEngineFactory.init(DefaultSslEngineFactory.java:73)
    at org.asynchttpclient.netty.channel.ChannelManager.<init>(ChannelManager.java:100)
    at org.asynchttpclient.DefaultAsyncHttpClient.<init>(DefaultAsyncHttpClient.java:89)
    at org.asynchttpclient.Dsl.asyncHttpClient(Dsl.java:32)
    at com.test.events.common.asynchttp.AsyncHttpClientProvider.configureAsyncHttpClient(AsyncHttpClientProvider.java:128)
    at com.test.events.common.asynchttp.AsyncHttpClientProvider.<init>(AsyncHttpClientProvider.java:51)

{code}
 

 

It's because Apache Flink's bundled Netty classes are loaded first, as the class-loading output below shows, so AsyncHttpClient ends up calling into an io.netty.handler.ssl.SslContext that does not have the method it was compiled against:

{code:java}
[Loaded io.netty.handler.codec.http.HttpObject from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
[Loaded io.netty.handler.codec.http.HttpMessage from file:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar]
{code}
 
{code:java}
2018-10-12 11:48:20.434 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /opt/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/opt/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/opt/flink-1.6.1/lib/job.jar:/opt/flink-1.6.1/lib/log4j-1.2.17.jar:/opt/flink-1.6.1/lib/logback-access.jar:/opt/flink-1.6.1/lib/logback-classic.jar:/opt/flink-1.6.1/lib/logback-core.jar:/opt/flink-1.6.1/lib/netty-buffer-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-codec-socks-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-handler-proxy-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-resolver-dns-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-epoll-4.1.30.Final.jar:/opt/flink-1.6.1/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/opt/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar:::
{code}
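
For reference, a hedged diagnostic sketch (not part of the original report; the class name is made up) that prints which jar a conflicting class is actually served from. With the classpath order above it should point at a jar that precedes job.jar, most likely flink-shaded-hadoop2-uber-1.6.1.jar given the load log, rather than at the Netty copy bundled in job.jar:

{code:java}
// Hedged diagnostic sketch: prints the jar a conflicting Netty class was loaded from.
// With job.jar sorted after flink-shaded-hadoop2-uber-1.6.1.jar (see classpath above),
// this is expected to resolve to one of the preceding lib/ jars rather than to job.jar.
import io.netty.handler.ssl.SslContext;

public class WhichNettyWins {
    public static void main(String[] args) {
        System.out.println(
            SslContext.class.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}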
 

The workaround is to rename job.jar to, for example, 1JOB.jar so that it comes first on the classpath and its classes are loaded first:

{code:java}
2018-10-12 13:51:09.165 [main] INFO org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath: /Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-shaded-hadoop2-uber-1.6.1.jar:/Users/users/projects/flink/flink-1.6.1/lib/log4j-1.2.17.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-access.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-classic.jar:/Users/users/projects/flink/flink-1.6.1/lib/logback-core.jar:/Users/users/projects/flink/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/Users/users/projects/flink/flink-1.6.1/lib/flink-dist_2.11-1.6.1.jar::: 
{code}
 

 
{code:java}
[Loaded io.netty.handler.codec.http.HttpObject from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar]
[Loaded io.netty.handler.codec.http.HttpMessage from file:/Users/users/projects/flink/flink-1.6.1/lib/1JOB.jar] 
{code}
 

This needs to be fixed properly: with the workaround in place, the job's libraries are loaded before Flink's own, which could make Flink crash or behave in unexpected ways.
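
Until that happens, a hedged sanity check (not from the original report; the class name is hypothetical, io.netty.util.Version is part of netty-common) that a job can log at startup to spot mixed Netty versions on the classpath:

{code:java}
// Hedged sketch of a startup sanity check; class name is hypothetical.
// io.netty.util.Version.identify() reads META-INF/io.netty.versions.properties from the
// classpath and lists the Netty artifacts (and versions) visible to this classloader,
// which helps spot the kind of mixed/shadowed Netty setup described above.
import java.util.Map;

import io.netty.util.Version;

public class NettyVersionCheck {
    public static void main(String[] args) {
        for (Map.Entry<String, Version> entry : Version.identify().entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
{code}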



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)