[jira] [Created] (FLINK-14973) OSS filesystem does not relocate many dependencies

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-14973) OSS filesystem does not relocate many dependencies

Shang Yuanchun (Jira)
Patrick Lucas created FLINK-14973:
-------------------------------------

             Summary: OSS filesystem does not relocate many dependencies
                 Key: FLINK-14973
                 URL: https://issues.apache.org/jira/browse/FLINK-14973
             Project: Flink
          Issue Type: Bug
          Components: FileSystems
    Affects Versions: 1.9.1
            Reporter: Patrick Lucas


Whereas the Azure and S3 Hadoop filesystem jars relocate all of their depdendencies:

{noformat}
$ jar tf opt/flink-azure-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
org/
org/apache

$ jar tf opt/flink-s3-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
org/
org/apache
{noformat}

The OSS Hadoop filesystem leaves many things un-relocated:

{noformat}
$ jar tf opt/flink-oss-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq
assets/
assets/org
avro/
avro/shaded
com/
com/ctc
com/fasterxml
com/google
com/jcraft
com/nimbusds
com/sun
com/thoughtworks
javax/
javax/activation
javax/el
javax/servlet
javax/ws
javax/xml
jersey/
jersey/repackaged
licenses/
licenses/LICENSE.asm
licenses/LICENSE.cddlv1.0
licenses/LICENSE.cddlv1.1
licenses/LICENSE.jdom
licenses/LICENSE.jzlib
licenses/LICENSE.paranamer
licenses/LICENSE.protobuf
licenses/LICENSE.re2j
licenses/LICENSE.stax2api
net/
net/jcip
net/minidev
org/
org/apache
org/codehaus
org/eclipse
org/jdom
org/objectweb
org/tukaani
org/xerial
{noformat}

The first symptom of this I ran into was that Flink is unable to restore from a savepoint if both the OSS and Azure Hadoop filesystems are on the classpath, but I assume this has the potential to cause further problems, at least until more progress is made on the module/classloading front.

h3. Steps to reproduce

# Copy both the Azure and OSS Hadoop filesystem JARs from opt/ into lib/
# Run a job that restores from a savepoint (the savepoint might need to be stored on OSS)
# See a crash and traceback like:

{noformat}
2019-11-26 15:59:25,318 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id 00000000-0000-0000-0000-000000000000.
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232]
        at org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:691) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753) ~[?:1.8.0_232]
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_232]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
        at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        ... 4 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
        at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        ... 4 more
Caused by: java.lang.NoSuchMethodError: org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
        at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.open(AliyunOSSFileSystem.java:570) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950) ~[flink-azure-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:120) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:37) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:141) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1132) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.scheduler.LegacyScheduler.tryRestoreExecutionGraphFromSavepoint(LegacyScheduler.java:237) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:196) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232]
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2]
        ... 4 more
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)