Patrick Lucas created FLINK-14973:
------------------------------------- Summary: OSS filesystem does not relocate many dependencies Key: FLINK-14973 URL: https://issues.apache.org/jira/browse/FLINK-14973 Project: Flink Issue Type: Bug Components: FileSystems Affects Versions: 1.9.1 Reporter: Patrick Lucas Whereas the Azure and S3 Hadoop filesystem jars relocate all of their depdendencies: {noformat} $ jar tf opt/flink-azure-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq org/ org/apache $ jar tf opt/flink-s3-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq org/ org/apache {noformat} The OSS Hadoop filesystem leaves many things un-relocated: {noformat} $ jar tf opt/flink-oss-fs-hadoop-1.9.1.jar | grep -v '^org/apache/fs/shaded/' | grep -v '^META-INF' | grep '/' | cut -f -2 -d / | sort | uniq assets/ assets/org avro/ avro/shaded com/ com/ctc com/fasterxml com/google com/jcraft com/nimbusds com/sun com/thoughtworks javax/ javax/activation javax/el javax/servlet javax/ws javax/xml jersey/ jersey/repackaged licenses/ licenses/LICENSE.asm licenses/LICENSE.cddlv1.0 licenses/LICENSE.cddlv1.1 licenses/LICENSE.jdom licenses/LICENSE.jzlib licenses/LICENSE.paranamer licenses/LICENSE.protobuf licenses/LICENSE.re2j licenses/LICENSE.stax2api net/ net/jcip net/minidev org/ org/apache org/codehaus org/eclipse org/jdom org/objectweb org/tukaani org/xerial {noformat} The first symptom of this I ran into was that Flink is unable to restore from a savepoint if both the OSS and Azure Hadoop filesystems are on the classpath, but I assume this has the potential to cause further problems, at least until more progress is made on the module/classloading front. h3. Steps to reproduce # Copy both the Azure and OSS Hadoop filesystem JARs from opt/ into lib/ # Run a job that restores from a savepoint (the savepoint might need to be stored on OSS) # See a crash and traceback like: {noformat} 2019-11-26 15:59:25,318 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [] - Fatal error occurred in the cluster entrypoint. org.apache.flink.runtime.dispatcher.DispatcherException: Failed to take leadership with session id 00000000-0000-0000-0000-000000000000. at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$30(Dispatcher.java:915) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_232] at org.apache.flink.runtime.concurrent.FutureUtils$WaitingConjunctFuture.handleCompletedFuture(FutureUtils.java:691) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:575) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:753) ~[?:1.8.0_232] at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456) ~[?:1.8.0_232] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at scala.PartialFunction.applyOrElse(PartialFunction.scala:123) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.actor.Actor.aroundReceive(Actor.scala:517) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.actor.Actor.aroundReceive$(Actor.scala:515) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.actor.ActorCell.invoke(ActorCell.scala:561) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.Mailbox.run(Mailbox.scala:225) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.Mailbox.exec(Mailbox.scala:235) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232] at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] ... 4 more Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:152) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232] at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] ... 4 more Caused by: java.lang.NoSuchMethodError: org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem.open(AliyunOSSFileSystem.java:570) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950) ~[flink-azure-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:120) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.open(HadoopFileSystem.java:37) ~[flink-oss-fs-hadoop-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:141) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1132) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.scheduler.LegacyScheduler.tryRestoreExecutionGraphFromSavepoint(LegacyScheduler.java:237) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.scheduler.LegacyScheduler.createAndRestoreExecutionGraph(LegacyScheduler.java:196) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.scheduler.LegacyScheduler.<init>(LegacyScheduler.java:176) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.scheduler.LegacySchedulerFactory.createInstance(LegacySchedulerFactory.java:70) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:275) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:265) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.jobmaster.JobManagerRunner.<init>(JobManagerRunner.java:146) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:83) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:375) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_232] at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44) ~[flink-dist_2.12-1.9.1-stream2.jar:1.9.1-stream2] ... 4 more {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) |
Free forum by Nabble | Edit this page |