Hi,
I am trying to set up Flink with YARN on a MapR cluster. I built Flink (flink-1.3-SNAPSHOT) as follows:

mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.7.0-mapr-1607

The build is successful. I then run ./bin/yarn-session.sh -n 4 (without changing any configuration) and get the following two errors:

1. This one is a minor error (or bug?):

Error while trying to split key and value in configuration file /conf/flink-conf.yaml:

2. The second error is more serious:

Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: java.lang.NumberFormatException: For input string: "${nodemanager.resource.cpu-vcores}"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:569)
    at java.lang.Integer.parseInt(Integer.java:615)
    at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1271)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:315)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:434)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
    ... 9 more

The property that causes this error, nodemanager.resource.cpu-vcores, is set appropriately in yarn-site.xml. The cluster has 3 ResourceManagers (2 on standby) and 5 NodeManagers. To be extra safe, I changed the value of this property in the yarn-site.xml of ALL the NodeManagers.

I believe this property defaults to 4 according to this blog [ https://www.mapr.com/blog/best-practices-yarn-resource-management ], so I am trying to understand why this error occurs.

The required environment variable is set as follows:

YARN_CONF_DIR=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/

I also tried setting the fs.hdfs.hadoopconf property (to point to the Hadoop conf directory) in flink-conf.yaml, but I still get the same error.

Any help with these errors (especially the latter) would be greatly appreciated.

Thanks in advance,
Aniket D |
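The NumberFormatException in the trace is what one would expect if the value read for the vcores setting still contains an unexpanded ${...} reference. As far as I know, Hadoop's Configuration expands ${name} references against system properties and other configuration keys and leaves a reference untouched when nothing defines that name; getInt then passes the literal string to Integer.parseInt. The Python sketch below only imitates that behaviour to make the failure mode visible. It is not Hadoop's code, and the helper names substitute and get_int are made up for illustration.

```python
import re

# Matches a ${name} reference; "name" may not contain whitespace, "$" or "}".
_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def substitute(value, props, max_depth=20):
    """Expand ${name} references from props; leave unknown names untouched.

    Hypothetical helper that imitates (not reproduces) Hadoop-style expansion.
    """
    for _ in range(max_depth):
        match = _VAR.search(value)
        if match is None:
            return value
        replacement = props.get(match.group(1))
        if replacement is None:
            # Unbound variable: the literal ${...} text survives.
            return value
        value = value[:match.start()] + replacement + value[match.end():]
    return value


def get_int(props, key, default):
    """Hypothetical stand-in for an integer lookup on a configuration."""
    raw = props.get(key)
    if raw is None:
        return default
    # int() raises ValueError here, the Python analogue of NumberFormatException.
    return int(substitute(raw, props).strip())


if __name__ == "__main__":
    key = "yarn.nodemanager.resource.cpu-vcores"
    unresolved = {key: "${nodemanager.resource.cpu-vcores}"}
    try:
        get_int(unresolved, key, 8)
    except ValueError as err:
        print("lookup failed:", err)

    # Once something defines the referenced name, the lookup succeeds.
    resolved = dict(unresolved, **{"nodemanager.resource.cpu-vcores": "4"})
    print("resolved value:", get_int(resolved, key, 8))
```

If this reading is right, the question becomes which configuration source supplies the placeholder and why nothing on the client side defines nodemanager.resource.cpu-vcores.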
Hi Aniket,
The first error you are reporting is not a big deal. I think there is some stray whitespace in the default Flink config file that causes the parser to print the message.

The second one is tougher to fix. It seems that there is an issue with loading the Hadoop configuration correctly. Can you post the contents of the client log file from the "log/" directory? It contains, for example, the Hadoop version being used (maybe it didn't correctly pick up the custom Hadoop version) and possibly some helpful WARN log messages (because our YARN client does some checks before starting).

Regards,
Robert

On Fri, Jan 20, 2017 at 12:27 AM, Aniket Deshpande <[hidden email]> wrote:
> [...] |
That is the problem. There are NO logs under the "log/" folder. P.S.: Sorry for the double post. That was a mistake on my part. |
Hi,
no problem. Can you send me the log statements written to standard out on the client?

On Wed, Jan 25, 2017 at 5:02 PM, ani.desh1512 <[hidden email]> wrote:
> [...] |
Following is the output when I execute ./bin/yarn-session.sh -n 2

2017-01-25 15:53:57,805 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 10.101.2.111
2017-01-25 15:53:57,807 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-01-25 15:53:57,807 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 8192
2017-01-25 15:53:57,807 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 40960
2017-01-25 15:53:57,807 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 5
2017-01-25 15:53:57,807 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-01-25 15:53:57,808 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-01-25 15:53:57,808 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-01-25 15:53:57,808 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.tmp.dirs, /tmp/flink
2017-01-25 15:53:57,809 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
2017-01-25 15:53:57,809 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.default-scheme, maprfs:///
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 10.101.2.111
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 8192
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 40960
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 5
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-01-25 15:53:57,839 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-01-25 15:53:57,840 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-01-25 15:53:57,840 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.tmp.dirs, /tmp/flink
2017-01-25 15:53:57,841 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
2017-01-25 15:53:57,841 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.default-scheme, maprfs:///
2017-01-25 15:53:59,356 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-01-25 15:53:59,628 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to ubuntu (auth:SIMPLE)
2017-01-25 15:53:59,649 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, 10.101.2.111
2017-01-25 15:53:59,649 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 8192
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 40960
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 5
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-01-25 15:53:59,650 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-01-25 15:53:59,651 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.tmp.dirs, /tmp/flink
2017-01-25 15:53:59,651 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
2017-01-25 15:53:59,651 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: fs.default-scheme, maprfs:///
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:425)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: java.lang.NumberFormatException: For input string: "${nodemanager.resource.cpu-vcores}"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:481)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1271)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:315)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:434)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:423)
    ... 9 more |
I was able to resolve this issue by referring to this issue
In short, I had to modify Flink's pom.xml to change the ZooKeeper version to the MapR ZooKeeper version, and the error then disappears. However, I have run into a separate issue, for which I will start a new thread. Thanks for all the help, Robert. |
Hi,
I'm facing the same problem, but even after changing the ZooKeeper version the error persists. In my current MapR setup, the ZooKeeper version is 3.4.5, and the error messages are as follows:

[mapr@maprdemo flink]$ ./flink-1.3-SNAPSHOT/bin/yarn-session.sh -n 1
2017-02-27 04:51:29,364 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-02-27 04:51:29,365 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-02-27 04:51:29,365 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-02-27 04:51:29,366 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-02-27 04:51:29,366 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-02-27 04:51:29,366 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-02-27 04:51:29,366 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-02-27 04:51:29,367 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-02-27 04:51:29,425 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-02-27 04:51:29,426 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-02-27 04:51:29,426 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-02-27 04:51:29,426 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-02-27 04:51:29,426 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-02-27 04:51:29,426 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-02-27 04:51:29,427 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-02-27 04:51:29,427 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
2017-02-27 04:51:31,192 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-02-27 04:51:31,477 INFO org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set to mapr (auth:SIMPLE)
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 256
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 512
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.memory.preallocate, false
2017-02-27 04:51:31,505 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1
2017-02-27 04:51:31,506 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.web.port, 8081
Error while deploying YARN cluster: Couldn't deploy Yarn cluster
java.lang.RuntimeException: Couldn't deploy Yarn cluster
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:426)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:620)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
    at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:421)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
    at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: java.lang.NumberFormatException: For input string: "${nodemanager.resource.cpu-vcores}"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:481)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1271)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:316)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:435)
    at org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploy(AbstractYarnClusterDescriptor.java:424)
    ... 9 more

Thanks in advance |
While it is difficult to pinpoint exactly what is wrong without the YARN logs, there are a couple of things I would like you to check:

1. Is the YARN_CONF_DIR environment variable set?
2. Is the "yarn.nodemanager.resource.cpu-vcores" property set in yarn-site.xml?

Let me know whether these two things are in order. If they are and you are still getting the error, we will need to look at the YARN logs. |
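The two checks can be scripted roughly as follows. This is a minimal sketch, assuming yarn-site.xml uses the usual Hadoop layout of <property> elements with <name> and <value> children. The function names read_hadoop_properties and check_vcores are hypothetical, and the script only inspects yarn-site.xml itself, so it says nothing about defaults that may come from other configuration sources on the classpath.

```python
import os
import re
import xml.etree.ElementTree as ET

VCORES_KEY = "yarn.nodemanager.resource.cpu-vcores"
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def read_hadoop_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_vcores(props, key=VCORES_KEY):
    """Classify the configured value: missing, placeholder, non-integer, or ok."""
    value = props.get(key)
    if value is None:
        return "missing"
    if PLACEHOLDER.search(value):
        return "unresolved placeholder"
    try:
        int(value)
    except ValueError:
        return "not an integer"
    return "ok"


def main():
    # Check 1: is YARN_CONF_DIR set at all?
    conf_dir = os.environ.get("YARN_CONF_DIR")
    if not conf_dir:
        print("YARN_CONF_DIR is not set")
        return
    # Check 2: does yarn-site.xml in that directory define a usable value?
    path = os.path.join(conf_dir, "yarn-site.xml")
    if not os.path.isfile(path):
        print("no yarn-site.xml found in", conf_dir)
        return
    with open(path, encoding="utf-8") as handle:
        props = read_hadoop_properties(handle.read())
    print(VCORES_KEY, "->", check_vcores(props))


if __name__ == "__main__":
    main()
```

Run it in the same shell that starts yarn-session.sh, so that it sees the same YARN_CONF_DIR as the Flink client.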