Gary Yao created FLINK-15082:
--------------------------------

             Summary: Mesos App Master does not respect taskmanager.memory.total-process.size
                 Key: FLINK-15082
                 URL: https://issues.apache.org/jira/browse/FLINK-15082
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Gary Yao
             Fix For: 1.10.0


*Description*
When the Mesos App Master is started with {{taskmanager.memory.total-process.size}}, [the value is not respected|https://github.com/apache/flink/blob/d08beaa3255b3df96afe35f17e257df31a0d71ed/flink-mesos/src/main/java/org/apache/flink/mesos/runtime/clusterframework/MesosTaskManagerParameters.java#L339].

One can reproduce this when starting the App Master with the command below:

{noformat}
/bin/mesos-appmaster.sh \
    -Dtaskmanager.memory.total-process.size=2048m \
    -Djobmanager.heap.size=2048m \
    ...
{noformat}

The ClusterEntryPoint will fail with an exception (see below). The reason is that the default value of {{mesos.resourcemanager.tasks.mem}} will be taken as the total process memory size (1024 MB).

{noformat}
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint MesosSessionClusterEntrypoint.
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
	at org.apache.flink.mesos.entrypoint.MesosSessionClusterEntrypoint.main(MesosSessionClusterEntrypoint.java:126)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	... 2 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Sum of configured Framework Heap Memory (134217728 bytes), Framework Off-Heap Memory (134217728 bytes), Task Off-Heap Memory (0 bytes), Managed Memory (719407031 bytes) and Shuffle Memory (80530638 bytes) exceed configured Total Flink Memory (805306368 bytes).
	at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveInternalMemoryFromTotalFlinkMemory(TaskExecutorResourceUtils.java:273)
	at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithTotalProcessMemory(TaskExecutorResourceUtils.java:210)
	at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:108)
	at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:94)
	at org.apache.flink.mesos.runtime.clusterframework.MesosTaskManagerParameters.create(MesosTaskManagerParameters.java:341)
	at org.apache.flink.mesos.util.MesosUtils.createTmParameters(MesosUtils.java:109)
	at org.apache.flink.mesos.runtime.clusterframework.MesosResourceManagerFactory.createActiveResourceManager(MesosResourceManagerFactory.java:80)
	at org.apache.flink.runtime.resourcemanager.ActiveResourceManagerFactory.createResourceManager(ActiveResourceManagerFactory.java:58)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:170)
	... 9 more
{noformat}

*Expected Behavior*
* If taskmanager.memory.total-process.size and mesos.resourcemanager.tasks.mem are both set and differ in their values, an exception should be thrown
* If only taskmanager.memory.total-process.size is set and mesos.resourcemanager.tasks.mem is not set, then the value configured by the former should be respected
* If only mesos.resourcemanager.tasks.mem is set and taskmanager.memory.total-process.size is not set, then the value configured by the former should be respected



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
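
For illustration only, the three rules under *Expected Behavior* could be sketched as the following standalone Java snippet. It does not use Flink's Configuration API and is not the actual fix; the class name, method name, and the use of Optional<Long> values in megabytes are hypothetical. The fallback to 1024 MB when neither option is set is also an assumption, taken from the default of mesos.resourcemanager.tasks.mem mentioned in the description; the issue itself does not specify that case.

```java
import java.util.Optional;

// Hypothetical helper; not part of Flink. It only mirrors the rules listed
// under "Expected Behavior" in this issue.
class TaskManagerMemoryResolver {

    // Assumption: the default of mesos.resourcemanager.tasks.mem (1024 MB),
    // used only when neither option is set explicitly.
    static final long DEFAULT_MESOS_TASKS_MEM_MB = 1024L;

    /**
     * @param totalProcessSizeMb explicitly configured taskmanager.memory.total-process.size, if any
     * @param mesosTasksMemMb    explicitly configured mesos.resourcemanager.tasks.mem, if any
     * @return the total process memory (in MB) that should be used
     */
    static long resolveTotalProcessMemoryMb(
            Optional<Long> totalProcessSizeMb, Optional<Long> mesosTasksMemMb) {

        if (totalProcessSizeMb.isPresent() && mesosTasksMemMb.isPresent()) {
            // Rule 1: both set and different -> fail fast.
            if (!totalProcessSizeMb.get().equals(mesosTasksMemMb.get())) {
                throw new IllegalArgumentException(
                        "taskmanager.memory.total-process.size (" + totalProcessSizeMb.get()
                                + " MB) and mesos.resourcemanager.tasks.mem ("
                                + mesosTasksMemMb.get() + " MB) are both set but differ.");
            }
            return totalProcessSizeMb.get();
        }

        // Rule 2: only taskmanager.memory.total-process.size is set.
        if (totalProcessSizeMb.isPresent()) {
            return totalProcessSizeMb.get();
        }

        // Rule 3: only mesos.resourcemanager.tasks.mem is set.
        // If neither is set, fall back to the assumed default.
        return mesosTasksMemMb.orElse(DEFAULT_MESOS_TASKS_MEM_MB);
    }

    public static void main(String[] args) {
        // The reproduction case from the description: only the new option is set.
        long resolved = resolveTotalProcessMemoryMb(Optional.of(2048L), Optional.empty());
        System.out.println("Resolved total process memory: " + resolved + " MB");
    }
}
```

In this sketch the distinction between "explicitly set" and "falls back to default" is carried by Optional; any real fix would need an equivalent way to tell whether mesos.resourcemanager.tasks.mem was configured by the user.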