Steven Zhen Wu created FLINK-18350:
--------------------------------------

             Summary: [1.11.0] jobmanager complains `taskmanager.memory.process.size` missing
                 Key: FLINK-18350
                 URL: https://issues.apache.org/jira/browse/FLINK-18350
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Configuration
    Affects Versions: 1.11.0
            Reporter: Steven Zhen Wu


Saw this failure during jobmanager startup. I know the exception says that `taskmanager.memory.process.size` is missing. We set it on the taskmanager side in `flink-conf.yaml`, but I am wondering why the jobmanager requires it in session cluster mode. When a taskmanager registers with the jobmanager, it reports its resources (CPU, memory, etc.).

{code:java}
2020-06-17 18:06:25,079 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint [main] - Could not start cluster entrypoint TitusSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint TitusSessionClusterEntrypoint.
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:516)
	at com.netflix.spaas.runtime.TitusSessionClusterEntrypoint.main(TitusSessionClusterEntrypoint.java:103)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:255)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	... 2 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Cannot read memory size from config option 'taskmanager.memory.process.size'.
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:234)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:100)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79)
	at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109)
	at org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpecBuilder.build(TaskExecutorProcessSpecBuilder.java:58)
	at org.apache.flink.runtime.resourcemanager.WorkerResourceSpecFactory.workerResourceSpecFromConfigAndCpu(WorkerResourceSpecFactory.java:37)
	at com.netflix.spaas.runtime.resourcemanager.TitusWorkerResourceSpecFactory.createDefaultWorkerResourceSpec(TitusWorkerResourceSpecFactory.java:17)
	at org.apache.flink.runtime.resourcemanager.ResourceManagerRuntimeServicesConfiguration.fromConfiguration(ResourceManagerRuntimeServicesConfiguration.java:67)
	at com.netflix.spaas.runtime.resourcemanager.TitusResourceManagerFactory.createResourceManager(TitusResourceManagerFactory.java:53)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167)
	... 9 more
Caused by: java.lang.IllegalArgumentException: Could not parse value '7500}' for key 'taskmanager.memory.process.size'.
	at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:753)
	at org.apache.flink.configuration.Configuration.get(Configuration.java:738)
	at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:232)
	... 18 more
Caused by: java.lang.IllegalArgumentException: Memory size unit '}' does not match any of the recognized units: (b | bytes) / (k | kb | kibibytes) / (m | mb | mebibytes) / (g | gb | gibibytes) / (t | tb | tebibytes)
	at org.apache.flink.configuration.MemorySize.parseUnit(MemorySize.java:331)
	at org.apache.flink.configuration.MemorySize.parseBytes(MemorySize.java:306)
	at org.apache.flink.configuration.MemorySize.parse(MemorySize.java:247)
	at org.apache.flink.configuration.Configuration.convertToMemorySize(Configuration.java:951)
	at org.apache.flink.configuration.Configuration.convertValue(Configuration.java:885)
	at org.apache.flink.configuration.Configuration.lambda$getOptional$2(Configuration.java:750)
	at java.util.Optional.map(Optional.java:215)
	at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:750)
	... 20 more
{code}

We extend WorkerResourceSpecFactory, as KubernetesWorkerResourceSpecFactory does:
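{code:java}
import org.apache.flink.annotation.VisibleForTesting;
import org.apache.flink.api.common.resources.CPUResource;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils;
import org.apache.flink.runtime.resourcemanager.WorkerResourceSpec;
import org.apache.flink.runtime.resourcemanager.WorkerResourceSpecFactory;

public class TitusWorkerResourceSpecFactory extends WorkerResourceSpecFactory {

    public static final TitusWorkerResourceSpecFactory INSTANCE = new TitusWorkerResourceSpecFactory();

    @Override
    public WorkerResourceSpec createDefaultWorkerResourceSpec(Configuration configuration) {
        // Builds the default worker spec (including TaskManager memory) from the
        // configuration visible to the resource manager, plus the resolved CPU cores.
        return workerResourceSpecFromConfigAndCpu(configuration, getDefaultCpus(configuration));
    }

    @VisibleForTesting
    static CPUResource getDefaultCpus(Configuration configuration) {
        // TITUS_NUM_CPU is the fallback used when no CPU cores are configured;
        // Double.valueOf throws NullPointerException if the variable is unset.
        double fallback = Double.valueOf(System.getenv("TITUS_NUM_CPU"));
        return TaskExecutorProcessUtils.getCpuCoresWithFallback(configuration, fallback);
    }
}
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)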
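The last `Caused by` in the stack trace reports that the value `'7500}'` was rejected because the text after the leading digits (here `}`) is read as a memory unit and matches none of the listed units. The sketch below is a simplified, hypothetical Java re-creation of that check, not Flink's `org.apache.flink.configuration.MemorySize` code: it assumes the parser splits the value into a leading run of digits and a unit suffix, matches the suffix against the units named in the error message with 1024-based multipliers (as "kibibytes", "mebibytes", etc. suggest), and treats an empty suffix as plain bytes. `MemorySizeSketch` and its methods are illustrative names only.

{code:java}
import java.util.Locale;

/**
 * Hypothetical, simplified re-creation of the memory-unit check described by the
 * "Memory size unit ... does not match any of the recognized units" error.
 * This is NOT Flink's MemorySize implementation.
 */
public class MemorySizeSketch {

    /** Parses "<digits><optional unit>" into a byte count (assumed 1024-based units). */
    static long parseBytes(String text) {
        String trimmed = text.trim();
        int pos = 0;
        // Consume the leading run of ASCII digits.
        while (pos < trimmed.length() && trimmed.charAt(pos) >= '0' && trimmed.charAt(pos) <= '9') {
            pos++;
        }
        if (pos == 0) {
            throw new IllegalArgumentException("text does not start with a number: '" + text + "'");
        }
        long value = Long.parseLong(trimmed.substring(0, pos));
        // Everything after the digits is treated as the unit suffix.
        String unit = trimmed.substring(pos).trim().toLowerCase(Locale.US);
        // multiplyExact throws ArithmeticException on overflow.
        return Math.multiplyExact(value, multiplierFor(unit));
    }

    /** Maps a unit suffix to its multiplier; an empty suffix is assumed to mean bytes. */
    static long multiplierFor(String unit) {
        switch (unit) {
            case "":
            case "b":
            case "bytes":
                return 1L;
            case "k":
            case "kb":
            case "kibibytes":
                return 1L << 10;
            case "m":
            case "mb":
            case "mebibytes":
                return 1L << 20;
            case "g":
            case "gb":
            case "gibibytes":
                return 1L << 30;
            case "t":
            case "tb":
            case "tebibytes":
                return 1L << 40;
            default:
                throw new IllegalArgumentException(
                        "Memory size unit '" + unit + "' does not match any of the recognized units");
        }
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("7500m")); // 7864320000
        try {
            parseBytes("7500}");
        } catch (IllegalArgumentException e) {
            // Reports the unrecognized unit '}', mirroring the trace above.
            System.out.println(e.getMessage());
        }
    }
}
{code}

Under these assumptions, `7500m` resolves to 7500 × 2^20 = 7,864,320,000 bytes, while `7500}` fails on the unit suffix `}` in the same way as the last `Caused by` above.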