[jira] [Created] (FLINK-18350) [1.11.0] jobmanager complains `taskmanager.memory.process.size` missing


Shang Yuanchun (Jira)
Steven Zhen Wu created FLINK-18350:
--------------------------------------

             Summary: [1.11.0] jobmanager complains `taskmanager.memory.process.size` missing
                 Key: FLINK-18350
                 URL: https://issues.apache.org/jira/browse/FLINK-18350
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Configuration
    Affects Versions: 1.11.0
            Reporter: Steven Zhen Wu


 

We saw this failure during jobmanager startup. I know the exception says that `taskmanager.memory.process.size` is missing. We set it on the taskmanager side in `flink-conf.yaml`, but I am wondering why the jobmanager requires it in session cluster mode. When a taskmanager registers with the jobmanager, it already reports its resources (CPU, memory, etc.).
{code:java}
2020-06-17 18:06:25,079 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [main]  - Could not start cluster entrypoint TitusSessionClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint TitusSessionClusterEntrypoint.
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:516)
        at com.netflix.spaas.runtime.TitusSessionClusterEntrypoint.main(TitusSessionClusterEntrypoint.java:103)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
        at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:255)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:216)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
        at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
        at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
        ... 2 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Cannot read memory size from config option 'taskmanager.memory.process.size'.
        at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:234)
        at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:100)
        at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:79)
        at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:109)
        at org.apache.flink.runtime.clusterframework.TaskExecutorProcessSpecBuilder.build(TaskExecutorProcessSpecBuilder.java:58)
        at org.apache.flink.runtime.resourcemanager.WorkerResourceSpecFactory.workerResourceSpecFromConfigAndCpu(WorkerResourceSpecFactory.java:37)
        at com.netflix.spaas.runtime.resourcemanager.TitusWorkerResourceSpecFactory.createDefaultWorkerResourceSpec(TitusWorkerResourceSpecFactory.java:17)
        at org.apache.flink.runtime.resourcemanager.ResourceManagerRuntimeServicesConfiguration.fromConfiguration(ResourceManagerRuntimeServicesConfiguration.java:67)
        at com.netflix.spaas.runtime.resourcemanager.TitusResourceManagerFactory.createResourceManager(TitusResourceManagerFactory.java:53)
        at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:167)
        ... 9 more
Caused by: java.lang.IllegalArgumentException: Could not parse value '7500}' for key 'taskmanager.memory.process.size'.
        at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:753)
        at org.apache.flink.configuration.Configuration.get(Configuration.java:738)
        at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.getMemorySizeFromConfig(ProcessMemoryUtils.java:232)
        ... 18 more
Caused by: java.lang.IllegalArgumentException: Memory size unit '}' does not match any of the recognized units: (b | bytes) / (k | kb | kibibytes) / (m | mb | mebibytes) / (g | gb | gibibytes) / (t | tb | tebibytes)
        at org.apache.flink.configuration.MemorySize.parseUnit(MemorySize.java:331)
        at org.apache.flink.configuration.MemorySize.parseBytes(MemorySize.java:306)
        at org.apache.flink.configuration.MemorySize.parse(MemorySize.java:247)
        at org.apache.flink.configuration.Configuration.convertToMemorySize(Configuration.java:951)
        at org.apache.flink.configuration.Configuration.convertValue(Configuration.java:885)
        at org.apache.flink.configuration.Configuration.lambda$getOptional$2(Configuration.java:750)
        at java.util.Optional.map(Optional.java:215)
        at org.apache.flink.configuration.Configuration.getOptional(Configuration.java:750)
        ... 20 more
{code}
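Note that the innermost cause in the trace is a parse failure on the value '7500}' rather than an absent key: the trailing '}' is read as the memory unit. The following is a minimal Java sketch of that kind of number-plus-unit parsing, written only to illustrate the failure mode. It is not Flink's `MemorySize` code; the class `MemorySizeSketch` and its methods are hypothetical, the unit spellings are copied from the error message above, and treating a bare number as bytes is an assumption of the sketch.

```java
import java.util.Locale;

/** Illustrative sketch only; NOT Flink's MemorySize implementation. */
final class MemorySizeSketch {

    /** Parses text such as "7500m" into a byte count; throws on an unknown unit. */
    static long parseBytes(String text) {
        String trimmed = text.trim();
        // Split the input into a leading run of digits and whatever follows it.
        int pos = 0;
        while (pos < trimmed.length() && Character.isDigit(trimmed.charAt(pos))) {
            pos++;
        }
        if (pos == 0) {
            throw new IllegalArgumentException(
                "Text does not start with a number: '" + text + "'");
        }
        long value = Long.parseLong(trimmed.substring(0, pos));
        String unit = trimmed.substring(pos).trim().toLowerCase(Locale.ROOT);
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(value, multiplierFor(unit));
    }

    /** Maps a unit string to its multiplier in bytes (powers of 1024). */
    private static long multiplierFor(String unit) {
        switch (unit) {
            case "": // assumption of this sketch: a bare number means bytes
            case "b":
            case "bytes":
                return 1L;
            case "k":
            case "kb":
            case "kibibytes":
                return 1L << 10;
            case "m":
            case "mb":
            case "mebibytes":
                return 1L << 20;
            case "g":
            case "gb":
            case "gibibytes":
                return 1L << 30;
            case "t":
            case "tb":
            case "tebibytes":
                return 1L << 40;
            default:
                // Anything after the digits that is not a known unit, such as "}", ends up here.
                throw new IllegalArgumentException(
                    "Memory size unit '" + unit + "' does not match any of the recognized units");
        }
    }

    public static void main(String[] args) {
        // A well-formed value parses to a byte count.
        System.out.println(parseBytes("7500m"));
        // The value from the trace is rejected because "}" is taken as the unit.
        try {
            parseBytes("7500}");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this sketch, `7500m` is accepted and `7500}` is rejected because of the stray brace, which matches the shape of the last two `Caused by` entries in the trace.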
We extend `WorkerResourceSpecFactory`, similar to `KubernetesWorkerResourceSpecFactory`.
{code:java}
public class TitusWorkerResourceSpecFactory extends WorkerResourceSpecFactory {

  public static final TitusWorkerResourceSpecFactory INSTANCE =
      new TitusWorkerResourceSpecFactory();

  @Override
  public WorkerResourceSpec createDefaultWorkerResourceSpec(Configuration configuration) {
    // Per the stack trace above, this call is what ends up reading
    // taskmanager.memory.process.size from the jobmanager's configuration.
    return workerResourceSpecFromConfigAndCpu(configuration, getDefaultCpus(configuration));
  }

  @VisibleForTesting
  static CPUResource getDefaultCpus(Configuration configuration) {
    // Use the CPU count from the TITUS_NUM_CPU environment variable as the fallback.
    double fallback = Double.parseDouble(System.getenv("TITUS_NUM_CPU"));
    return TaskExecutorProcessUtils.getCpuCoresWithFallback(configuration, fallback);
  }
}
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)