[jira] [Created] (FLINK-15981) Control the direct memory in FileChannelBoundedData.FileBufferReader

Shang Yuanchun (Jira)
Jingsong Lee created FLINK-15981:
------------------------------------

             Summary: Control the direct memory in FileChannelBoundedData.FileBufferReader
                 Key: FLINK-15981
                 URL: https://issues.apache.org/jira/browse/FLINK-15981
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
    Affects Versions: 1.10.0
            Reporter: Jingsong Lee
             Fix For: 1.11.0


Currently, the default BoundedData implementation for blocking partitions is FileChannelBoundedData. Each of its readers creates a new 64 KB direct buffer.
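To make the cost concrete, here is a minimal Java sketch of the allocation pattern described above, assuming one 64 KB direct buffer per reader. The class `PerReaderDirectBuffer` and its allocation counter are hypothetical illustrations, not Flink's `FileBufferReader`. The point is only that `ByteBuffer.allocateDirect` runs once per reader, so off-heap usage grows linearly with the number of readers.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified model of the pattern described above; this is
 * NOT Flink's FileBufferReader. Each reader allocates its own 64 KB direct
 * buffer when it is created, so direct memory grows with the reader count.
 */
final class PerReaderDirectBuffer {
    static final int BUFFER_SIZE = 64 * 1024; // 64 KB per reader, as stated in the issue

    /** Total direct bytes allocated by all readers (illustrative bookkeeping only). */
    static final AtomicLong ALLOCATED_DIRECT_BYTES = new AtomicLong();

    private final ReadableByteChannel channel;
    // Off-heap allocation: counts against the JVM's direct memory limit, not the heap.
    private final ByteBuffer buffer;

    PerReaderDirectBuffer(ReadableByteChannel channel) {
        this.channel = channel;
        this.buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ALLOCATED_DIRECT_BYTES.addAndGet(BUFFER_SIZE);
    }

    /** Reads the next chunk into this reader's private buffer; returns bytes read, or -1 at end of stream. */
    int readNext() {
        buffer.clear();
        try {
            return channel.read(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    ByteBuffer buffer() {
        return buffer;
    }
}
```

Because each buffer is private to its reader and allocated off-heap, nothing caps the total except the JVM's direct memory limit.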

When the parallelism is greater than 100, users need to configure "taskmanager.memory.task.off-heap.size" to avoid direct memory OOM errors. This is hard to configure, and it costs a lot of memory. With a parallelism of 1000, for example, a TaskManager may need 1 GB or more.
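A rough worked estimate of these numbers, as a Java sketch. It assumes (hypothetically) that each consumer requests one subpartition from each producer, that every request creates one reader holding one 64 KB direct buffer, and that all readers are open at the same time. `ReaderDirectMemoryEstimate` and `producersOnTaskManager` are illustrative names, not Flink APIs or settings.

```java
/**
 * Back-of-the-envelope estimate (hypothetical helper, not a Flink API):
 * direct memory held by reader buffers on one TaskManager, assuming every
 * requested subpartition gets one reader with one 64 KB direct buffer and
 * all readers are open at the same time (worst case).
 */
final class ReaderDirectMemoryEstimate {
    static final long BUFFER_BYTES = 64L * 1024; // 64 KB per reader

    /**
     * @param producersOnTaskManager producer tasks whose blocking results this TaskManager serves (assumed)
     * @param consumerParallelism    downstream parallelism; each consumer requests one subpartition per producer
     */
    static long directBytes(int producersOnTaskManager, int consumerParallelism) {
        long readers = (long) producersOnTaskManager * consumerParallelism;
        return readers * BUFFER_BYTES;
    }

    public static void main(String[] args) {
        // One producer feeding 1000 consumers: 1000 * 64 KB.
        System.out.println(directBytes(1, 1000));  // 65536000 bytes (62.5 MiB)
        // 16 such producers on one TaskManager.
        System.out.println(directBytes(16, 1000)); // 1048576000 bytes (1000 MiB)
    }
}
```

Under these assumptions a single producer serving 1000 consumers already needs 62.5 MiB of direct memory, and about 16 such producers on one TaskManager reach the 1 GB range.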

This does not suit scenarios with few slots and large parallelism. Batch jobs could run little by little with only a few slots, but the memory needed for these reader buffers would still be large.

If we provide N-input operators, things may get worse: more subpartitions could be requested at the same time, and we have no idea how much memory that would need.

Here are my rough thoughts:
 * Obtain memory from network buffers.
 * Provide a limit on "the maximum number of subpartitions that can be requested at the same time".
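The two ideas above could be combined roughly as in the following Java sketch. It assumes a fixed pool of pre-allocated direct buffers standing in for network buffers, and a `java.util.concurrent.Semaphore` as the cap on concurrently requested subpartitions. `BoundedReaderResources` and its methods are hypothetical; Flink's actual network buffer pool has a different API.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch of the two ideas above; names are illustrative, not Flink APIs.
 * 1. Readers borrow buffers from a fixed, pre-allocated pool (standing in for
 *    network buffers) instead of allocating their own direct memory.
 * 2. A semaphore caps how many subpartitions may be read at the same time.
 */
final class BoundedReaderResources {
    private final BlockingQueue<ByteBuffer> pool;
    private final Semaphore readerPermits;

    BoundedReaderResources(int numBuffers, int bufferSize, int maxConcurrentSubpartitions) {
        this.pool = new ArrayBlockingQueue<>(numBuffers);
        for (int i = 0; i < numBuffers; i++) {
            pool.add(ByteBuffer.allocateDirect(bufferSize)); // allocated once, up front
        }
        this.readerPermits = new Semaphore(maxConcurrentSubpartitions);
    }

    /** Tries to start reading a subpartition; returns false if the concurrency limit is reached. */
    boolean tryOpenReader() {
        return readerPermits.tryAcquire();
    }

    /** Must be called exactly once per successful tryOpenReader(). */
    void closeReader() {
        readerPermits.release();
    }

    /** Borrows a buffer, or returns null if the pool is exhausted (the caller should retry later). */
    ByteBuffer borrowBuffer() {
        ByteBuffer buffer = pool.poll();
        if (buffer != null) {
            buffer.clear();
        }
        return buffer;
    }

    void returnBuffer(ByteBuffer buffer) {
        pool.add(buffer);
    }

    int availableBuffers() {
        return pool.size();
    }
}
```

With this design the total reader memory is bounded by `numBuffers * bufferSize`, independent of parallelism; a reader that cannot get a permit or a buffer waits and retries instead of allocating more direct memory.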



--
This message was sent by Atlassian Jira
(v8.3.4#803005)