[DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)


[DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Stephan Ewen
Hi all!

Yesterday, some of the people involved in FLIP-49 had a long discussion
about managed memory in Flink.
In particular, we discussed the fact that managed memory can be either
on-heap or off-heap, and that FLIP-49 introduced having both types of
memory at the same time.

==> What we want to suggest is a simplification to only have off-heap
managed memory.

The rationale is the following:
  - Integrating state backends with managed memory means we need to support
"reserving" memory on top of creating MemorySegments.
    Reserving memory isn't really possible on the Java heap, but works well
off-heap.

  - All components that will use managed memory will work with off-heap
managed memory: MemorySegment-based structures, RocksDB, possibly external
processes in the future.

  - A setup where state backends integrate with managed memory, but managed
memory is by default all on-heap, breaks the RocksDB backend's
out-of-the-box experience.

  - The only state backend that does not use managed memory is the
HeapKeyedStateBackend (used in MemoryStateBackend and FileStateBackend). It
means that the HeapKeyedStateBackend always uses the JVM heap, even when
all managed memory is off-heap.

  - Heavy use of the HeapKeyedStateBackend needs a larger JVM heap. The
current FLIP-49 way to get this is to "configure managed memory to be
on-heap", but that managed memory will not actually be used; it only helps
to implicitly grow the heap through the way the heap size is computed. That
is a pretty confusing story, especially when we start thinking about
scenarios where Flink runs as a library in a pre-existing JVM, about the
mini-cluster, etc. It is simpler (and more accurate) to just say that the
HeapKeyedStateBackend does not participate in managed memory, and that
extensive use of it requires the user to reserve heap memory (in FLIP-49
there is a new TaskHeapMemory option to request that a larger heap be
created).
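To illustrate the "reserve" versus "allocate" distinction from the first
point above, here is a minimal sketch (the class and method names are
illustrative, not Flink's actual MemoryManager API): off-heap memory can be
reserved as a pure accounting operation, with the consumer (e.g. RocksDB)
allocating the bytes natively itself, whereas an on-heap reservation has no
Java object to point at until a segment is actually created.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of an off-heap memory budget (names illustrative). */
public class OffHeapBudget {
    private final long capacityBytes;
    private final AtomicLong reservedBytes = new AtomicLong();

    public OffHeapBudget(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Pure bookkeeping: no Java object backs the reservation; the consumer
     *  (e.g. a native library) allocates the memory itself. */
    public boolean tryReserve(long bytes) {
        long newTotal = reservedBytes.addAndGet(bytes);
        if (newTotal > capacityBytes) {
            reservedBytes.addAndGet(-bytes); // roll back on over-commit
            return false;
        }
        return true;
    }

    public void release(long bytes) {
        reservedBytes.addAndGet(-bytes);
    }

    /** Reserved memory can also be materialized as a direct buffer,
     *  the MemorySegment-style use of the same budget. */
    public ByteBuffer allocateSegment(int bytes) {
        if (!tryReserve(bytes)) {
            throw new IllegalStateException("memory budget exceeded");
        }
        return ByteBuffer.allocateDirect(bytes);
    }

    public long reserved() {
        return reservedBytes.get();
    }
}
```

With a Java heap, the equivalent of `tryReserve` would have no way to keep
the reserved bytes out of the garbage collector's hands, which is the core
of the argument above.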

==> This seems to support all scenarios in a nice way out of the box.

==> This seems easier to understand for users.

==> This simplifies the implementation of resource profiles, configuration,
and computation of memory pools.


Does anybody have a concern about this? In particular, would any users be
impacted if MemorySegment-based jobs (batch) now always ran with off-heap
memory?

If no one raises an objection, we would update the FLIP-49 proposal to
divide the Flink memory by default into 50% JVM heap and 50% managed memory
(or even 60% / 40%). All state backends and batch jobs will have a good
out-of-the-box experience that way.
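As a back-of-the-envelope sketch of that split (the method and names here
are illustrative, not actual FLIP-49 configuration keys or semantics):

```java
/** Illustrative computation of the proposed default memory split. */
public class MemorySplit {

    /**
     * Divides the total Flink memory into a JVM heap part and an off-heap
     * managed part. managedFraction = 0.5 gives the proposed 50/50 split;
     * 0.4 gives the 60/40 alternative.
     */
    public static long[] split(long totalFlinkMemoryBytes, double managedFraction) {
        long managed = (long) (totalFlinkMemoryBytes * managedFraction); // off-heap managed memory
        long heap = totalFlinkMemoryBytes - managed;                     // JVM heap (-Xmx)
        return new long[] {heap, managed};
    }
}
```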

Best,
Stephan

Re: [DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Jingsong Li
Hi Stephan,

+1 to default have off-heap managed memory.

From a batch perspective, in our long-term performance tests and production
practice:
- There is no significant difference in performance between heap and
off-heap memory. For heap objects, the JVM has many opportunities to
optimize in the JIT, so generally speaking heap objects will be faster. But
at present, the managed memory we use in Flink holds binary data, which we
operate on through the Unsafe API, so there is no obvious performance gap.
- On the contrary, too much memory on the heap hurts GC performance and
latency.
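The reason the gap disappears is that segment-style access goes through the
same Unsafe code path whether the bytes live on or off the heap. A minimal
sketch of that idea (illustrative names, not Flink's actual MemorySegment
class):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Sketch: one Unsafe-based access path serves heap and off-heap memory. */
public class SegmentAccess {
    private static final Unsafe UNSAFE = getUnsafe();
    private static final long BYTE_ARRAY_OFFSET =
            UNSAFE.arrayBaseOffset(byte[].class);

    private final Object base;   // a byte[] for heap segments, null for off-heap
    private final long address;  // array base offset, or a native address

    private SegmentAccess(Object base, long address) {
        this.base = base;
        this.address = address;
    }

    public static SegmentAccess onHeap(byte[] bytes) {
        return new SegmentAccess(bytes, BYTE_ARRAY_OFFSET);
    }

    public static SegmentAccess offHeap(long nativeAddress) {
        return new SegmentAccess(null, nativeAddress);
    }

    /** Identical code regardless of where the memory lives, so the JIT
     *  compiles one access path for both cases. */
    public long getLong(int index) {
        return UNSAFE.getLong(base, address + index);
    }

    public void putLong(int index, long value) {
        UNSAFE.putLong(base, address + index, value);
    }

    private static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }
}
```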

But I'm not sure if we should only have off-heap managed memory.
In our previous experience, operations on arrays and objects in the JVM can
benefit more, since, as mentioned above, the JVM/JIT does a lot of
optimization:
- For vectorization, arrays are clearly more amenable to computation; the
JVM can apply many optimizations to array loops.
- We could consider using deeper code generation to produce dynamic Java
objects that further speed up the operators. The snappydata project [1] has
done some work in this area.

So I am +0 to only having off-heap managed memory: we don't rely on heap
memory right now, these are only a few ideas for the future.
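To make the array-loop point concrete, here is an illustrative contrast
(not Flink code): the plain array loop is the shape the JIT handles best
(unrolling, bounds-check elimination, auto-vectorization), while the
binary/segment style reads the same data from bytes, which the JIT cannot
vectorize as easily.

```java
/** Illustrative contrast between array loops and binary-segment access. */
public class LoopStyles {

    /** The JIT can unroll, eliminate bounds checks, and auto-vectorize this. */
    public static long sumArray(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    /** Segment-style access: decodes the same data from a byte[] laid out as
     *  8-byte little-endian longs, as a managed-memory segment would hold it. */
    public static long sumBinary(byte[] data) {
        long sum = 0;
        for (int offset = 0; offset + 8 <= data.length; offset += 8) {
            long v = 0;
            for (int b = 7; b >= 0; b--) {
                v = (v << 8) | (data[offset + b] & 0xFFL);
            }
            sum += v;
        }
        return sum;
    }
}
```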

[1] https://github.com/SnappyDataInc/snappydata

Best,
Jingsong Lee

On Wed, Nov 27, 2019 at 10:14 AM Stephan Ewen <[hidden email]> wrote:



--
Best, Jingsong Lee

Re: [DISCUSS] Make Managed Memory always off-heap (Adjustment to FLIP-49)

Xintong Song
Sorry, I just realized that I sent my feedback to Jingsong's email address
instead of the dev / user mailing lists.

Please find my comments below.


Thank you~

Xintong Song

On Wed, Nov 27, 2019 at 4:32 PM Xintong Song <[hidden email]> wrote:

> As a participant of the discussion yesterday, I'm +1 for the proposal of
> removing on-heap managed memory.
>
> And there's one thing I want to add. In order to support "reserving"
> memory (where memory consumers do not allocate MemorySegments from the
> MemoryManager but allocate the reserved memory themselves), we no longer
> support pre-allocation of memory segments in FLIP-49. That means even if
> we do not remove on-heap managed memory, a MemorySegment will not be
> allocated unless requested by a consumer, and will be deallocated
> immediately when the consumer releases it. Thus, it is likely that memory
> segments will not always stay in the JVM old generation, and will be
> affected by GC / swapping just like other Java objects.
>
> @Jingsong, I'm not sure whether this will be related to the performance
> issue that you mentioned.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Nov 27, 2019 at 12:10 PM Jingsong Li <[hidden email]> wrote: