[Discuss] Tuning FLIP-49 configuration default values.

[Discuss] Tuning FLIP-49 configuration default values.

Xintong Song
Hi all,

As described in FLINK-15145 [1], we decided to tune the default
configuration values of FLIP-49 against more jobs and cases.

After spending time analyzing and tuning the configurations, I've come up
with several findings. In brief, I would suggest the following changes
(sketched as configuration below); for more details, please take a look at my
tuning report [2].

   - Change default managed memory fraction from 0.4 to 0.3.
   - Change default JVM metaspace size from 128MB to 64MB.
   - Change default JVM overhead min size from 128MB to 196MB.
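
For illustration, a minimal flink-conf.yaml sketch of the proposal (option
names as of the FLIP-49 work; the exact names may still change before the
release):

    # proposed TaskManager defaults (sketch only)
    taskmanager.memory.managed.fraction: 0.3       # previously 0.4
    taskmanager.memory.jvm-metaspace.size: 64m     # previously 128m
    taskmanager.memory.jvm-overhead.min: 196m      # previously 128m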

Looking forward to your feedback.

Thank you~

Xintong Song


[1] https://issues.apache.org/jira/browse/FLINK-15145

[2]
https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing

Re: [Discuss] Tuning FLIP-49 configuration default values.

Stephan Ewen
Hi all!

Thanks a lot, Xintong, for this thorough analysis. Based on your analysis,
here are some thoughts:

+1 to change default JVM metaspace size from 128MB to 64MB
+1 to change default JVM overhead min size from 128MB to 196MB

Concerning the managed memory fraction, I am not sure I would change it,
for the following reasons:

  - We should assume RocksDB will be limited to managed memory by default.
This will either be active by default or we would encourage everyone to use
this by default, because otherwise it is super hard to reason about the
RocksDB footprint.
  - For standalone, a managed memory fraction of 0.3 is less than half of
the managed memory from 1.9.
  - I am not sure if the managed memory fraction is a value that all users
adjust immediately when scaling up the memory during their first try-out
phase. I would assume that most users initially only adjust
"memory.flink.size" or "memory.process.size". A value of 0.3 will lead to
having too large heaps and very little RocksDB / batch memory even when
scaling up during the initial exploration.
  - I agree, though, that 0.5 looks too aggressive, from your benchmarks.
So maybe keeping it at 0.4 could work?
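
As a rough illustration of the scaling-up concern (back-of-the-envelope only,
assuming managed memory is derived as fraction * total Flink memory and
ignoring the network and framework budgets):

    # sketch: managed memory for a try-out TaskManager
    #   flink.size 1024m, fraction 0.4  ->  managed ~410m
    #   flink.size 1024m, fraction 0.3  ->  managed ~307m
    #   flink.size 4096m, fraction 0.3  ->  managed ~1229m (vs ~1638m with 0.4)
    taskmanager.memory.managed.fraction: 0.4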

And one question: why do we set the Framework Heap by default? Is that so we
reduce the managed memory further if less than the framework heap would
otherwise be left of the JVM heap?

Best,
Stephan


Re: [Discuss] Tuning FLIP-49 configuration default values.

Kurt Young
Hi Xintong,

IIRC, during our 10T TPC-DS benchmark we suffered from the JM's metaspace
size and full GCs caused by lots of class loading for source input splits.
Could you check whether changing the default value from 128MB to 64MB will
make this worse?

Correct me if I misunderstood anything, also cc @Jingsong

Best,
Kurt



Re: [Discuss] Tuning FLIP-49 configuration default values.

Xintong Song
Thanks for the feedback, Stephan and Kurt.

@Stephan

Regarding managed memory fraction,
- It makes sense to keep the default value 0.4 if we assume RocksDB memory is
limited by default.
- AFAIK, RocksDB currently does not limit its memory usage by default, and I'm
positive about changing that.
- Personally, I don't like the idea that the out-of-box experience (for which
we set the default fraction) relies on users manually turning another switch
on.

Regarding framework heap memory,
- The major reason we set it by default is, as you mentioned, to have a safety
net of a minimal JVM heap size.
- Also, considering the in-progress FLIP-56 (dynamic slot allocation), we want
to reserve some heap memory that will not go into the slot profiles. That's why
we decided the default value according to the heap memory usage of an empty
task executor.

@Kurt
Regarding metaspace,
- This config option ("taskmanager.memory.jvm-metaspace") only takes effect on
TMs. Currently we do not set a metaspace size for the JM.
- If we have the same metaspace problem on TMs, then yes, changing it from
128MB to 64MB will make it worse. However, IMO a 10T TPC-DS benchmark should
not be considered an out-of-box experience, and it makes sense to tune the
configurations for it. I think the smaller metaspace size would be a better
choice for a first try-out, where a job should not be too complicated and the
TM size could be relatively small (e.g. 1GB); see the sketch below.
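
For reference, a sketch of the two TaskManager options discussed above (the
values shown are the defaults as I understand them, meant only as an
illustration; exact option names may differ slightly):

    # neither option affects the JobManager
    taskmanager.memory.framework.heap.size: 128m   # safety net / non-slot heap
    taskmanager.memory.jvm-metaspace.size: 64m     # proposed, down from 128m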

Thank you~

Xintong Song




Re: [Discuss] Tuning FLIP-49 configuration default values.

Till Rohrmann
I guess one of the most important results of this experiment is to have a
good tuning guide available for users who are past the initial try-out
phase because the default settings will be kind of a compromise. I assume
that this is part of the outstanding FLIP-49 documentation task.

If we limit RocksDB's memory consumption by default, then I believe that
0.4 would give the better all-round experience as it leaves a bit more
memory for RocksDB. However, I'm a bit sceptical whether we should optimize
the default settings for a configuration where the user still needs to
activate the strict memory limiting for RocksDB. In this case, I would
expect that the user could also adapt the managed memory fraction.

Cheers,
Till


Re: [Discuss] Tuning FLIP-49 configuration default values.

Till Rohrmann
+1 for the JVM metaspace and overhead changes.


Re: [Discuss] Tuning FLIP-49 configuration default values.

Andrey Zagrebin
Hi all!

Great that we have already tried out the new FLIP-49 setup with bigger jobs.

I am also +1 for the JVM metaspace and overhead changes.

Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed memory for
the RocksDB limiting case.

In general, this looks mostly to be about the memory distribution between the
JVM heap and managed off-heap memory.
Compared to the previous default setup, the JVM heap dropped (especially for
standalone), mostly due to moving managed memory from heap to off-heap and also
adding framework off-heap memory.
This can be the most important consequence for beginners and those who rely on
the default configuration, especially with the legacy default configuration in
standalone, where heap.size falls back to flink.size, but there it seems we
cannot do much now.

I prepared a spreadsheet
<https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE>
to play with the numbers for the setups mentioned in the report.

One idea would be to set the process size (or, respectively, the smaller flink
size) to a bigger default number, like 2048.
In this case, the absolute derived default JVM heap and managed memory are
close to the previous defaults, especially for managed fraction 0.3.
This should align the defaults with the previous standalone try-out experience,
where the increased off-heap memory is not strictly controlled by the
environment anyway.
The consequence for container users who relied on and updated the default
configuration is that the containers will be requested with double the size.
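
A rough sketch of the arithmetic behind the 2048 idea, assuming the proposed
64MB metaspace and the default ~10% JVM overhead fraction (approximate and
simplified; the spreadsheet has the exact derivation):

    # process.size = 2048m (sketch)
    #   JVM overhead  ~ 0.1 * 2048m ~ 205m (above the proposed minimum)
    #   JVM metaspace   64m
    #   total Flink   ~ 2048m - 205m - 64m ~ 1780m
    #   managed       ~ 0.3 * 1780m ~ 535m  (or ~710m with fraction 0.4)
    taskmanager.memory.process.size: 2048m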

Best,
Andrey



Re: [Discuss] Tuning FLIP-49 configuration default values.

Stephan Ewen
I like the idea of having a larger default "flink.size" in the config.yaml.
Maybe we don't need to double it, but something like 1280m would be okay?


Re: [Discuss] Tuning FLIP-49 configuration default values.

Andrey Zagrebin
Hi all,

Stephan, Till, and I had another offline discussion today. Here is the outcome
of our brainstorm.

*managed fraction 0.4*
This just confirms what we already discussed here.

*process.size = 1536MB (1.5GB)*
We agreed to have process.size in the default settings, with an explanation of
the flink.size alternative in the comment.
The suggestion is to increase it from 1024MB to 1536MB. As you can see in the
earlier provided calculation spreadsheet, this will result in a bigger JVM heap
and more managed memory (both ~0.5GB) for all new setups.
This should provide a good enough experience for trying out Flink.

*JVM overhead min 196 -> 192MB (128 + 64)*
A small correction for better power-of-2 alignment of sizes.

*metaspace at least 96MB?*
There is still a concern about the JVM metaspace being just 64MB.
We should confirm that it is not a problem by also testing it with SQL jobs and
the Blink planner, and by running the TPC-DS e2e Flink tests with this setting;
basically, wherever more classes are generated/loaded.
We can look into this tomorrow.

*sanity check of JVM overhead*
When the explicitly configured process and flink memory sizes are verified
against the JVM metaspace and overhead, the JVM overhead does not have to be
the exact fraction. It can just be within its min/max range, similar to how the
network/shuffle memory check works after FLINK-15300.
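
Put together, the current proposal would roughly look like this in the default
flink-conf.yaml (a sketch; the metaspace value is still being validated, and
the sanity-check point is a behavioural change rather than a config key):

    taskmanager.memory.process.size: 1536m        # up from 1024m
    taskmanager.memory.managed.fraction: 0.4      # unchanged
    taskmanager.memory.jvm-overhead.min: 192m     # 128m + 64m
    taskmanager.memory.jvm-metaspace.size: 64m    # or 96m, still being validated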

Best,
Andrey


Re: [Discuss] Tuning FLIP-49 configuration default values.

Xintong Song
Thanks for the discussion, Stephan, Till and Andrey.

+1 for the managed fraction (0.4) and process.size (1.5G).

*JVM overhead min 196 -> 192Mb (128 + 64)*
> small correction for better power 2 alignment of sizes
>
Sorry, this was a typo (and the same for the JIRA comment, which was
copy-pasted from it). It was 192MB that was used in the tuning report.

*meta space at least 96Mb?*
> There is still a concern about JVM metaspace being just 64Mb.
> We should confirm that it is not a problem by trying to test it also with
> the SQL jobs, Blink planner.
> Also, by running tpc-ds e2e Flink tests with this setting. Basically, where
> more classes are generated/loaded.
> We can look into this tomorrow.
>
I have already tried setting the metaspace to 64MB with the e2e tests, where I
believe various SQL / Blink / TPC-DS test cases are included (see
https://travis-ci.com/flink-ci/flink/builds/142970113).
However, I'm also ok with 96MB, since we are increasing process.size to 1.5GB.
My original concern with a larger metaspace size was that we might end up with
a too-small flink.size for the out-of-box configuration on containerized
setups.
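
To make that trade-off concrete (rough numbers, assuming the ~10% JVM overhead
fraction with the 192MB minimum):

    # process.size = 1536m (sketch)
    #   JVM overhead = max(192m, 0.1 * 1536m) = 192m
    #   metaspace 64m  ->  total Flink memory ~ 1536m - 192m - 64m = 1280m
    #   metaspace 96m  ->  total Flink memory ~ 1536m - 192m - 96m = 1248m
    taskmanager.memory.jvm-metaspace.size: 96m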

*sanity check of JVM overhead*
> When the explicitly configured process and flink memory sizes are verified
> with the JVM meta space and overhead,
> JVM overhead does not have to be the exact fraction.
> It can be just within its min/max range, similar to how it is now for
> network/shuffle memory check after FLINK-15300.
>
Also +1 for this.

Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <[hidden email]>
wrote:

> Hi all,
>
> Stephan, Till and me had another offline discussion today. Here is the
> outcome of our brainstorm.
>
> *managed fraction 0.4*
> just confirmed what we already discussed here.
>
> *process.size = 1536Mb (1,5Gb)*
> We agreed to have process.size in the default settings with the explanation
> of flink.size alternative in the comment.
> The suggestion is to increase it from 1024 to 1536mb. As you can see in the
> earlier provided calculation spreadsheet,
> it will result in bigger JVM Heap and managed memory (both ~0.5Gb) for all
> new setups.
> This should provide good enough experience for trying out Flink.
>
> *JVM overhead min 196 -> 192Mb (128 + 64)*
> small correction for better power 2 alignment of sizes
>
> *meta space at least 96Mb?*
> There is still a concern about JVM metaspace being just 64Mb.
> We should confirm that it is not a problem by trying to test it also with
> the SQL jobs, Blink planner.
> Also, by running tpc-ds e2e Flink tests with this setting. Basically, where
> more classes are generated/loaded.
> We can look into this tomorrow.
>
> *sanity check of JVM overhead*
> When the explicitly configured process and flink memory sizes are verified
> with the JVM meta space and overhead,
> JVM overhead does not have to be the exact fraction.
> It can be just within its min/max range, similar to how it is now for
> network/shuffle memory check after FLINK-15300.
>
> Best,Andrey
>
> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <[hidden email]> wrote:
>
> > I like the idea of having a larger default "flink.size" in the
> config.yaml.
> > Maybe we don't need to double it, but something like 1280m would be okay?
> >
> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <[hidden email]>
> > wrote:
> >
> > > Hi all!
> > >
> > > Great that we have already tried out new FLIP-49 with the bigger jobs.
> > >
> > > I am also +1 for the JVM metaspace and overhead changes.
> > >
> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
> > memory
> > > for Rocksdb limiting case.
> > >
> > > In general, this looks mostly to be about memory distribution between
> JVM
> > > heap and managed off-heap.
> > > Comparing to the previous default setup, the JVM heap dropped
> (especially
> > > for standalone) mostly due to moving managed from heap to off-heap and
> > then
> > > also adding framework off-heap.
> > > In general, this can be the most important consequence for beginners
> and
> > > those who rely on the default configuration.
> > > Especially the legacy default configuration in standalone with falling
> > back
> > > heap.size to flink.size but there it seems we cannot do too much now.
> > >
> > > I prepared a spreadsheet
> > > <
> > >
> >
> https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE
> > > >
> > > to play with numbers for the mentioned in the report setups.
> > >
> > > One idea would be to set process size (or smaller flink size
> > respectively)
> > > to a bigger default number, like 2048.
> > > In this case, the abs derived default JVM heap and managed memory are
> > close
> > > to the previous defaults, especially for managed fraction 0.3.
> > > This should align the defaults with the previous standalone try-out
> > > experience where the increased off-heap memory is not strictly
> controlled
> > > by the environment anyways.
> > > The consequence for container users who relied on and updated the
> default
> > > configuration is that the containers will be requested with the double
> > > size.
> > >
> > > Best,
> > > Andrey
> > >
> > >
> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <[hidden email]>
> > > wrote:
> > >
> > > > +1 for the JVM metaspace and overhead changes.
> > > >
> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <[hidden email]
> >
> > > > wrote:
> > > >
> > > >> I guess one of the most important results of this experiment is to
> > have
> > > a
> > > >> good tuning guide available for users who are past the initial
> try-out
> > > >> phase because the default settings will be kind of a compromise. I
> > > assume
> > > >> that this is part of the outstanding FLIP-49 documentation task.
> > > >>
> > > >> If we limit RocksDB's memory consumption by default, then I believe
> > that
> > > >> 0.4 would give the better all-round experience as it leaves a bit
> more
> > > >> memory for RocksDB. However, I'm a bit sceptical whether we should
> > > optimize
> > > >> the default settings for a configuration where the user still needs
> to
> > > >> activate the strict memory limiting for RocksDB. In this case, I
> would
> > > >> expect that the user could also adapt the managed memory fraction.
> > > >>
> > > >> Cheers,
> > > >> Till
> > > >>
> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <[hidden email]
> >
> > > >> wrote:
> > > >>
> > > >>> Thanks for the feedback, Stephan and Kurt.
> > > >>>
> > > >>> @Stephan
> > > >>>
> > > >>> Regarding managed memory fraction,
> > > >>> - It makes sense to keep the default value 0.4, if we assume
> rocksdb
> > > >>> memory is limited by default.
> > > >>> - AFAIK, currently rocksdb by default does not limit its memory
> > usage.
> > > >>> And I'm positive to change it.
> > > >>> - Personally, I don't like the idea that we the out-of-box
> experience
> > > >>> (for which we set the default fraction) relies on that users will
> > > manually
> > > >>> turn another switch on.
> > > >>>
> > > >>> Regarding framework heap memory,
> > > >>> - The major reason we set it by default is, as you mentioned, that
> to
> > > >>> have a safe net of minimal JVM heap size.
> > > >>> - Also, considering the in progress FLIP-56 (dynamic slot
> > allocation),
> > > >>> we want to reserve some heap memory that will not go into the slot
> > > >>> profiles. That's why we decide the default value according to the
> > heap
> > > >>> memory usage of an empty task executor.
> > > >>>
> > > >>> @Kurt
> > > >>> Regarding metaspace,
> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
> takes
> > > >>> effect on TMs. Currently we do not set metaspace size for JM.
> > > >>> - If we have the same metaspace problem on TMs, then yes, changing
> it
> > > >>> from 128M to 64M will make it worse. However, IMO 10T tpc-ds
> > benchmark
> > > >>> should not be considered as out-of-box experience and it makes
> sense
> > to
> > > >>> tune the configurations for it. I think the smaller metaspace size
> > > would be
> > > >>> a better choice for the first trying-out, where a job should not be
> > too
> > > >>> complicated, the TM size could be relative small (e.g. 1g).
> > > >>>
> > > >>> Thank you~
> > > >>>
> > > >>> Xintong Song
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <[hidden email]>
> wrote:
> > > >>>
> > > >>>> HI Xingtong,
> > > >>>>
> > > >>>> IIRC during our tpc-ds 10T benchmark, we have suffered by JM's
> > > >>>> metaspace size and full gc which
> > > >>>> caused by lots of classloadings of source input split. Could you
> > check
> > > >>>> whether changing the default
> > > >>>> value from 128MB to 64MB will make it worse?
> > > >>>>
> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
> > > >>>>
> > > >>>> Best,
> > > >>>> Kurt
> > > >>>>
> > > >>>>
> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <[hidden email]>
> > > wrote:
> > > >>>>
> > > >>>>> Hi all!
> > > >>>>>
> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on your
> > > >>>>> analysis,
> > > >>>>> here are some thoughts:
> > > >>>>>
> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
> > > >>>>> +1 to change default JVM overhead min size from 128MB to 196MB
> > > >>>>>
> > > >>>>> Concerning the managed memory fraction, I am not sure I would
> > change
> > > >>>>> it,
> > > >>>>> for the following reasons:
> > > >>>>>
> > > >>>>>   - We should assume RocksDB will be limited to managed memory by
> > > >>>>> default.
> > > >>>>> This will either be active by default or we would encourage
> > everyone
> > > >>>>> to use
> > > >>>>> this by default, because otherwise it is super hard to reason
> about
> > > the
> > > >>>>> RocksDB footprint.
> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less than
> > > half
> > > >>>>> of
> > > >>>>> the managed memory from 1.9.
> > > >>>>>   - I am not sure if the managed memory fraction is a value that
> > all
> > > >>>>> users
> > > >>>>> adjust immediately when scaling up the memory during their first
> > > >>>>> try-out
> > > >>>>> phase. I would assume that most users initially only adjust
> > > >>>>> "memory.flink.size" or "memory.process.size". A value of 0.3 will
> > > lead
> > > >>>>> to
> > > >>>>> having too large heaps and very little RocksDB / batch memory
> even
> > > when
> > > >>>>> scaling up during the initial exploration.
> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
> > > >>>>> benchmarks.
> > > >>>>> So maybe keeping it at 0.4 could work?
> > > >>>>>
> > > >>>>> And one question: Why do we set the Framework Heap by default? Is
> > > that
> > > >>>>> so
> > > >>>>> we reduce the managed memory further is less than framework heap
> > > would
> > > >>>>> be
> > > >>>>> left from the JVM heap?
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Stephan
> > > >>>>>
> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
> > [hidden email]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> > Hi all,
> > > >>>>> >
> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the default
> > > >>>>> > configuration values of FLIP-49 with more jobs and cases.
> > > >>>>> >
> > > >>>>> > After spending time analyzing and tuning the configurations,
> I've
> > > >>>>> come
> > > >>>>> > with several findings. To be brief, I would suggest the
> following
> > > >>>>> changes,
> > > >>>>> > and for more details please take a look at my tuning report
> [2].
> > > >>>>> >
> > > >>>>> >    - Change default managed memory fraction from 0.4 to 0.3.
> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
> > > >>>>> >    - Change default JVM overhead min size from 128MB to 196MB.
> > > >>>>> >
> > > >>>>> > Looking forward to your feedback.
> > > >>>>> >
> > > >>>>> > Thank you~
> > > >>>>> >
> > > >>>>> > Xintong Song
> > > >>>>> >
> > > >>>>> >
> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
> > > >>>>> >
> > > >>>>> > [2]
> > > >>>>> >
> > > >>>>>
> > >
> >
> https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
> > > >>>>> >
> > > >>>>> >
> > > >>>>>
> > > >>>>
> > >
> >
>

Re: [Discuss] Tuning FLIP-49 configuration default values.

Xintong Song
There is one more idea from an offline discussion with Andrey.

If we decide to make the metaspace 96MB, we can also make process.size 1568MB
(1.5GB + 32MB).
According to the spreadsheet
<https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
a 1.5GB process size with 64MB metaspace results in memory sizes that are
powers of 2.
When increasing the metaspace from 64MB to 96MB, it would be good to preserve
that alignment, so that the memory configuration and calculations are easier
to explain in the documentation later.
I believe the difference between 1.5GB and 1.5GB + 32MB is negligible in terms
of memory consumption.

Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <[hidden email]> wrote:

> Thanks for the discussion, Stephan, Till and Andrey.
>
> +1 for the managed fraction (0.4) and process.size (1.5G).
>
> *JVM overhead min 196 -> 192Mb (128 + 64)*
>> small correction for better power 2 alignment of sizes
>>
> Sorry, this was a typo (and the same for the jira comment which is
> copy-pasted). It was 192mb used in the tuning report.
>
> *meta space at least 96Mb?*
>> There is still a concern about JVM metaspace being just 64Mb.
>> We should confirm that it is not a problem by trying to test it also with
>> the SQL jobs, Blink planner.
>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
>> where
>> more classes are generated/loaded.
>> We can look into this tomorrow.
>>
> I have already tried the setting metaspace to 64Mb with the e2e tests,
> where I believe various sql / blink / tpc-ds test cases are included. (See
> https://travis-ci.com/flink-ci/flink/builds/142970113 )
> However, I'm also ok with 96Mb, since we are increasing the process.size
> to 1.5G.
> My original concern for having larger metaspace size was that we may
> result in too small flink.size for out-of-box configuration on
> containerized setups.
>
> *sanity check of JVM overhead*
>> When the explicitly configured process and flink memory sizes are verified
>> with the JVM meta space and overhead,
>> JVM overhead does not have to be the exact fraction.
>> It can be just within its min/max range, similar to how it is now for
>> network/shuffle memory check after FLINK-15300.
>>
> Also +1 for this.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <[hidden email]>
> wrote:
>
>> Hi all,
>>
>> Stephan, Till and me had another offline discussion today. Here is the
>> outcome of our brainstorm.
>>
>> *managed fraction 0.4*
>> just confirmed what we already discussed here.
>>
>> *process.size = 1536Mb (1,5Gb)*
>> We agreed to have process.size in the default settings with the
>> explanation
>> of flink.size alternative in the comment.
>> The suggestion is to increase it from 1024 to 1536mb. As you can see in
>> the
>> earlier provided calculation spreadsheet,
>> it will result in bigger JVM Heap and managed memory (both ~0.5Gb) for all
>> new setups.
>> This should provide good enough experience for trying out Flink.
>>
>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>> small correction for better power 2 alignment of sizes
>>
>> *meta space at least 96Mb?*
>> There is still a concern about JVM metaspace being just 64Mb.
>> We should confirm that it is not a problem by trying to test it also with
>> the SQL jobs, Blink planner.
>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
>> where
>> more classes are generated/loaded.
>> We can look into this tomorrow.
>>
>> *sanity check of JVM overhead*
>> When the explicitly configured process and flink memory sizes are verified
>> with the JVM meta space and overhead,
>> JVM overhead does not have to be the exact fraction.
>> It can be just within its min/max range, similar to how it is now for
>> network/shuffle memory check after FLINK-15300.
>>
>> Best,Andrey
>>
>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <[hidden email]> wrote:
>>
>> > I like the idea of having a larger default "flink.size" in the
>> config.yaml.
>> > Maybe we don't need to double it, but something like 1280m would be
>> okay?
>> >
>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <[hidden email]>
>> > wrote:
>> >
>> > > Hi all!
>> > >
>> > > Great that we have already tried out new FLIP-49 with the bigger jobs.
>> > >
>> > > I am also +1 for the JVM metaspace and overhead changes.
>> > >
>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
>> > memory
>> > > for Rocksdb limiting case.
>> > >
>> > > In general, this looks mostly to be about memory distribution between
>> JVM
>> > > heap and managed off-heap.
>> > > Comparing to the previous default setup, the JVM heap dropped
>> (especially
>> > > for standalone) mostly due to moving managed from heap to off-heap and
>> > then
>> > > also adding framework off-heap.
>> > > In general, this can be the most important consequence for beginners
>> and
>> > > those who rely on the default configuration.
>> > > Especially the legacy default configuration in standalone with falling
>> > back
>> > > heap.size to flink.size but there it seems we cannot do too much now.
>> > >
>> > > I prepared a spreadsheet
>> > > <
>> > >
>> >
>> https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE
>> > > >
>> > > to play with numbers for the mentioned in the report setups.
>> > >
>> > > One idea would be to set process size (or smaller flink size
>> > respectively)
>> > > to a bigger default number, like 2048.
>> > > In this case, the abs derived default JVM heap and managed memory are
>> > close
>> > > to the previous defaults, especially for managed fraction 0.3.
>> > > This should align the defaults with the previous standalone try-out
>> > > experience where the increased off-heap memory is not strictly
>> controlled
>> > > by the environment anyways.
>> > > The consequence for container users who relied on and updated the
>> default
>> > > configuration is that the containers will be requested with the double
>> > > size.
>> > >
>> > > Best,
>> > > Andrey
>> > >
>> > >
>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <[hidden email]>
>> > > wrote:
>> > >
>> > > > +1 for the JVM metaspace and overhead changes.
>> > > >
>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <
>> [hidden email]>
>> > > > wrote:
>> > > >
>> > > >> I guess one of the most important results of this experiment is to
>> > have
>> > > a
>> > > >> good tuning guide available for users who are past the initial
>> try-out
>> > > >> phase because the default settings will be kind of a compromise. I
>> > > assume
>> > > >> that this is part of the outstanding FLIP-49 documentation task.
>> > > >>
>> > > >> If we limit RocksDB's memory consumption by default, then I believe
>> > that
>> > > >> 0.4 would give the better all-round experience as it leaves a bit
>> more
>> > > >> memory for RocksDB. However, I'm a bit sceptical whether we should
>> > > optimize
>> > > >> the default settings for a configuration where the user still
>> needs to
>> > > >> activate the strict memory limiting for RocksDB. In this case, I
>> would
>> > > >> expect that the user could also adapt the managed memory fraction.
>> > > >>
>> > > >> Cheers,
>> > > >> Till
>> > > >>
>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <
>> [hidden email]>
>> > > >> wrote:
>> > > >>
>> > > >>> Thanks for the feedback, Stephan and Kurt.
>> > > >>>
>> > > >>> @Stephan
>> > > >>>
>> > > >>> Regarding managed memory fraction,
>> > > >>> - It makes sense to keep the default value 0.4, if we assume
>> rocksdb
>> > > >>> memory is limited by default.
>> > > >>> - AFAIK, currently rocksdb by default does not limit its memory
>> > usage.
>> > > >>> And I'm positive to change it.
>> > > >>> - Personally, I don't like the idea that we the out-of-box
>> experience
>> > > >>> (for which we set the default fraction) relies on that users will
>> > > manually
>> > > >>> turn another switch on.
>> > > >>>
>> > > >>> Regarding framework heap memory,
>> > > >>> - The major reason we set it by default is, as you mentioned,
>> that to
>> > > >>> have a safe net of minimal JVM heap size.
>> > > >>> - Also, considering the in progress FLIP-56 (dynamic slot
>> > allocation),
>> > > >>> we want to reserve some heap memory that will not go into the slot
>> > > >>> profiles. That's why we decide the default value according to the
>> > heap
>> > > >>> memory usage of an empty task executor.
>> > > >>>
>> > > >>> @Kurt
>> > > >>> Regarding metaspace,
>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
>> takes
>> > > >>> effect on TMs. Currently we do not set metaspace size for JM.
>> > > >>> - If we have the same metaspace problem on TMs, then yes,
>> changing it
>> > > >>> from 128M to 64M will make it worse. However, IMO 10T tpc-ds
>> > benchmark
>> > > >>> should not be considered as out-of-box experience and it makes
>> sense
>> > to
>> > > >>> tune the configurations for it. I think the smaller metaspace size
>> > > would be
>> > > >>> a better choice for the first trying-out, where a job should not
>> be
>> > too
>> > > >>> complicated, the TM size could be relative small (e.g. 1g).
>> > > >>>
>> > > >>> Thank you~
>> > > >>>
>> > > >>> Xintong Song
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <[hidden email]>
>> wrote:
>> > > >>>
>> > > >>>> HI Xingtong,
>> > > >>>>
>> > > >>>> IIRC during our tpc-ds 10T benchmark, we have suffered by JM's
>> > > >>>> metaspace size and full gc which
>> > > >>>> caused by lots of classloadings of source input split. Could you
>> > check
>> > > >>>> whether changing the default
>> > > >>>> value from 128MB to 64MB will make it worse?
>> > > >>>>
>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
>> > > >>>>
>> > > >>>> Best,
>> > > >>>> Kurt
>> > > >>>>
>> > > >>>>
>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <[hidden email]>
>> > > wrote:
>> > > >>>>
>> > > >>>>> Hi all!
>> > > >>>>>
>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on your
>> > > >>>>> analysis,
>> > > >>>>> here are some thoughts:
>> > > >>>>>
>> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
>> > > >>>>> +1 to change default JVM overhead min size from 128MB to 196MB
>> > > >>>>>
>> > > >>>>> Concerning the managed memory fraction, I am not sure I would
>> > change
>> > > >>>>> it,
>> > > >>>>> for the following reasons:
>> > > >>>>>
>> > > >>>>>   - We should assume RocksDB will be limited to managed memory
>> by
>> > > >>>>> default.
>> > > >>>>> This will either be active by default or we would encourage
>> > everyone
>> > > >>>>> to use
>> > > >>>>> this by default, because otherwise it is super hard to reason
>> about
>> > > the
>> > > >>>>> RocksDB footprint.
>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less
>> than
>> > > half
>> > > >>>>> of
>> > > >>>>> the managed memory from 1.9.
>> > > >>>>>   - I am not sure if the managed memory fraction is a value that
>> > all
>> > > >>>>> users
>> > > >>>>> adjust immediately when scaling up the memory during their first
>> > > >>>>> try-out
>> > > >>>>> phase. I would assume that most users initially only adjust
>> > > >>>>> "memory.flink.size" or "memory.process.size". A value of 0.3
>> will
>> > > lead
>> > > >>>>> to
>> > > >>>>> having too large heaps and very little RocksDB / batch memory
>> even
>> > > when
>> > > >>>>> scaling up during the initial exploration.
>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
>> > > >>>>> benchmarks.
>> > > >>>>> So maybe keeping it at 0.4 could work?
>> > > >>>>>
>> > > >>>>> And one question: Why do we set the Framework Heap by default?
>> Is
>> > > that
>> > > >>>>> so
>> > > >>>>> we reduce the managed memory further is less than framework heap
>> > > would
>> > > >>>>> be
>> > > >>>>> left from the JVM heap?
>> > > >>>>>
>> > > >>>>> Best,
>> > > >>>>> Stephan
>> > > >>>>>
>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
>> > [hidden email]>
>> > > >>>>> wrote:
>> > > >>>>>
>> > > >>>>> > Hi all,
>> > > >>>>> >
>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
>> default
>> > > >>>>> > configuration values of FLIP-49 with more jobs and cases.
>> > > >>>>> >
>> > > >>>>> > After spending time analyzing and tuning the configurations,
>> I've
>> > > >>>>> come
>> > > >>>>> > with several findings. To be brief, I would suggest the
>> following
>> > > >>>>> changes,
>> > > >>>>> > and for more details please take a look at my tuning report
>> [2].
>> > > >>>>> >
>> > > >>>>> >    - Change default managed memory fraction from 0.4 to 0.3.
>> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
>> > > >>>>> >    - Change default JVM overhead min size from 128MB to 196MB.
>> > > >>>>> >
>> > > >>>>> > Looking forward to your feedback.
>> > > >>>>> >
>> > > >>>>> > Thank you~
>> > > >>>>> >
>> > > >>>>> > Xintong Song
>> > > >>>>> >
>> > > >>>>> >
>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
>> > > >>>>> >
>> > > >>>>> > [2]
>> > > >>>>> >
>> > > >>>>>
>> > >
>> >
>> https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
>> > > >>>>> >
>> > > >>>>> >
>> > > >>>>>
>> > > >>>>
>> > >
>> >
>>
>

Re: [Discuss] Tuning FLIP-49 configuration default values.

Xintong Song
Thank you all for the good discussion.

If there are no further concerns or objections, I would like to conclude this
thread with the following action items.

   - Change default value of "taskmanager.memory.jvm-overhead.min" to 192MB.
   - Change default value of "taskmanager.memory.jvm-metaspace.size" to
   96MB.
   - Change the value of "taskmanager.memory.process.size" in the default
   "flink-conf.yaml" to 1568MB.
   - Relax the JVM overhead sanity check, so that the fraction does not need to
   be strictly followed, as long as the min/max range is respected (see the
   sketch below).
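
To illustrate the last item, here is a minimal sketch of the relaxed check (an
illustration only, not the actual implementation). When both the process size
and the Flink memory size are configured explicitly, the derived JVM overhead
would only need to fall within the configured min/max range, instead of
matching the fraction exactly, similar to the network memory check after
FLINK-15300. The 192MB minimum matches the new default; the 1GB maximum is an
assumption for illustration.

def check_jvm_overhead(process_mb, flink_mb, metaspace_mb,
                       min_mb=192, max_mb=1024):
    # JVM overhead implied by the explicitly configured sizes.
    derived_overhead = process_mb - flink_mb - metaspace_mb
    # Relaxed check: enforce only the min/max range, not the exact fraction.
    if not (min_mb <= derived_overhead <= max_mb):
        raise ValueError("derived JVM overhead %sMB is outside [%sMB, %sMB]"
                         % (derived_overhead, min_mb, max_mb))
    return derived_overhead

# e.g. 1568MB process size, 1280MB Flink size, 96MB metaspace -> 192MB overhead, accepted
print(check_jvm_overhead(1568, 1280, 96))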


Thank you~

Xintong Song



On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <[hidden email]> wrote:

> There's more idea from offline discussion with Andrey.
>
> If we decide to make metaspace 96MB, we can also make process.size 1568MB
> (1.5G + 32MB).
> According to the spreadsheet
> <https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0>,
> 1.5GB process size and 64MB metaspace result in memory sizes with the
> values to be powers of 2.
> When increasing the metaspace from 64MB to 96MB, it would be good to
> preserve that alignment, for better readability that later we explain the
> memory configuration and calculations in documents.
> I believe it's not a big difference between 1.5GB and 1.5GB + 32 MB for
> memory consumption.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <[hidden email]>
> wrote:
>
>> Thanks for the discussion, Stephan, Till and Andrey.
>>
>> +1 for the managed fraction (0.4) and process.size (1.5G).
>>
>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> small correction for better power 2 alignment of sizes
>>>
>> Sorry, this was a typo (and the same for the jira comment which is
>> copy-pasted). It was 192mb used in the tuning report.
>>
>> *meta space at least 96Mb?*
>>> There is still a concern about JVM metaspace being just 64Mb.
>>> We should confirm that it is not a problem by trying to test it also with
>>> the SQL jobs, Blink planner.
>>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
>>> where
>>> more classes are generated/loaded.
>>> We can look into this tomorrow.
>>>
>> I have already tried the setting metaspace to 64Mb with the e2e tests,
>> where I believe various sql / blink / tpc-ds test cases are included. (See
>> https://travis-ci.com/flink-ci/flink/builds/142970113 )
>> However, I'm also ok with 96Mb, since we are increasing the process.size
>> to 1.5G.
>> My original concern for having larger metaspace size was that we may
>> result in too small flink.size for out-of-box configuration on
>> containerized setups.
>>
>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are
>>> verified
>>> with the JVM meta space and overhead,
>>> JVM overhead does not have to be the exact fraction.
>>> It can be just within its min/max range, similar to how it is now for
>>> network/shuffle memory check after FLINK-15300.
>>>
>> Also +1 for this.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <[hidden email]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Stephan, Till and me had another offline discussion today. Here is the
>>> outcome of our brainstorm.
>>>
>>> *managed fraction 0.4*
>>> just confirmed what we already discussed here.
>>>
>>> *process.size = 1536Mb (1,5Gb)*
>>> We agreed to have process.size in the default settings with the
>>> explanation
>>> of flink.size alternative in the comment.
>>> The suggestion is to increase it from 1024 to 1536mb. As you can see in
>>> the
>>> earlier provided calculation spreadsheet,
>>> it will result in bigger JVM Heap and managed memory (both ~0.5Gb) for
>>> all
>>> new setups.
>>> This should provide good enough experience for trying out Flink.
>>>
>>> *JVM overhead min 196 -> 192Mb (128 + 64)*
>>> small correction for better power 2 alignment of sizes
>>>
>>> *meta space at least 96Mb?*
>>> There is still a concern about JVM metaspace being just 64Mb.
>>> We should confirm that it is not a problem by trying to test it also with
>>> the SQL jobs, Blink planner.
>>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
>>> where
>>> more classes are generated/loaded.
>>> We can look into this tomorrow.
>>>
>>> *sanity check of JVM overhead*
>>> When the explicitly configured process and flink memory sizes are
>>> verified
>>> with the JVM meta space and overhead,
>>> JVM overhead does not have to be the exact fraction.
>>> It can be just within its min/max range, similar to how it is now for
>>> network/shuffle memory check after FLINK-15300.
>>>
>>> Best,Andrey
>>>
>>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <[hidden email]> wrote:
>>>
>>> > I like the idea of having a larger default "flink.size" in the
>>> config.yaml.
>>> > Maybe we don't need to double it, but something like 1280m would be
>>> okay?
>>> >
>>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <[hidden email]>
>>> > wrote:
>>> >
>>> > > Hi all!
>>> > >
>>> > > Great that we have already tried out new FLIP-49 with the bigger
>>> jobs.
>>> > >
>>> > > I am also +1 for the JVM metaspace and overhead changes.
>>> > >
>>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
>>> > memory
>>> > > for Rocksdb limiting case.
>>> > >
>>> > > In general, this looks mostly to be about memory distribution
>>> between JVM
>>> > > heap and managed off-heap.
>>> > > Comparing to the previous default setup, the JVM heap dropped
>>> (especially
>>> > > for standalone) mostly due to moving managed from heap to off-heap
>>> and
>>> > then
>>> > > also adding framework off-heap.
>>> > > In general, this can be the most important consequence for beginners
>>> and
>>> > > those who rely on the default configuration.
>>> > > Especially the legacy default configuration in standalone with
>>> falling
>>> > back
>>> > > heap.size to flink.size but there it seems we cannot do too much now.
>>> > >
>>> > > I prepared a spreadsheet
>>> > > <
>>> > >
>>> >
>>> https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE
>>> > > >
>>> > > to play with numbers for the mentioned in the report setups.
>>> > >
>>> > > One idea would be to set process size (or smaller flink size
>>> > respectively)
>>> > > to a bigger default number, like 2048.
>>> > > In this case, the abs derived default JVM heap and managed memory are
>>> > close
>>> > > to the previous defaults, especially for managed fraction 0.3.
>>> > > This should align the defaults with the previous standalone try-out
>>> > > experience where the increased off-heap memory is not strictly
>>> controlled
>>> > > by the environment anyways.
>>> > > The consequence for container users who relied on and updated the
>>> default
>>> > > configuration is that the containers will be requested with the
>>> double
>>> > > size.
>>> > >
>>> > > Best,
>>> > > Andrey
>>> > >
>>> > >
>>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <[hidden email]
>>> >
>>> > > wrote:
>>> > >
>>> > > > +1 for the JVM metaspace and overhead changes.
>>> > > >
>>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <
>>> [hidden email]>
>>> > > > wrote:
>>> > > >
>>> > > >> I guess one of the most important results of this experiment is to
>>> > have
>>> > > a
>>> > > >> good tuning guide available for users who are past the initial
>>> try-out
>>> > > >> phase because the default settings will be kind of a compromise. I
>>> > > assume
>>> > > >> that this is part of the outstanding FLIP-49 documentation task.
>>> > > >>
>>> > > >> If we limit RocksDB's memory consumption by default, then I
>>> believe
>>> > that
>>> > > >> 0.4 would give the better all-round experience as it leaves a bit
>>> more
>>> > > >> memory for RocksDB. However, I'm a bit sceptical whether we should
>>> > > optimize
>>> > > >> the default settings for a configuration where the user still
>>> needs to
>>> > > >> activate the strict memory limiting for RocksDB. In this case, I
>>> would
>>> > > >> expect that the user could also adapt the managed memory fraction.
>>> > > >>
>>> > > >> Cheers,
>>> > > >> Till
>>> > > >>
>>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <
>>> [hidden email]>
>>> > > >> wrote:
>>> > > >>
>>> > > >>> Thanks for the feedback, Stephan and Kurt.
>>> > > >>>
>>> > > >>> @Stephan
>>> > > >>>
>>> > > >>> Regarding managed memory fraction,
>>> > > >>> - It makes sense to keep the default value 0.4, if we assume
>>> rocksdb
>>> > > >>> memory is limited by default.
>>> > > >>> - AFAIK, currently rocksdb by default does not limit its memory
>>> > usage.
>>> > > >>> And I'm positive to change it.
>>> > > >>> - Personally, I don't like the idea that we the out-of-box
>>> experience
>>> > > >>> (for which we set the default fraction) relies on that users will
>>> > > manually
>>> > > >>> turn another switch on.
>>> > > >>>
>>> > > >>> Regarding framework heap memory,
>>> > > >>> - The major reason we set it by default is, as you mentioned,
>>> that to
>>> > > >>> have a safe net of minimal JVM heap size.
>>> > > >>> - Also, considering the in progress FLIP-56 (dynamic slot
>>> > allocation),
>>> > > >>> we want to reserve some heap memory that will not go into the
>>> slot
>>> > > >>> profiles. That's why we decide the default value according to the
>>> > heap
>>> > > >>> memory usage of an empty task executor.
>>> > > >>>
>>> > > >>> @Kurt
>>> > > >>> Regarding metaspace,
>>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
>>> takes
>>> > > >>> effect on TMs. Currently we do not set metaspace size for JM.
>>> > > >>> - If we have the same metaspace problem on TMs, then yes,
>>> changing it
>>> > > >>> from 128M to 64M will make it worse. However, IMO 10T tpc-ds
>>> > benchmark
>>> > > >>> should not be considered as out-of-box experience and it makes
>>> sense
>>> > to
>>> > > >>> tune the configurations for it. I think the smaller metaspace
>>> size
>>> > > would be
>>> > > >>> a better choice for the first trying-out, where a job should not
>>> be
>>> > too
>>> > > >>> complicated, the TM size could be relative small (e.g. 1g).
>>> > > >>>
>>> > > >>> Thank you~
>>> > > >>>
>>> > > >>> Xintong Song
>>> > > >>>
>>> > > >>>
>>> > > >>>
>>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <[hidden email]>
>>> wrote:
>>> > > >>>
>>> > > >>>> HI Xingtong,
>>> > > >>>>
>>> > > >>>> IIRC during our tpc-ds 10T benchmark, we have suffered by JM's
>>> > > >>>> metaspace size and full gc which
>>> > > >>>> caused by lots of classloadings of source input split. Could you
>>> > check
>>> > > >>>> whether changing the default
>>> > > >>>> value from 128MB to 64MB will make it worse?
>>> > > >>>>
>>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
>>> > > >>>>
>>> > > >>>> Best,
>>> > > >>>> Kurt
>>> > > >>>>
>>> > > >>>>
>>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <[hidden email]>
>>> > > wrote:
>>> > > >>>>
>>> > > >>>>> Hi all!
>>> > > >>>>>
>>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on
>>> your
>>> > > >>>>> analysis,
>>> > > >>>>> here are some thoughts:
>>> > > >>>>>
>>> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
>>> > > >>>>> +1 to change default JVM overhead min size from 128MB to 196MB
>>> > > >>>>>
>>> > > >>>>> Concerning the managed memory fraction, I am not sure I would
>>> > change
>>> > > >>>>> it,
>>> > > >>>>> for the following reasons:
>>> > > >>>>>
>>> > > >>>>>   - We should assume RocksDB will be limited to managed memory
>>> by
>>> > > >>>>> default.
>>> > > >>>>> This will either be active by default or we would encourage
>>> > everyone
>>> > > >>>>> to use
>>> > > >>>>> this by default, because otherwise it is super hard to reason
>>> about
>>> > > the
>>> > > >>>>> RocksDB footprint.
>>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less
>>> than
>>> > > half
>>> > > >>>>> of
>>> > > >>>>> the managed memory from 1.9.
>>> > > >>>>>   - I am not sure if the managed memory fraction is a value
>>> that
>>> > all
>>> > > >>>>> users
>>> > > >>>>> adjust immediately when scaling up the memory during their
>>> first
>>> > > >>>>> try-out
>>> > > >>>>> phase. I would assume that most users initially only adjust
>>> > > >>>>> "memory.flink.size" or "memory.process.size". A value of 0.3
>>> will
>>> > > lead
>>> > > >>>>> to
>>> > > >>>>> having too large heaps and very little RocksDB / batch memory
>>> even
>>> > > when
>>> > > >>>>> scaling up during the initial exploration.
>>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
>>> > > >>>>> benchmarks.
>>> > > >>>>> So maybe keeping it at 0.4 could work?
>>> > > >>>>>
>>> > > >>>>> And one question: Why do we set the Framework Heap by default?
>>> Is
>>> > > that
>>> > > >>>>> so
>>> > > >>>>> we reduce the managed memory further is less than framework
>>> heap
>>> > > would
>>> > > >>>>> be
>>> > > >>>>> left from the JVM heap?
>>> > > >>>>>
>>> > > >>>>> Best,
>>> > > >>>>> Stephan
>>> > > >>>>>
>>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
>>> > [hidden email]>
>>> > > >>>>> wrote:
>>> > > >>>>>
>>> > > >>>>> > Hi all,
>>> > > >>>>> >
>>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
>>> default
>>> > > >>>>> > configuration values of FLIP-49 with more jobs and cases.
>>> > > >>>>> >
>>> > > >>>>> > After spending time analyzing and tuning the configurations,
>>> I've
>>> > > >>>>> come
>>> > > >>>>> > with several findings. To be brief, I would suggest the
>>> following
>>> > > >>>>> changes,
>>> > > >>>>> > and for more details please take a look at my tuning report
>>> [2].
>>> > > >>>>> >
>>> > > >>>>> >    - Change default managed memory fraction from 0.4 to 0.3.
>>> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
>>> > > >>>>> >    - Change default JVM overhead min size from 128MB to
>>> 196MB.
>>> > > >>>>> >
>>> > > >>>>> > Looking forward to your feedback.
>>> > > >>>>> >
>>> > > >>>>> > Thank you~
>>> > > >>>>> >
>>> > > >>>>> > Xintong Song
>>> > > >>>>> >
>>> > > >>>>> >
>>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
>>> > > >>>>> >
>>> > > >>>>> > [2]
>>> > > >>>>> >
>>> > > >>>>>
>>> > >
>>> >
>>> https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
>>> > > >>>>> >
>>> > > >>>>> >
>>> > > >>>>>
>>> > > >>>>
>>> > >
>>> >
>>>
>>

Re: [Discuss] Tuning FLIP-49 configuration default values.

Till Rohrmann
I'd be fine with these changes. Thanks for the summary, Xintong.

Cheers,
Till

On Wed, Jan 15, 2020 at 11:09 AM Xintong Song <[hidden email]> wrote:

> Thank you all for the well discussion.
>
> If there's no further concerns or objections, I would like to conclude this
> thread into the following action items.
>
>    - Change default value of "taskmanager.memory.jvm-overhead.min" to
> 192MB.
>    - Change default value of "taskmanager.memory.jvm-metaspace.size" to
>    96MB.
>    - Change the value of "taskmanager.memory.process.size" in the default
>    "flink-conf.yaml" to 1568MB.
>    - Relax JVM overhead sanity check, so that the fraction does not need to
>    be strictly followed, as long as the min/max range is respected.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Wed, Jan 15, 2020 at 5:50 PM Xintong Song <[hidden email]>
> wrote:
>
> > There's more idea from offline discussion with Andrey.
> >
> > If we decide to make metaspace 96MB, we can also make process.size 1568MB
> > (1.5G + 32MB).
> > According to the spreadsheet
> > <
> https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0
> >,
> > 1.5GB process size and 64MB metaspace result in memory sizes with the
> > values to be powers of 2.
> > When increasing the metaspace from 64MB to 96MB, it would be good to
> > preserve that alignment, for better readability that later we explain the
> > memory configuration and calculations in documents.
> > I believe it's not a big difference between 1.5GB and 1.5GB + 32 MB for
> > memory consumption.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Wed, Jan 15, 2020 at 11:55 AM Xintong Song <[hidden email]>
> > wrote:
> >
> >> Thanks for the discussion, Stephan, Till and Andrey.
> >>
> >> +1 for the managed fraction (0.4) and process.size (1.5G).
> >>
> >> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power 2 alignment of sizes
> >>>
> >> Sorry, this was a typo (and the same for the jira comment which is
> >> copy-pasted). It was 192mb used in the tuning report.
> >>
> >> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64Mb.
> >>> We should confirm that it is not a problem by trying to test it also
> with
> >>> the SQL jobs, Blink planner.
> >>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
> >>> where
> >>> more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>>
> >> I have already tried the setting metaspace to 64Mb with the e2e tests,
> >> where I believe various sql / blink / tpc-ds test cases are included.
> (See
> >> https://travis-ci.com/flink-ci/flink/builds/142970113 )
> >> However, I'm also ok with 96Mb, since we are increasing the process.size
> >> to 1.5G.
> >> My original concern for having larger metaspace size was that we may
> >> result in too small flink.size for out-of-box configuration on
> >> containerized setups.
> >>
> >> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified
> >>> with the JVM meta space and overhead,
> >>> JVM overhead does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> network/shuffle memory check after FLINK-15300.
> >>>
> >> Also +1 for this.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Wed, Jan 15, 2020 at 6:16 AM Andrey Zagrebin <[hidden email]>
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Stephan, Till and me had another offline discussion today. Here is the
> >>> outcome of our brainstorm.
> >>>
> >>> *managed fraction 0.4*
> >>> just confirmed what we already discussed here.
> >>>
> >>> *process.size = 1536Mb (1,5Gb)*
> >>> We agreed to have process.size in the default settings with the
> >>> explanation
> >>> of flink.size alternative in the comment.
> >>> The suggestion is to increase it from 1024 to 1536mb. As you can see in
> >>> the
> >>> earlier provided calculation spreadsheet,
> >>> it will result in bigger JVM Heap and managed memory (both ~0.5Gb) for
> >>> all
> >>> new setups.
> >>> This should provide good enough experience for trying out Flink.
> >>>
> >>> *JVM overhead min 196 -> 192Mb (128 + 64)*
> >>> small correction for better power 2 alignment of sizes
> >>>
> >>> *meta space at least 96Mb?*
> >>> There is still a concern about JVM metaspace being just 64Mb.
> >>> We should confirm that it is not a problem by trying to test it also
> with
> >>> the SQL jobs, Blink planner.
> >>> Also, by running tpc-ds e2e Flink tests with this setting. Basically,
> >>> where
> >>> more classes are generated/loaded.
> >>> We can look into this tomorrow.
> >>>
> >>> *sanity check of JVM overhead*
> >>> When the explicitly configured process and flink memory sizes are
> >>> verified
> >>> with the JVM meta space and overhead,
> >>> JVM overhead does not have to be the exact fraction.
> >>> It can be just within its min/max range, similar to how it is now for
> >>> network/shuffle memory check after FLINK-15300.
> >>>
> >>> Best,Andrey
> >>>
> >>> On Tue, Jan 14, 2020 at 4:30 PM Stephan Ewen <[hidden email]> wrote:
> >>>
> >>> > I like the idea of having a larger default "flink.size" in the
> >>> config.yaml.
> >>> > Maybe we don't need to double it, but something like 1280m would be
> >>> okay?
> >>> >
> >>> > On Tue, Jan 14, 2020 at 3:47 PM Andrey Zagrebin <
> [hidden email]>
> >>> > wrote:
> >>> >
> >>> > > Hi all!
> >>> > >
> >>> > > Great that we have already tried out new FLIP-49 with the bigger
> >>> jobs.
> >>> > >
> >>> > > I am also +1 for the JVM metaspace and overhead changes.
> >>> > >
> >>> > > Regarding 0.3 vs 0.4 for managed memory, +1 for having more managed
> >>> > memory
> >>> > > for Rocksdb limiting case.
> >>> > >
> >>> > > In general, this looks mostly to be about memory distribution
> >>> between JVM
> >>> > > heap and managed off-heap.
> >>> > > Comparing to the previous default setup, the JVM heap dropped
> >>> (especially
> >>> > > for standalone) mostly due to moving managed from heap to off-heap
> >>> and
> >>> > then
> >>> > > also adding framework off-heap.
> >>> > > In general, this can be the most important consequence for
> beginners
> >>> and
> >>> > > those who rely on the default configuration.
> >>> > > Especially the legacy default configuration in standalone with
> >>> falling
> >>> > back
> >>> > > heap.size to flink.size but there it seems we cannot do too much
> now.
> >>> > >
> >>> > > I prepared a spreadsheet
> >>> > > <
> >>> > >
> >>> >
> >>>
> https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE
> >>> > > >
> >>> > > to play with numbers for the mentioned in the report setups.
> >>> > >
> >>> > > One idea would be to set process size (or smaller flink size
> >>> > respectively)
> >>> > > to a bigger default number, like 2048.
> >>> > > In this case, the abs derived default JVM heap and managed memory
> are
> >>> > close
> >>> > > to the previous defaults, especially for managed fraction 0.3.
> >>> > > This should align the defaults with the previous standalone try-out
> >>> > > experience where the increased off-heap memory is not strictly
> >>> controlled
> >>> > > by the environment anyways.
> >>> > > The consequence for container users who relied on and updated the
> >>> default
> >>> > > configuration is that the containers will be requested with the
> >>> double
> >>> > > size.
> >>> > >
> >>> > > Best,
> >>> > > Andrey
> >>> > >
> >>> > >
> >>> > > On Tue, Jan 14, 2020 at 11:20 AM Till Rohrmann <
> [hidden email]
> >>> >
> >>> > > wrote:
> >>> > >
> >>> > > > +1 for the JVM metaspace and overhead changes.
> >>> > > >
> >>> > > > On Tue, Jan 14, 2020 at 11:19 AM Till Rohrmann <
> >>> [hidden email]>
> >>> > > > wrote:
> >>> > > >
> >>> > > >> I guess one of the most important results of this experiment is
> to
> >>> > have
> >>> > > a
> >>> > > >> good tuning guide available for users who are past the initial
> >>> try-out
> >>> > > >> phase because the default settings will be kind of a
> compromise. I
> >>> > > assume
> >>> > > >> that this is part of the outstanding FLIP-49 documentation task.
> >>> > > >>
> >>> > > >> If we limit RocksDB's memory consumption by default, then I
> >>> believe
> >>> > that
> >>> > > >> 0.4 would give the better all-round experience as it leaves a
> bit
> >>> more
> >>> > > >> memory for RocksDB. However, I'm a bit sceptical whether we
> should
> >>> > > optimize
> >>> > > >> the default settings for a configuration where the user still
> >>> needs to
> >>> > > >> activate the strict memory limiting for RocksDB. In this case, I
> >>> would
> >>> > > >> expect that the user could also adapt the managed memory
> fraction.
> >>> > > >>
> >>> > > >> Cheers,
> >>> > > >> Till
> >>> > > >>
> >>> > > >> On Tue, Jan 14, 2020 at 3:39 AM Xintong Song <
> >>> [hidden email]>
> >>> > > >> wrote:
> >>> > > >>
> >>> > > >>> Thanks for the feedback, Stephan and Kurt.
> >>> > > >>>
> >>> > > >>> @Stephan
> >>> > > >>>
> >>> > > >>> Regarding managed memory fraction,
> >>> > > >>> - It makes sense to keep the default value 0.4, if we assume
> >>> rocksdb
> >>> > > >>> memory is limited by default.
> >>> > > >>> - AFAIK, currently rocksdb by default does not limit its memory
> >>> > usage.
> >>> > > >>> And I'm positive to change it.
> >>> > > >>> - Personally, I don't like the idea that we the out-of-box
> >>> experience
> >>> > > >>> (for which we set the default fraction) relies on that users
> will
> >>> > > manually
> >>> > > >>> turn another switch on.
> >>> > > >>>
> >>> > > >>> Regarding framework heap memory,
> >>> > > >>> - The major reason we set it by default is, as you mentioned,
> >>> that to
> >>> > > >>> have a safe net of minimal JVM heap size.
> >>> > > >>> - Also, considering the in progress FLIP-56 (dynamic slot
> >>> > allocation),
> >>> > > >>> we want to reserve some heap memory that will not go into the
> >>> slot
> >>> > > >>> profiles. That's why we decide the default value according to
> the
> >>> > heap
> >>> > > >>> memory usage of an empty task executor.
> >>> > > >>>
> >>> > > >>> @Kurt
> >>> > > >>> Regarding metaspace,
> >>> > > >>> - This config option ("taskmanager.memory.jvm-metaspace") only
> >>> takes
> >>> > > >>> effect on TMs. Currently we do not set metaspace size for JM.
> >>> > > >>> - If we have the same metaspace problem on TMs, then yes,
> >>> changing it
> >>> > > >>> from 128M to 64M will make it worse. However, IMO 10T tpc-ds
> >>> > benchmark
> >>> > > >>> should not be considered as out-of-box experience and it makes
> >>> sense
> >>> > to
> >>> > > >>> tune the configurations for it. I think the smaller metaspace
> >>> size
> >>> > > would be
> >>> > > >>> a better choice for the first trying-out, where a job should
> not
> >>> be
> >>> > too
> >>> > > >>> complicated, the TM size could be relative small (e.g. 1g).
> >>> > > >>>
> >>> > > >>> Thank you~
> >>> > > >>>
> >>> > > >>> Xintong Song
> >>> > > >>>
> >>> > > >>>
> >>> > > >>>
> >>> > > >>> On Tue, Jan 14, 2020 at 9:38 AM Kurt Young <[hidden email]>
> >>> wrote:
> >>> > > >>>
> >>> > > >>>> HI Xingtong,
> >>> > > >>>>
> >>> > > >>>> IIRC during our tpc-ds 10T benchmark, we have suffered by JM's
> >>> > > >>>> metaspace size and full gc which
> >>> > > >>>> caused by lots of classloadings of source input split. Could
> you
> >>> > check
> >>> > > >>>> whether changing the default
> >>> > > >>>> value from 128MB to 64MB will make it worse?
> >>> > > >>>>
> >>> > > >>>> Correct me if I misunderstood anything, also cc @Jingsong
> >>> > > >>>>
> >>> > > >>>> Best,
> >>> > > >>>> Kurt
> >>> > > >>>>
> >>> > > >>>>
> >>> > > >>>> On Tue, Jan 14, 2020 at 3:44 AM Stephan Ewen <
> [hidden email]>
> >>> > > wrote:
> >>> > > >>>>
> >>> > > >>>>> Hi all!
> >>> > > >>>>>
> >>> > > >>>>> Thanks a lot, Xintong, for this thorough analysis. Based on
> >>> your
> >>> > > >>>>> analysis,
> >>> > > >>>>> here are some thoughts:
> >>> > > >>>>>
> >>> > > >>>>> +1 to change default JVM metaspace size from 128MB to 64MB
> >>> > > >>>>> +1 to change default JVM overhead min size from 128MB to
> 196MB
> >>> > > >>>>>
> >>> > > >>>>> Concerning the managed memory fraction, I am not sure I would
> >>> > change
> >>> > > >>>>> it,
> >>> > > >>>>> for the following reasons:
> >>> > > >>>>>
> >>> > > >>>>>   - We should assume RocksDB will be limited to managed
> memory
> >>> by
> >>> > > >>>>> default.
> >>> > > >>>>> This will either be active by default or we would encourage
> >>> > everyone
> >>> > > >>>>> to use
> >>> > > >>>>> this by default, because otherwise it is super hard to reason
> >>> about
> >>> > > the
> >>> > > >>>>> RocksDB footprint.
> >>> > > >>>>>   - For standalone, a managed memory fraction of 0.3 is less
> >>> than
> >>> > > half
> >>> > > >>>>> of
> >>> > > >>>>> the managed memory from 1.9.
> >>> > > >>>>>   - I am not sure if the managed memory fraction is a value
> >>> that
> >>> > all
> >>> > > >>>>> users
> >>> > > >>>>> adjust immediately when scaling up the memory during their
> >>> first
> >>> > > >>>>> try-out
> >>> > > >>>>> phase. I would assume that most users initially only adjust
> >>> > > >>>>> "memory.flink.size" or "memory.process.size". A value of 0.3
> >>> will
> >>> > > lead
> >>> > > >>>>> to
> >>> > > >>>>> having too large heaps and very little RocksDB / batch memory
> >>> even
> >>> > > when
> >>> > > >>>>> scaling up during the initial exploration.
> >>> > > >>>>>   - I agree, though, that 0.5 looks too aggressive, from your
> >>> > > >>>>> benchmarks.
> >>> > > >>>>> So maybe keeping it at 0.4 could work?
> >>> > > >>>>>
> >>> > > >>>>> And one question: Why do we set the Framework Heap by
> default?
> >>> Is
> >>> > > that
> >>> > > >>>>> so
> >>> > > >>>>> we reduce the managed memory further is less than framework
> >>> heap
> >>> > > would
> >>> > > >>>>> be
> >>> > > >>>>> left from the JVM heap?
> >>> > > >>>>>
> >>> > > >>>>> Best,
> >>> > > >>>>> Stephan
> >>> > > >>>>>
> >>> > > >>>>> On Thu, Jan 9, 2020 at 10:54 AM Xintong Song <
> >>> > [hidden email]>
> >>> > > >>>>> wrote:
> >>> > > >>>>>
> >>> > > >>>>> > Hi all,
> >>> > > >>>>> >
> >>> > > >>>>> > As described in FLINK-15145 [1], we decided to tune the
> >>> default
> >>> > > >>>>> > configuration values of FLIP-49 with more jobs and cases.
> >>> > > >>>>> >
> >>> > > >>>>> > After spending time analyzing and tuning the
> configurations,
> >>> I've
> >>> > > >>>>> come
> >>> > > >>>>> > with several findings. To be brief, I would suggest the
> >>> following
> >>> > > >>>>> changes,
> >>> > > >>>>> > and for more details please take a look at my tuning report
> >>> [2].
> >>> > > >>>>> >
> >>> > > >>>>> >    - Change default managed memory fraction from 0.4 to
> 0.3.
> >>> > > >>>>> >    - Change default JVM metaspace size from 128MB to 64MB.
> >>> > > >>>>> >    - Change default JVM overhead min size from 128MB to
> >>> 196MB.
> >>> > > >>>>> >
> >>> > > >>>>> > Looking forward to your feedback.
> >>> > > >>>>> >
> >>> > > >>>>> > Thank you~
> >>> > > >>>>> >
> >>> > > >>>>> > Xintong Song
> >>> > > >>>>> >
> >>> > > >>>>> >
> >>> > > >>>>> > [1] https://issues.apache.org/jira/browse/FLINK-15145
> >>> > > >>>>> >
> >>> > > >>>>> > [2]
> >>> > > >>>>> >
> >>> > > >>>>>
> >>> > >
> >>> >
> >>>
> https://docs.google.com/document/d/1-LravhQYUIkXb7rh0XnBB78vSvhp3ecLSAgsiabfVkk/edit?usp=sharing
> >>> > > >>>>> >
> >>> > > >>>>> >
> >>> > > >>>>>
> >>> > > >>>>
> >>> > >
> >>> >
> >>>
> >>
>