Hi All,
As you may have noticed, the 1.10 release included extensive improvements to the memory management and configuration of Task Managers (FLIP-49 [1]). The memory configuration of Job Managers was not touched in 1.10.

Although the Job Manager's memory model is not as sophisticated as the Task Manager's, it makes sense to align the Job Manager memory model and settings with those of the Task Managers. Therefore, we propose to reconsider it as well in 1.11, and I have prepared FLIP-116 [2] for that.

Any feedback is appreciated.

So far, there is one discussion point: how to address native non-direct memory usage of user code. User code can run within the JM process, e.g. in certain job submission scenarios. For simplicity, the FLIP suggests only an option for direct memory, which is translated into the setting of the JVM direct memory limit.
Although we documented for the TM that similar parameters can also address native non-direct memory usage [3], this can cause the JVM direct memory limit to function incorrectly. The direct memory option in the JM could also be given a more general name, e.g. off-heap memory, but this naming would somewhat hide its nature as the JVM direct memory limit.
On the other hand, JVM Overhead does not suffer from this problem: it affects only the container/worker memory size, which is the most important thing to account for with native non-direct memory consumption. The caveat here is that JVM Overhead was not supposed to be used by any Flink or user components.

Thanks,
Andrey

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#overview
|
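To make the trade-off in the message above concrete, here is a minimal sketch of how the memory components might map to JVM flags and to the container size. The function name, parameter names, and the numbers are assumptions made for illustration only and are not taken from the FLIP; only the JVM flags themselves (`-Xmx`, `-XX:MaxDirectMemorySize`, `-XX:MaxMetaspaceSize`) are real.

```python
# Hypothetical helper, not Flink code: maps illustrative JM memory components
# to JVM flags and to the container size requested from the resource manager.


def jvm_flags_and_container_mb(heap_mb, direct_mb, metaspace_mb, overhead_mb):
    """Return (list of JVM flags, container size in MB)."""
    # Heap, direct memory and metaspace are each enforced by a JVM flag.
    flags = [
        f"-Xmx{heap_mb}m",
        f"-XX:MaxDirectMemorySize={direct_mb}m",
        f"-XX:MaxMetaspaceSize={metaspace_mb}m",
    ]
    # JVM Overhead has no corresponding flag in this sketch; it only enlarges
    # the container/worker memory request.
    container_mb = heap_mb + direct_mb + metaspace_mb + overhead_mb
    return flags, container_mb


# Illustrative values only.
flags, container_mb = jvm_flags_and_container_mb(
    heap_mb=1024, direct_mb=128, metaspace_mb=256, overhead_mb=192
)
```

Under these assumptions, budgeting native memory inside `direct_mb` raises `-XX:MaxDirectMemorySize` beyond what direct buffers actually need, which loosens that limit, whereas budgeting it inside `overhead_mb` changes only the container size.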
Thanks Andrey for kicking this discussion off.
Regarding "direct" vs. "off-heap", I'm personally in favor of renaming the "direct" memory in the current FLIP-116 [1] to "off-heap" memory, and making it also account for user native memory usage.

On the one hand, I think it would be good for the JM and TM to provide consistent concepts and terminology to users. IIUC, this is exactly the purpose of this FLIP. For TMs, we already have "off-heap" memory accounting for both direct and native memory usage, and we did this so that users do not need to understand the difference between the two kinds.

On the other hand, while for TMs it is hard to tell which kind of memory is needed, mostly due to the variety of applications, I believe that for the JM the major memory consumption is heap memory in most cases. That means we can probably rely on heap activity to trigger GC in most cases, and the max direct memory limit can act as a safety net. Moreover, I think the cases where we need native memory for user code should be very rare. Therefore, we probably should not break the JM/TM consistency for potential risks in such rare cases.

WDYT?

Thank you~

Xintong Song

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers

On Wed, Mar 11, 2020 at 8:56 PM Andrey Zagrebin <[hidden email]> wrote:
> [...]
|
Thanks for creating this FLIP Andrey.
I agree with Xintong that we should rename jobmanager.memory.direct.size to jobmanager.memory.off-heap.size, which accounts for native and direct memory usage. I think it should be good enough and is easier for the user to understand.

Concerning the default value for the metaspace size: did we take the lessons learned from the TM metaspace size into account? IIRC we are about to change the default value to 256 MB.

Feel free to start a vote once these last two questions have been resolved.

Cheers,
Till

On Thu, Mar 12, 2020 at 4:25 AM Xintong Song <[hidden email]> wrote:
> [...]
|
Hi all,
Thanks for the feedback, Xintong and Till.

> rename jobmanager.memory.direct.size into jobmanager.memory.off-heap.size

I am OK with that, to align it with the TM and avoid further complications for users.
I will adjust the FLIP.

> change the default value of JM Metaspace size to 256 MB

Indeed, there is no reason to assume that the user code would need less Metaspace in the JM.
I will change it unless a better argument is made for another value.

I think all concerns have been resolved, so I am starting the vote in a separate thread.

Best,
Andrey

On Tue, Mar 17, 2020 at 6:16 PM Till Rohrmann <[hidden email]> wrote:
> [...]
|
Hi all,
One more thing to mention: the current calculations can lead to an arbitrarily small JVM Heap, maybe even zero.
I suggest introducing a check where we at least recommend setting the JVM heap to e.g. 128 MB.

Additionally, we can demand some minimum value to function and fail if it is not fulfilled.
We could experiment to find the working minimum, but it is hard to come up with this limit because it again can depend on the job and environment.

Best,
Andrey

On Wed, Mar 18, 2020 at 5:03 PM Andrey Zagrebin <[hidden email]> wrote:
> [...]
|
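The concern about a vanishing heap can be illustrated with a small sketch. It assumes that the heap is derived as whatever remains of the total process size after the other components are subtracted; the function name and all numbers below are hypothetical and chosen only to show the effect.

```python
# Hypothetical derivation, not Flink code: the heap is assumed to be the
# remainder of the total process size after the other components.


def derive_heap_mb(total_process_mb, off_heap_mb, metaspace_mb, overhead_mb):
    """Return the heap size in MB left over for the JVM."""
    return total_process_mb - off_heap_mb - metaspace_mb - overhead_mb


# Illustrative values: the non-heap components consume the whole budget.
tight_heap_mb = derive_heap_mb(
    total_process_mb=576, off_heap_mb=128, metaspace_mb=256, overhead_mb=192
)

# A larger total leaves a usable heap.
roomy_heap_mb = derive_heap_mb(
    total_process_mb=1600, off_heap_mb=128, metaspace_mb=256, overhead_mb=192
)
```

With the first set of values nothing is left for the heap at all, which is the situation a recommended minimum (e.g. 128 MB) is meant to flag.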
I think recommending a minimum value in the docs and emitting a warning if the heap size is too small should be good enough.
I am not sure about failing the job if the min heap is not fulfilled. As already mentioned, it would be hard to determine the min heap size. And if we make the min heap configurable, then in any case where users need to configure the min heap, they can configure the heap size directly.

Thank you~

Xintong Song

On Wed, Mar 18, 2020 at 10:55 PM Andrey Zagrebin <[hidden email]> wrote:
> [...]
|
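A warn-but-do-not-fail check along the lines suggested above might look like the following sketch. The function name, the return convention, and the 128 MB constant (taken from the example value mentioned earlier in the thread) are assumptions for illustration, not an actual Flink API.

```python
import warnings

# Illustrative recommended minimum, following the example value in the thread.
RECOMMENDED_MIN_HEAP_MB = 128


def check_heap_size(heap_mb, recommended_min_mb=RECOMMENDED_MIN_HEAP_MB):
    """Warn, but do not fail, when the heap is below the recommended minimum.

    Returns True if the heap meets the recommendation, False otherwise.
    """
    if heap_mb < recommended_min_mb:
        warnings.warn(
            f"JVM heap of {heap_mb} MB is below the recommended minimum "
            f"of {recommended_min_mb} MB",
            UserWarning,
        )
        return False
    return True
```

Because the check only warns, a user who knowingly runs with a small heap is not blocked, and no separate "minimum heap" option is needed: the heap size itself remains the only knob.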
I agree with Xintong's proposal. If we see that many users run into this problem, then one could think about escalating the warning message into a failure.

Cheers,
Till

On Thu, Mar 19, 2020 at 4:23 AM Xintong Song <[hidden email]> wrote:
> [...]
|
Alright, thanks for the feedback. I also agree with it. Then this is resolved.
> On 19 Mar 2020, at 14:14, Till Rohrmann <[hidden email]> wrote:
>
> I agree with Xintong's proposal. If we see that many users run into this
> problem, then one could think about escalating the warning message into a
> failure.
>
> Cheers,
> Till
>
> On Thu, Mar 19, 2020 at 4:23 AM Xintong Song <[hidden email]> wrote:
>
>> I think recommend a minimum value in docs and throw a warning if the heap
>> size is too small should be good enough.
>> Not sure about failing job if the min heap is not fulfilled. As already
>> mentioned, it would be hard to determine the min heap size. And if we make
>> the min heap configurable, then in any case that users need to configure
>> the min heap, they can configure the heap size directly.
>>
>> Thank you~
>>
>> Xintong Song
>>
>> On Wed, Mar 18, 2020 at 10:55 PM Andrey Zagrebin <[hidden email]> wrote:
>>
>>> Hi all,
>>>
>>> One thing more thing to mention, the current calculations can lead to
>>> arbitrary small JVM Heap, maybe even zero.
>>> I suggest to introduce a check where we at least recommend to set the JVM
>>> heap to e.g. 128Mb.
>>>
>>> Additionally, we can demand some minimum value to function and fail if it
>>> is not fulfilled.
>>> We could experiment with what is the working minimum but It is hard to come
>>> up with this limit because it again can depend on the job and environment.
>>>
>>> Best,
>>> Andrey
>>>
>>> On Wed, Mar 18, 2020 at 5:03 PM Andrey Zagrebin <[hidden email]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Thanks for the feedback, Xintong and Till.
>>>>
>>>>> rename jobmanager.memory.direct.size into jobmanager.memory.off-heap.size
>>>>
>>>> I am ok with that to align it with TM and avoid further complications for
>>>> users.
>>>> I will adjust the FLIP.
>>>>
>>>>> change the default value of JM Metaspace size to 256 MB
>>>>
>>>> Indeed, no reason to assume that the user code would need less Metaspace
>>>> in JM.
>>>> I will change it unless a better argument is reported for another value.
>>>>
>>>> I think all concerns has been resolved so I am starting the voting in a
>>>> separate thread.
>>>>
>>>> Best,
>>>> Andrey
>>>>
>>>> On Tue, Mar 17, 2020 at 6:16 PM Till Rohrmann <[hidden email]> wrote:
>>>>
>>>>> Thanks for creating this FLIP Andrey.
>>>>>
>>>>> I agree with Xintong that we should rename jobmanager.memory.direct.size
>>>>> into jobmanager.memory.off-heap.size which accounts for native and direct
>>>>> memory usage. I think it should be good enough and is easier to understand
>>>>> for the user.
>>>>>
>>>>> Concerning the default value for the metaspace size. Did we take the
>>>>> lessons learned from the TM metaspace size into account? IIRC we are about
>>>>> to change the default value to 256 MB.
>>>>>
>>>>> Feel free to start a vote once these last two questions have been
>>>>> resolved.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Thu, Mar 12, 2020 at 4:25 AM Xintong Song <[hidden email]> wrote:
>>>>>
>>>>>> Thanks Andrey for kicking this discussion off.
>>>>>>
>>>>>> Regarding "direct" vs. "off-heap", I'm personally in favor of renaming the
>>>>>> "direct" memory in the current FLIP-116[1] to "off-heap" memory, and making
>>>>>> it also account for user native memory usage.
>>>>>>
>>>>>> On one hand, I think it would be good that JM & TM provide consistent
>>>>>> concepts and terminologies to users. IIUC, this is exactly the purpose of
>>>>>> this FLIP. For TMs, we already have "off-heap" memory accounting for both
>>>>>> direct and native memory usages, and we did this so that users do not need
>>>>>> to understand the differences between the two kinds.
>>>>>>
>>>>>> On the other hand, while for TMs it is hard to tell which kind of memory is
>>>>>> needed mostly due to variety of applications, I believe for JM the major
>>>>>> memory consumption is heap memory in most cases.
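A minimal sketch of the minimum-heap check discussed in the quoted thread, assuming the JVM heap is derived as whatever remains of the total process memory after off-heap, Metaspace and JVM Overhead are subtracted. The class and method names (JmHeapCheck, deriveHeapMb, heapWarning) are hypothetical and not part of Flink; 128 MB is only the example value mentioned in the discussion, and the sizes in main are made up for illustration. In line with the outcome of the thread, a heap below the recommendation yields a warning rather than a failure.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not Flink's actual implementation. */
final class JmHeapCheck {

    /** Example recommendation taken from the thread ("e.g. 128Mb"), not a verified limit. */
    static final long RECOMMENDED_MIN_HEAP_MB = 128;

    /**
     * Assumed derivation: the heap is what is left of the total process memory
     * once off-heap, Metaspace and JVM Overhead have been subtracted.
     */
    static long deriveHeapMb(long processMb, long offHeapMb, long metaspaceMb, long overheadMb) {
        long heapMb = processMb - offHeapMb - metaspaceMb - overheadMb;
        if (heapMb < 0) {
            // The other components alone exceed the total: an inconsistent configuration.
            throw new IllegalArgumentException(
                    "Configured components exceed total process memory by " + (-heapMb) + " MB");
        }
        return heapMb;
    }

    /** Returns a warning text if the heap is below the recommendation, otherwise empty. */
    static Optional<String> heapWarning(long heapMb) {
        if (heapMb < RECOMMENDED_MIN_HEAP_MB) {
            return Optional.of("Derived JVM heap (" + heapMb + " MB) is below the recommended "
                    + RECOMMENDED_MIN_HEAP_MB + " MB");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 512 MB process, 128 MB off-heap,
        // 256 MB Metaspace, 64 MB JVM Overhead.
        long heapMb = deriveHeapMb(512, 128, 256, 64);
        System.out.println("Derived JVM heap: " + heapMb + " MB");
        heapWarning(heapMb).ifPresent(w -> System.out.println("WARN: " + w));
    }
}
```

With the sizes in main the derived heap is smaller than the recommendation, so the warning path is taken. Escalating that warning into a failure, as Till suggests might be considered later, would only mean throwing instead of printing.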