Thanks all for the discussion. I believe we have get consensus on all the
open questions discussed in this thread. Since Andrey already create a jira ticket for renaming shuffle memory config keys with "taskmanager.memory.network.*", I'll create ticket for the other topic that puts flink.size in flink-conf.yaml. Thank you~ Xintong Song On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <[hidden email]> wrote: > It also looks to me that we should only swap network and memory in the > option names: 'taskmanager.memory.network.*'. > There is no strong consensus towards using new 'shuffle' naming so we can > just rename it to 'taskmanager.memory.network.*' as 'shuffle' naming has > never been released. > When we have other shuffle services and start advertising more this concept > in Flink, we could revisit again the whole naming for this concept. > https://jira.apache.org/jira/browse/FLINK-15517 > |
Great! Thanks, guys, for the continued effort on this topic!
On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> wrote: > Thanks all for the discussion. I believe we have get consensus on all the > open questions discussed in this thread. > > Since Andrey already create a jira ticket for renaming shuffle memory > config keys with "taskmanager.memory.network.*", I'll create ticket for the > other topic that puts flink.size in flink-conf.yaml. > > Thank you~ > > Xintong Song > > > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <[hidden email]> > wrote: > > > It also looks to me that we should only swap network and memory in the > > option names: 'taskmanager.memory.network.*'. > > There is no strong consensus towards using new 'shuffle' naming so we can > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' naming has > > never been released. > > When we have other shuffle services and start advertising more this > concept > > in Flink, we could revisit again the whole naming for this concept. > > https://jira.apache.org/jira/browse/FLINK-15517 > > > |
Hi all,
While working on changing process memory to Flink memory in default configuration, Xintong encountered a problem. When -tm option is used to rewrite container memory, basically process memory, it can collide with the default Flink memory. For legacy users it should not be a problem as we adjusted the legacy heap size option to be interpreted differently for standalone and container modes. One solution could be to say in -tm docs that we rewrite both options under the hood: process and Flink memory, basically unset Flink memory from yaml config. The downside is that this adds more magic. Alternatively, we can keep process memory in default config and, as mentioned before, increase it to maintain the user experience by matching the previous default setting for heap (now Flink in standalone) size. The Flink memory can be mentioned in process memory comment as a simpler alternative which does not require accounting for JVM overhead. The downside is again more confusion while trying out Flink and tuning memory at the same time. On the other hand, if memory already needs to be tuned it should quite quickly lead to the necessity of understanding the memory model in Flink. Best, Andrey On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> wrote: > Great! Thanks, guys, for the continued effort on this topic! > > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> wrote: > > > Thanks all for the discussion. I believe we have get consensus on all the > > open questions discussed in this thread. > > > > Since Andrey already create a jira ticket for renaming shuffle memory > > config keys with "taskmanager.memory.network.*", I'll create ticket for > the > > other topic that puts flink.size in flink-conf.yaml. > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <[hidden email]> > > wrote: > > > > > It also looks to me that we should only swap network and memory in the > > > option names: 'taskmanager.memory.network.*'. > > > There is no strong consensus towards using new 'shuffle' naming so we > can > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' naming > has > > > never been released. > > > When we have other shuffle services and start advertising more this > > concept > > > in Flink, we could revisit again the whole naming for this concept. > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > > > > |
I think we need an interpretation of "-tm" regardless of what is in the
default configuration, because we can always have a modified configuration and then a user passes the "-tm" flag. I kind of like the first proposal: Interpret "-tm" as "override memory size config and set the Yarn TM container size." It would mean exactly ignoring "taskmanager.memory.flink.size" and using the "-tm" value as " "taskmanager.memory.process.size". That does not sound too bad to me. Best, Stephan On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <[hidden email]> wrote: > Hi all, > > While working on changing process memory to Flink memory in default > configuration, Xintong encountered a problem. > When -tm option is used to rewrite container memory, basically process > memory, it can collide with the default Flink memory. > For legacy users it should not be a problem as we adjusted the legacy heap > size option to be interpreted differently for standalone and container > modes. > > One solution could be to say in -tm docs that we rewrite both options under > the hood: process and Flink memory, basically unset Flink memory from yaml > config. > The downside is that this adds more magic. > > Alternatively, we can keep process memory in default config and, as > mentioned before, increase it to maintain the user experience by matching > the previous default setting for heap (now Flink in standalone) size. > The Flink memory can be mentioned in process memory comment as a simpler > alternative which does not require accounting for JVM overhead. > The downside is again more confusion while trying out Flink and tuning > memory at the same time. > On the other hand, if memory already needs to be tuned it should > quite quickly lead to the necessity of understanding the memory model in > Flink. > > Best, > Andrey > > On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> wrote: > > > Great! Thanks, guys, for the continued effort on this topic! > > > > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> > wrote: > > > > > Thanks all for the discussion. I believe we have get consensus on all > the > > > open questions discussed in this thread. > > > > > > Since Andrey already create a jira ticket for renaming shuffle memory > > > config keys with "taskmanager.memory.network.*", I'll create ticket for > > the > > > other topic that puts flink.size in flink-conf.yaml. > > > > > > Thank you~ > > > > > > Xintong Song > > > > > > > > > > > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <[hidden email]> > > > wrote: > > > > > > > It also looks to me that we should only swap network and memory in > the > > > > option names: 'taskmanager.memory.network.*'. > > > > There is no strong consensus towards using new 'shuffle' naming so we > > can > > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' naming > > has > > > > never been released. > > > > When we have other shuffle services and start advertising more this > > > concept > > > > in Flink, we could revisit again the whole naming for this concept. > > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > > > > > > > > |
Would be good to hear the thoughts of some more Yarn users, though.
On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> wrote: > I think we need an interpretation of "-tm" regardless of what is in the > default configuration, because we can always have a modified configuration > and then a user passes the "-tm" flag. > > I kind of like the first proposal: Interpret "-tm" as "override memory > size config and set the Yarn TM container size." It would mean exactly > ignoring "taskmanager.memory.flink.size" and using the "-tm" value as " > "taskmanager.memory.process.size". > That does not sound too bad to me. > > Best, > Stephan > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <[hidden email]> > wrote: > >> Hi all, >> >> While working on changing process memory to Flink memory in default >> configuration, Xintong encountered a problem. >> When -tm option is used to rewrite container memory, basically process >> memory, it can collide with the default Flink memory. >> For legacy users it should not be a problem as we adjusted the legacy heap >> size option to be interpreted differently for standalone and container >> modes. >> >> One solution could be to say in -tm docs that we rewrite both options >> under >> the hood: process and Flink memory, basically unset Flink memory from yaml >> config. >> The downside is that this adds more magic. >> >> Alternatively, we can keep process memory in default config and, as >> mentioned before, increase it to maintain the user experience by matching >> the previous default setting for heap (now Flink in standalone) size. >> The Flink memory can be mentioned in process memory comment as a simpler >> alternative which does not require accounting for JVM overhead. >> The downside is again more confusion while trying out Flink and tuning >> memory at the same time. >> On the other hand, if memory already needs to be tuned it should >> quite quickly lead to the necessity of understanding the memory model in >> Flink. >> >> Best, >> Andrey >> >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> wrote: >> >> > Great! Thanks, guys, for the continued effort on this topic! >> > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> >> wrote: >> > >> > > Thanks all for the discussion. I believe we have get consensus on all >> the >> > > open questions discussed in this thread. >> > > >> > > Since Andrey already create a jira ticket for renaming shuffle memory >> > > config keys with "taskmanager.memory.network.*", I'll create ticket >> for >> > the >> > > other topic that puts flink.size in flink-conf.yaml. >> > > >> > > Thank you~ >> > > >> > > Xintong Song >> > > >> > > >> > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin <[hidden email]> >> > > wrote: >> > > >> > > > It also looks to me that we should only swap network and memory in >> the >> > > > option names: 'taskmanager.memory.network.*'. >> > > > There is no strong consensus towards using new 'shuffle' naming so >> we >> > can >> > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' >> naming >> > has >> > > > never been released. >> > > > When we have other shuffle services and start advertising more this >> > > concept >> > > > in Flink, we could revisit again the whole naming for this concept. >> > > > https://jira.apache.org/jira/browse/FLINK-15517 >> > > > >> > > >> > >> > |
True, even we have "process.size" rather than "flink.size" in the default
config file, user can still have "flink.size" in their custom config file. If we consider "-tm" as a shortcut for users to override the TM memory size, then it makes less sense to require users to remove "flink.size" from their config file whenever then want to use "-tm". I'm convinced that ignoring "flink.size" might not be a bad idea. And I think we should also update the description of "-tm" (in "FlinkYarnSessionCli") to explicitly mention that it would overwrite "flink.size" and "process.size". Thank you~ Xintong Song On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <[hidden email]> wrote: > Would be good to hear the thoughts of some more Yarn users, though. > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> wrote: > > > I think we need an interpretation of "-tm" regardless of what is in the > > default configuration, because we can always have a modified > configuration > > and then a user passes the "-tm" flag. > > > > I kind of like the first proposal: Interpret "-tm" as "override memory > > size config and set the Yarn TM container size." It would mean exactly > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value as " > > "taskmanager.memory.process.size". > > That does not sound too bad to me. > > > > Best, > > Stephan > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <[hidden email]> > > wrote: > > > >> Hi all, > >> > >> While working on changing process memory to Flink memory in default > >> configuration, Xintong encountered a problem. > >> When -tm option is used to rewrite container memory, basically process > >> memory, it can collide with the default Flink memory. > >> For legacy users it should not be a problem as we adjusted the legacy > heap > >> size option to be interpreted differently for standalone and container > >> modes. > >> > >> One solution could be to say in -tm docs that we rewrite both options > >> under > >> the hood: process and Flink memory, basically unset Flink memory from > yaml > >> config. > >> The downside is that this adds more magic. > >> > >> Alternatively, we can keep process memory in default config and, as > >> mentioned before, increase it to maintain the user experience by > matching > >> the previous default setting for heap (now Flink in standalone) size. > >> The Flink memory can be mentioned in process memory comment as a simpler > >> alternative which does not require accounting for JVM overhead. > >> The downside is again more confusion while trying out Flink and tuning > >> memory at the same time. > >> On the other hand, if memory already needs to be tuned it should > >> quite quickly lead to the necessity of understanding the memory model in > >> Flink. > >> > >> Best, > >> Andrey > >> > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> wrote: > >> > >> > Great! Thanks, guys, for the continued effort on this topic! > >> > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> > >> wrote: > >> > > >> > > Thanks all for the discussion. I believe we have get consensus on > all > >> the > >> > > open questions discussed in this thread. > >> > > > >> > > Since Andrey already create a jira ticket for renaming shuffle > memory > >> > > config keys with "taskmanager.memory.network.*", I'll create ticket > >> for > >> > the > >> > > other topic that puts flink.size in flink-conf.yaml. > >> > > > >> > > Thank you~ > >> > > > >> > > Xintong Song > >> > > > >> > > > >> > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > [hidden email]> > >> > > wrote: > >> > > > >> > > > It also looks to me that we should only swap network and memory in > >> the > >> > > > option names: 'taskmanager.memory.network.*'. > >> > > > There is no strong consensus towards using new 'shuffle' naming so > >> we > >> > can > >> > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' > >> naming > >> > has > >> > > > never been released. > >> > > > When we have other shuffle services and start advertising more > this > >> > > concept > >> > > > in Flink, we could revisit again the whole naming for this > concept. > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > >> > > > > >> > > > >> > > >> > > > |
Clearing the `flink.size` option and setting `process.size` could indeed be
a solution. The thing I'm wondering is what would happen if the user has configured `task.heap.size` and `managed.size` instead of `flink.size`? Would we also ignore these settings? If not, then we risk to run into the situation that TaskExecutorResourceUtils fails because the memory does not add up. I guess the thing which bugs me a bit is the special casing which could lead easily into inconsistent behaviour if we don't cover all cases. The consequence of using slightly different concepts (`flink.size`, `process.size`) in standalone vs. container/Yarn/Mesos mode in order to keep the status quo is an increased maintenance overhead which we should be aware of. Cheers, Till On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <[hidden email]> wrote: > True, even we have "process.size" rather than "flink.size" in the default > config file, user can still have "flink.size" in their custom config file. > If we consider "-tm" as a shortcut for users to override the TM memory > size, then it makes less sense to require users to remove "flink.size" from > their config file whenever then want to use "-tm". > > I'm convinced that ignoring "flink.size" might not be a bad idea. > And I think we should also update the description of "-tm" (in > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > "flink.size" and "process.size". > > Thank you~ > > Xintong Song > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <[hidden email]> wrote: > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> wrote: > > > > > I think we need an interpretation of "-tm" regardless of what is in the > > > default configuration, because we can always have a modified > > configuration > > > and then a user passes the "-tm" flag. > > > > > > I kind of like the first proposal: Interpret "-tm" as "override memory > > > size config and set the Yarn TM container size." It would mean exactly > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value as " > > > "taskmanager.memory.process.size". > > > That does not sound too bad to me. > > > > > > Best, > > > Stephan > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin <[hidden email]> > > > wrote: > > > > > >> Hi all, > > >> > > >> While working on changing process memory to Flink memory in default > > >> configuration, Xintong encountered a problem. > > >> When -tm option is used to rewrite container memory, basically process > > >> memory, it can collide with the default Flink memory. > > >> For legacy users it should not be a problem as we adjusted the legacy > > heap > > >> size option to be interpreted differently for standalone and container > > >> modes. > > >> > > >> One solution could be to say in -tm docs that we rewrite both options > > >> under > > >> the hood: process and Flink memory, basically unset Flink memory from > > yaml > > >> config. > > >> The downside is that this adds more magic. > > >> > > >> Alternatively, we can keep process memory in default config and, as > > >> mentioned before, increase it to maintain the user experience by > > matching > > >> the previous default setting for heap (now Flink in standalone) size. > > >> The Flink memory can be mentioned in process memory comment as a > simpler > > >> alternative which does not require accounting for JVM overhead. > > >> The downside is again more confusion while trying out Flink and tuning > > >> memory at the same time. > > >> On the other hand, if memory already needs to be tuned it should > > >> quite quickly lead to the necessity of understanding the memory model > in > > >> Flink. > > >> > > >> Best, > > >> Andrey > > >> > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> > wrote: > > >> > > >> > Great! Thanks, guys, for the continued effort on this topic! > > >> > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song <[hidden email]> > > >> wrote: > > >> > > > >> > > Thanks all for the discussion. I believe we have get consensus on > > all > > >> the > > >> > > open questions discussed in this thread. > > >> > > > > >> > > Since Andrey already create a jira ticket for renaming shuffle > > memory > > >> > > config keys with "taskmanager.memory.network.*", I'll create > ticket > > >> for > > >> > the > > >> > > other topic that puts flink.size in flink-conf.yaml. > > >> > > > > >> > > Thank you~ > > >> > > > > >> > > Xintong Song > > >> > > > > >> > > > > >> > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > [hidden email]> > > >> > > wrote: > > >> > > > > >> > > > It also looks to me that we should only swap network and memory > in > > >> the > > >> > > > option names: 'taskmanager.memory.network.*'. > > >> > > > There is no strong consensus towards using new 'shuffle' naming > so > > >> we > > >> > can > > >> > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' > > >> naming > > >> > has > > >> > > > never been released. > > >> > > > When we have other shuffle services and start advertising more > > this > > >> > > concept > > >> > > > in Flink, we could revisit again the whole naming for this > > concept. > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > >> > > > > > >> > > > > >> > > > >> > > > > > > |
I think that problem exists anyways and is independent of the "-tm" option.
You can have a combination of `task.heap.size` and `managed.size` and `framework.heap.size` that conflicts with `flink.size`. In that case, we get an exception during the startup (incompatible memory configuration)? That is the price for having these "shortcut" options (`flink.size` and `process.size`). But it is a fair price, because the shortcuts are very valuable to most users. That is added with the "-tm" setting is that this is an easy way to get the shortcut setting added, even if it was not in the config. So where a config worked in standalone, it can now fail in a container setup when "-tm" is used. But that is expected, I guess, when you start manually tune different memory types and end up defining an inconsistent combination. And it never conflicts with the default setup, only with manually tuned regions. But this example shows why we need to have log output for the logic that validates and configures the memory settings, so that users understand what is happening. Best, Stephan On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <[hidden email]> wrote: > Clearing the `flink.size` option and setting `process.size` could indeed be > a solution. The thing I'm wondering is what would happen if the user has > configured `task.heap.size` and `managed.size` instead of `flink.size`? > Would we also ignore these settings? If not, then we risk to run into the > situation that TaskExecutorResourceUtils fails because the memory does not > add up. I guess the thing which bugs me a bit is the special casing which > could lead easily into inconsistent behaviour if we don't cover all cases. > The consequence of using slightly different concepts (`flink.size`, > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to > keep the status quo is an increased maintenance overhead which we should be > aware of. > > Cheers, > Till > > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <[hidden email]> > wrote: > > > True, even we have "process.size" rather than "flink.size" in the default > > config file, user can still have "flink.size" in their custom config > file. > > If we consider "-tm" as a shortcut for users to override the TM memory > > size, then it makes less sense to require users to remove "flink.size" > from > > their config file whenever then want to use "-tm". > > > > I'm convinced that ignoring "flink.size" might not be a bad idea. > > And I think we should also update the description of "-tm" (in > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > > "flink.size" and "process.size". > > > > Thank you~ > > > > Xintong Song > > > > > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <[hidden email]> wrote: > > > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> wrote: > > > > > > > I think we need an interpretation of "-tm" regardless of what is in > the > > > > default configuration, because we can always have a modified > > > configuration > > > > and then a user passes the "-tm" flag. > > > > > > > > I kind of like the first proposal: Interpret "-tm" as "override > memory > > > > size config and set the Yarn TM container size." It would mean > exactly > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value > as " > > > > "taskmanager.memory.process.size". > > > > That does not sound too bad to me. > > > > > > > > Best, > > > > Stephan > > > > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin < > [hidden email]> > > > > wrote: > > > > > > > >> Hi all, > > > >> > > > >> While working on changing process memory to Flink memory in default > > > >> configuration, Xintong encountered a problem. > > > >> When -tm option is used to rewrite container memory, basically > process > > > >> memory, it can collide with the default Flink memory. > > > >> For legacy users it should not be a problem as we adjusted the > legacy > > > heap > > > >> size option to be interpreted differently for standalone and > container > > > >> modes. > > > >> > > > >> One solution could be to say in -tm docs that we rewrite both > options > > > >> under > > > >> the hood: process and Flink memory, basically unset Flink memory > from > > > yaml > > > >> config. > > > >> The downside is that this adds more magic. > > > >> > > > >> Alternatively, we can keep process memory in default config and, as > > > >> mentioned before, increase it to maintain the user experience by > > > matching > > > >> the previous default setting for heap (now Flink in standalone) > size. > > > >> The Flink memory can be mentioned in process memory comment as a > > simpler > > > >> alternative which does not require accounting for JVM overhead. > > > >> The downside is again more confusion while trying out Flink and > tuning > > > >> memory at the same time. > > > >> On the other hand, if memory already needs to be tuned it should > > > >> quite quickly lead to the necessity of understanding the memory > model > > in > > > >> Flink. > > > >> > > > >> Best, > > > >> Andrey > > > >> > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> > > wrote: > > > >> > > > >> > Great! Thanks, guys, for the continued effort on this topic! > > > >> > > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song < > [hidden email]> > > > >> wrote: > > > >> > > > > >> > > Thanks all for the discussion. I believe we have get consensus > on > > > all > > > >> the > > > >> > > open questions discussed in this thread. > > > >> > > > > > >> > > Since Andrey already create a jira ticket for renaming shuffle > > > memory > > > >> > > config keys with "taskmanager.memory.network.*", I'll create > > ticket > > > >> for > > > >> > the > > > >> > > other topic that puts flink.size in flink-conf.yaml. > > > >> > > > > > >> > > Thank you~ > > > >> > > > > > >> > > Xintong Song > > > >> > > > > > >> > > > > > >> > > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > > [hidden email]> > > > >> > > wrote: > > > >> > > > > > >> > > > It also looks to me that we should only swap network and > memory > > in > > > >> the > > > >> > > > option names: 'taskmanager.memory.network.*'. > > > >> > > > There is no strong consensus towards using new 'shuffle' > naming > > so > > > >> we > > > >> > can > > > >> > > > just rename it to 'taskmanager.memory.network.*' as 'shuffle' > > > >> naming > > > >> > has > > > >> > > > never been released. > > > >> > > > When we have other shuffle services and start advertising more > > > this > > > >> > > concept > > > >> > > > in Flink, we could revisit again the whole naming for this > > > concept. > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > >> > > > > > > >> > > > > > >> > > > > >> > > > > > > > > > > |
Hi all,
Stephan, Till and me had another offline discussion today. Here is the outcome of our brainstorm. We agreed to have process.size in the default settings with the explanation of flink.size alternative in the comment. This way we keep -tm as a shortcut to process.size only and any inconsistencies fail fast as if configured in yaml. I will also follow-up on the thread "[Discuss] Tuning FLIP-49 configuration default values" with a bit more details. If no further objections, we can conclude this last point in this discussion. Best, Andrey On Tue, Jan 14, 2020 at 4:28 PM Stephan Ewen <[hidden email]> wrote: > I think that problem exists anyways and is independent of the "-tm" option. > > You can have a combination of `task.heap.size` and `managed.size` and > `framework.heap.size` that conflicts with `flink.size`. In that case, we > get an exception during the startup (incompatible memory configuration)? > That is the price for having these "shortcut" options (`flink.size` and > `process.size`). But it is a fair price, because the shortcuts are very > valuable to most users. > > That is added with the "-tm" setting is that this is an easy way to get the > shortcut setting added, even if it was not in the config. So where a config > worked in standalone, it can now fail in a container setup when "-tm" is > used. > But that is expected, I guess, when you start manually tune different > memory types and end up defining an inconsistent combination. And it never > conflicts with the default setup, only with manually tuned regions. > > But this example shows why we need to have log output for the logic that > validates and configures the memory settings, so that users understand what > is happening. > > Best, > Stephan > > > On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <[hidden email]> > wrote: > > > Clearing the `flink.size` option and setting `process.size` could indeed > be > > a solution. The thing I'm wondering is what would happen if the user has > > configured `task.heap.size` and `managed.size` instead of `flink.size`? > > Would we also ignore these settings? If not, then we risk to run into the > > situation that TaskExecutorResourceUtils fails because the memory does > not > > add up. I guess the thing which bugs me a bit is the special casing which > > could lead easily into inconsistent behaviour if we don't cover all > cases. > > The consequence of using slightly different concepts (`flink.size`, > > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to > > keep the status quo is an increased maintenance overhead which we should > be > > aware of. > > > > Cheers, > > Till > > > > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <[hidden email]> > > wrote: > > > > > True, even we have "process.size" rather than "flink.size" in the > default > > > config file, user can still have "flink.size" in their custom config > > file. > > > If we consider "-tm" as a shortcut for users to override the TM memory > > > size, then it makes less sense to require users to remove "flink.size" > > from > > > their config file whenever then want to use "-tm". > > > > > > I'm convinced that ignoring "flink.size" might not be a bad idea. > > > And I think we should also update the description of "-tm" (in > > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > > > "flink.size" and "process.size". > > > > > > Thank you~ > > > > > > Xintong Song > > > > > > > > > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <[hidden email]> wrote: > > > > > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> > wrote: > > > > > > > > > I think we need an interpretation of "-tm" regardless of what is in > > the > > > > > default configuration, because we can always have a modified > > > > configuration > > > > > and then a user passes the "-tm" flag. > > > > > > > > > > I kind of like the first proposal: Interpret "-tm" as "override > > memory > > > > > size config and set the Yarn TM container size." It would mean > > exactly > > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" value > > as " > > > > > "taskmanager.memory.process.size". > > > > > That does not sound too bad to me. > > > > > > > > > > Best, > > > > > Stephan > > > > > > > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin < > > [hidden email]> > > > > > wrote: > > > > > > > > > >> Hi all, > > > > >> > > > > >> While working on changing process memory to Flink memory in > default > > > > >> configuration, Xintong encountered a problem. > > > > >> When -tm option is used to rewrite container memory, basically > > process > > > > >> memory, it can collide with the default Flink memory. > > > > >> For legacy users it should not be a problem as we adjusted the > > legacy > > > > heap > > > > >> size option to be interpreted differently for standalone and > > container > > > > >> modes. > > > > >> > > > > >> One solution could be to say in -tm docs that we rewrite both > > options > > > > >> under > > > > >> the hood: process and Flink memory, basically unset Flink memory > > from > > > > yaml > > > > >> config. > > > > >> The downside is that this adds more magic. > > > > >> > > > > >> Alternatively, we can keep process memory in default config and, > as > > > > >> mentioned before, increase it to maintain the user experience by > > > > matching > > > > >> the previous default setting for heap (now Flink in standalone) > > size. > > > > >> The Flink memory can be mentioned in process memory comment as a > > > simpler > > > > >> alternative which does not require accounting for JVM overhead. > > > > >> The downside is again more confusion while trying out Flink and > > tuning > > > > >> memory at the same time. > > > > >> On the other hand, if memory already needs to be tuned it should > > > > >> quite quickly lead to the necessity of understanding the memory > > model > > > in > > > > >> Flink. > > > > >> > > > > >> Best, > > > > >> Andrey > > > > >> > > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> > > > wrote: > > > > >> > > > > >> > Great! Thanks, guys, for the continued effort on this topic! > > > > >> > > > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song < > > [hidden email]> > > > > >> wrote: > > > > >> > > > > > >> > > Thanks all for the discussion. I believe we have get consensus > > on > > > > all > > > > >> the > > > > >> > > open questions discussed in this thread. > > > > >> > > > > > > >> > > Since Andrey already create a jira ticket for renaming shuffle > > > > memory > > > > >> > > config keys with "taskmanager.memory.network.*", I'll create > > > ticket > > > > >> for > > > > >> > the > > > > >> > > other topic that puts flink.size in flink-conf.yaml. > > > > >> > > > > > > >> > > Thank you~ > > > > >> > > > > > > >> > > Xintong Song > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > > > [hidden email]> > > > > >> > > wrote: > > > > >> > > > > > > >> > > > It also looks to me that we should only swap network and > > memory > > > in > > > > >> the > > > > >> > > > option names: 'taskmanager.memory.network.*'. > > > > >> > > > There is no strong consensus towards using new 'shuffle' > > naming > > > so > > > > >> we > > > > >> > can > > > > >> > > > just rename it to 'taskmanager.memory.network.*' as > 'shuffle' > > > > >> naming > > > > >> > has > > > > >> > > > never been released. > > > > >> > > > When we have other shuffle services and start advertising > more > > > > this > > > > >> > > concept > > > > >> > > > in Flink, we could revisit again the whole naming for this > > > > concept. > > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > |
Thanks all for the discussion.
We agreed to have process.size in the default settings with the explanation > of flink.size alternative in the comment. > This way we keep -tm as a shortcut to process.size only and any > inconsistencies fail fast as if configured in yaml. > The conclusions sounds good to me. I'll update the JIRA description and PR for FLINK-15530. +1 to resolve this discussion thread. We can keep the default values discussion in the "tuning FLIP-49 default configuration values" ML thread. Thank you~ Xintong Song On Wed, Jan 15, 2020 at 5:54 AM Andrey Zagrebin <[hidden email]> wrote: > Hi all, > > Stephan, Till and me had another offline discussion today. Here is the > outcome of our brainstorm. > > We agreed to have process.size in the default settings with the explanation > of flink.size alternative in the comment. > This way we keep -tm as a shortcut to process.size only and any > inconsistencies fail fast as if configured in yaml. > I will also follow-up on the thread "[Discuss] Tuning FLIP-49 configuration > default values" with a bit more details. > > If no further objections, we can conclude this last point in this > discussion. > > Best, > Andrey > > On Tue, Jan 14, 2020 at 4:28 PM Stephan Ewen <[hidden email]> wrote: > > > I think that problem exists anyways and is independent of the "-tm" > option. > > > > You can have a combination of `task.heap.size` and `managed.size` and > > `framework.heap.size` that conflicts with `flink.size`. In that case, we > > get an exception during the startup (incompatible memory configuration)? > > That is the price for having these "shortcut" options (`flink.size` and > > `process.size`). But it is a fair price, because the shortcuts are very > > valuable to most users. > > > > That is added with the "-tm" setting is that this is an easy way to get > the > > shortcut setting added, even if it was not in the config. So where a > config > > worked in standalone, it can now fail in a container setup when "-tm" is > > used. > > But that is expected, I guess, when you start manually tune different > > memory types and end up defining an inconsistent combination. And it > never > > conflicts with the default setup, only with manually tuned regions. > > > > But this example shows why we need to have log output for the logic that > > validates and configures the memory settings, so that users understand > what > > is happening. > > > > Best, > > Stephan > > > > > > On Tue, Jan 14, 2020 at 2:54 PM Till Rohrmann <[hidden email]> > > wrote: > > > > > Clearing the `flink.size` option and setting `process.size` could > indeed > > be > > > a solution. The thing I'm wondering is what would happen if the user > has > > > configured `task.heap.size` and `managed.size` instead of `flink.size`? > > > Would we also ignore these settings? If not, then we risk to run into > the > > > situation that TaskExecutorResourceUtils fails because the memory does > > not > > > add up. I guess the thing which bugs me a bit is the special casing > which > > > could lead easily into inconsistent behaviour if we don't cover all > > cases. > > > The consequence of using slightly different concepts (`flink.size`, > > > `process.size`) in standalone vs. container/Yarn/Mesos mode in order to > > > keep the status quo is an increased maintenance overhead which we > should > > be > > > aware of. > > > > > > Cheers, > > > Till > > > > > > On Tue, Jan 14, 2020 at 3:59 AM Xintong Song <[hidden email]> > > > wrote: > > > > > > > True, even we have "process.size" rather than "flink.size" in the > > default > > > > config file, user can still have "flink.size" in their custom config > > > file. > > > > If we consider "-tm" as a shortcut for users to override the TM > memory > > > > size, then it makes less sense to require users to remove > "flink.size" > > > from > > > > their config file whenever then want to use "-tm". > > > > > > > > I'm convinced that ignoring "flink.size" might not be a bad idea. > > > > And I think we should also update the description of "-tm" (in > > > > "FlinkYarnSessionCli") to explicitly mention that it would overwrite > > > > "flink.size" and "process.size". > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > On Tue, Jan 14, 2020 at 2:24 AM Stephan Ewen <[hidden email]> > wrote: > > > > > > > > > Would be good to hear the thoughts of some more Yarn users, though. > > > > > > > > > > On Mon, Jan 13, 2020 at 7:23 PM Stephan Ewen <[hidden email]> > > wrote: > > > > > > > > > > > I think we need an interpretation of "-tm" regardless of what is > in > > > the > > > > > > default configuration, because we can always have a modified > > > > > configuration > > > > > > and then a user passes the "-tm" flag. > > > > > > > > > > > > I kind of like the first proposal: Interpret "-tm" as "override > > > memory > > > > > > size config and set the Yarn TM container size." It would mean > > > exactly > > > > > > ignoring "taskmanager.memory.flink.size" and using the "-tm" > value > > > as " > > > > > > "taskmanager.memory.process.size". > > > > > > That does not sound too bad to me. > > > > > > > > > > > > Best, > > > > > > Stephan > > > > > > > > > > > > > > > > > > On Mon, Jan 13, 2020 at 5:35 PM Andrey Zagrebin < > > > [hidden email]> > > > > > > wrote: > > > > > > > > > > > >> Hi all, > > > > > >> > > > > > >> While working on changing process memory to Flink memory in > > default > > > > > >> configuration, Xintong encountered a problem. > > > > > >> When -tm option is used to rewrite container memory, basically > > > process > > > > > >> memory, it can collide with the default Flink memory. > > > > > >> For legacy users it should not be a problem as we adjusted the > > > legacy > > > > > heap > > > > > >> size option to be interpreted differently for standalone and > > > container > > > > > >> modes. > > > > > >> > > > > > >> One solution could be to say in -tm docs that we rewrite both > > > options > > > > > >> under > > > > > >> the hood: process and Flink memory, basically unset Flink memory > > > from > > > > > yaml > > > > > >> config. > > > > > >> The downside is that this adds more magic. > > > > > >> > > > > > >> Alternatively, we can keep process memory in default config and, > > as > > > > > >> mentioned before, increase it to maintain the user experience by > > > > > matching > > > > > >> the previous default setting for heap (now Flink in standalone) > > > size. > > > > > >> The Flink memory can be mentioned in process memory comment as a > > > > simpler > > > > > >> alternative which does not require accounting for JVM overhead. > > > > > >> The downside is again more confusion while trying out Flink and > > > tuning > > > > > >> memory at the same time. > > > > > >> On the other hand, if memory already needs to be tuned it should > > > > > >> quite quickly lead to the necessity of understanding the memory > > > model > > > > in > > > > > >> Flink. > > > > > >> > > > > > >> Best, > > > > > >> Andrey > > > > > >> > > > > > >> On Thu, Jan 9, 2020 at 12:27 PM Stephan Ewen <[hidden email]> > > > > wrote: > > > > > >> > > > > > >> > Great! Thanks, guys, for the continued effort on this topic! > > > > > >> > > > > > > >> > On Thu, Jan 9, 2020 at 5:27 AM Xintong Song < > > > [hidden email]> > > > > > >> wrote: > > > > > >> > > > > > > >> > > Thanks all for the discussion. I believe we have get > consensus > > > on > > > > > all > > > > > >> the > > > > > >> > > open questions discussed in this thread. > > > > > >> > > > > > > > >> > > Since Andrey already create a jira ticket for renaming > shuffle > > > > > memory > > > > > >> > > config keys with "taskmanager.memory.network.*", I'll create > > > > ticket > > > > > >> for > > > > > >> > the > > > > > >> > > other topic that puts flink.size in flink-conf.yaml. > > > > > >> > > > > > > > >> > > Thank you~ > > > > > >> > > > > > > > >> > > Xintong Song > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > On Wed, Jan 8, 2020 at 9:39 PM Andrey Zagrebin < > > > > > [hidden email]> > > > > > >> > > wrote: > > > > > >> > > > > > > > >> > > > It also looks to me that we should only swap network and > > > memory > > > > in > > > > > >> the > > > > > >> > > > option names: 'taskmanager.memory.network.*'. > > > > > >> > > > There is no strong consensus towards using new 'shuffle' > > > naming > > > > so > > > > > >> we > > > > > >> > can > > > > > >> > > > just rename it to 'taskmanager.memory.network.*' as > > 'shuffle' > > > > > >> naming > > > > > >> > has > > > > > >> > > > never been released. > > > > > >> > > > When we have other shuffle services and start advertising > > more > > > > > this > > > > > >> > > concept > > > > > >> > > > in Flink, we could revisit again the whole naming for this > > > > > concept. > > > > > >> > > > https://jira.apache.org/jira/browse/FLINK-15517 > > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > |
Free forum by Nabble | Edit this page |