Good points, Andrey. Thanks for clarification. I made some minor
adaptations to the FLIP now: - Renamed the `resource` member into `configuration` and made it a top-level member besides `metrics` and `hardware` since it's not fitting the volatile metrics context that well. - I restructured the table under Proposed Changes to cover Metaspace now. Additionally, I renamed `shuffle` into `network` to match the memory model of FLIP-49. - The UI in the screenshot picture has a bug: The counts of Direct and Mapped are accompanied by a memory unit even though they are plain counts. On Thu, Aug 20, 2020 at 4:10 PM Andrey Zagrebin <[hidden email]> wrote: > Hi All, > > Thanks for reviving the discussion, Matthias! > > This would mean that we could adapt the current proposal to replace the > > Nonheap usage pane by a pane displaying the Metaspace usage. > > > I do not know the value of having the Nonheap usage in metrics. I can see > that the metaspace metric can be interesting for the users to debug OOMs. > We had the Nonheap usage before, so as discussed, I would be a bit careful > removing. I believe it deserves a separate poll in user ML > whether the Nonheap usage is useless or not. > As a current solution, we could keep both or merge them into one box with a > slash, like Metaspace/Nonheap -> 5Mb/10Mb, if the majority agrees that this > is not confusing and clear that the metaspace is a part of Nonheap. > That would be a good solution representing both metrics. I adapted the table in FLIP-102's Confluence accordingly for now to have it visualized. Let's see what others are thinking about it. > > Btw, the "Nonheap" in the configuration box of the current FLIP-102 is > probably incorrect or confusing as it does not one-to-one correspond to the > Nonheap JVM metric. > > The only issue I see is that JVM Overhead would still not be represented in > > the memory usage > > overview. > > My understanding is that we do not need a usage metric for JVM Overhead as > it is a virtual unmanaged component which is more about configuring the max > total process memory. > > Is there a reason for us to introduce a nested structure > > TaskManagerMetricsInfo in the response object? I would rather keep it > > consistent in a flat structure instead, i.e. having all the members of > > TaskManagerResourceInfo being members of TaskManagerMetricsInfo > > I would suggest introducing a separate REST call for > TaskManagerResourceInfo. > Semantically, TaskManagerResourceInfo is more about the TM configuration > and it is not directly related to the usage metrics. > In future, I would avoid having calls with many responsibilities and maybe > consider splitting the 'TM details' call into metrics etc unless there is a > concern for having to do more calls instead of one from UI. > Good point. The growing size of the JSON response record might make it worth splitting it up into different endpoints serving different groups of data (e.g. /metrics for volatile values and /configuration for static ones). > > Alternatively, one could think of grouping the metrics collecting the > > different values (i.e. max, used, committed) per metric in a JSON object. > > But this would apply for all the other metrics of TaskManagerMetricsInfo > > as > > well. > > I would personally prefer this for metrics but I am not pushing for this. > > metrics.resource.managedMemory and metrics.resource.networkMemory have > > counterparts in metrics.networkMemory[Used|Total] and > > metrics.managedMemory[Used|Total]: Is this redundant data or do they have > > different semantics? > > As I understand, they have different semantics. The later is about > configuration, the former is about current usage metrics. > I see. Makes sense. > > Is metrics.resource.totalProcessMemory a basic sum over all provided > > values? > > this is again about configuration, I do not think it makes sense to come up > with a usage metric for the totalProcessMemory component. > Got it. > Best, > Andrey > > > On Thu, Aug 20, 2020 at 9:06 AM Matthias <[hidden email]> wrote: > > > Hi Jing, > > I recently joined Ververica and started looking into FLIP-102. I'm trying > > to > > figure out how we would implement the proposal on the backend side. > > I looked into the proposal for the REST API response and a few questions > > popped up: > > - Is there a reason for us to introduce a nested structure > > TaskManagerMetricsInfo in the response object? I would rather keep it > > consistent in a flat structure instead, i.e. having all the members of > > TaskManagerResourceInfo being members of TaskManagerMetricsInfo. > > Alternatively, one could think of grouping the metrics collecting the > > different values (i.e. max, used, committed) per metric in a JSON object. > > But this would apply for all the other metrics of TaskManagerMetricsInfo > as > > well. > > - metrics.resource.managedMemory and metrics.resource.networkMemory have > > counterparts in metrics.networkMemory[Used|Total] and > > metrics.managedMemory[Used|Total]: Is this redundant data or do they have > > different semantics? > > - Is metrics.resource.totalProcessMemory a basic sum over all provided > > values? I see the necessity to have this member if we decide to not > provide > > the memory usage for all memory pools (e.g. providing Metaspace but > leaving > > Code Cache and Compressed Class Space as Non-Heap pools out of the > > response). Otherwise, would it be worth it to remove this member from the > > response for simplicity reasons since we could sum up the memory on the > > frontend side? > > > > Best, > > Matthias > > > > > > > > -- > > Sent from: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ > > > -- Matthias Pohl | Engineer Follow us @VervericaData Ververica <https://www.ververica.com/> -- Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Ververica GmbH Registered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton Wehner |
Hi all
there have been lots of discussions since the vote started and many suggestions have been made Matthias and I had updated the FLIP-102 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager> following the suggestions and discussions I want to cancel the vote here and start a new one, thanks Matthias Pohl <[hidden email]> 于2020年8月21日周五 上午3:33写道: > Good points, Andrey. Thanks for clarification. I made some minor > adaptations to the FLIP now: > - Renamed the `resource` member into `configuration` and made it a > top-level member besides `metrics` and `hardware` since it's not fitting > the volatile metrics context that well. > - I restructured the table under Proposed Changes to cover Metaspace now. > Additionally, I renamed `shuffle` into `network` to match the memory model > of FLIP-49. > - The UI in the screenshot picture has a bug: The counts of Direct and > Mapped are accompanied by a memory unit even though they are plain counts. > > On Thu, Aug 20, 2020 at 4:10 PM Andrey Zagrebin <[hidden email]> > wrote: > > > Hi All, > > > > Thanks for reviving the discussion, Matthias! > > > > This would mean that we could adapt the current proposal to replace the > > > Nonheap usage pane by a pane displaying the Metaspace usage. > > > > > I do not know the value of having the Nonheap usage in metrics. I can see > > that the metaspace metric can be interesting for the users to debug OOMs. > > We had the Nonheap usage before, so as discussed, I would be a bit > careful > > removing. I believe it deserves a separate poll in user ML > > whether the Nonheap usage is useless or not. > > As a current solution, we could keep both or merge them into one box > with a > > slash, like Metaspace/Nonheap -> 5Mb/10Mb, if the majority agrees that > this > > is not confusing and clear that the metaspace is a part of Nonheap. > > > > That would be a good solution representing both metrics. I adapted the > table in FLIP-102's Confluence accordingly for now to have it visualized. > Let's see what others are thinking about it. > > > > > > Btw, the "Nonheap" in the configuration box of the current FLIP-102 is > > probably incorrect or confusing as it does not one-to-one correspond to > the > > Nonheap JVM metric. > > > > The only issue I see is that JVM Overhead would still not be represented > in > > > the memory usage > > > overview. > > > > My understanding is that we do not need a usage metric for JVM Overhead > as > > it is a virtual unmanaged component which is more about configuring the > max > > total process memory. > > > > Is there a reason for us to introduce a nested structure > > > TaskManagerMetricsInfo in the response object? I would rather keep it > > > consistent in a flat structure instead, i.e. having all the members of > > > TaskManagerResourceInfo being members of TaskManagerMetricsInfo > > > > I would suggest introducing a separate REST call for > > TaskManagerResourceInfo. > > Semantically, TaskManagerResourceInfo is more about the TM configuration > > and it is not directly related to the usage metrics. > > In future, I would avoid having calls with many responsibilities and > maybe > > consider splitting the 'TM details' call into metrics etc unless there > is a > > concern for having to do more calls instead of one from UI. > > > > Good point. The growing size of the JSON response record might make it > worth splitting it up into different endpoints serving different groups of > data (e.g. /metrics for volatile values and /configuration for static > ones). > > > > > > Alternatively, one could think of grouping the metrics collecting the > > > different values (i.e. max, used, committed) per metric in a JSON > object. > > > But this would apply for all the other metrics of > TaskManagerMetricsInfo > > > as > > > well. > > > > I would personally prefer this for metrics but I am not pushing for this. > > > > metrics.resource.managedMemory and metrics.resource.networkMemory have > > > counterparts in metrics.networkMemory[Used|Total] and > > > metrics.managedMemory[Used|Total]: Is this redundant data or do they > have > > > different semantics? > > > > As I understand, they have different semantics. The later is about > > configuration, the former is about current usage metrics. > > > > I see. Makes sense. > > > > > Is metrics.resource.totalProcessMemory a basic sum over all provided > > > values? > > > > this is again about configuration, I do not think it makes sense to come > up > > with a usage metric for the totalProcessMemory component. > > > > Got it. > > > > Best, > > Andrey > > > > > > On Thu, Aug 20, 2020 at 9:06 AM Matthias <[hidden email]> wrote: > > > > > Hi Jing, > > > I recently joined Ververica and started looking into FLIP-102. I'm > trying > > > to > > > figure out how we would implement the proposal on the backend side. > > > I looked into the proposal for the REST API response and a few > questions > > > popped up: > > > - Is there a reason for us to introduce a nested structure > > > TaskManagerMetricsInfo in the response object? I would rather keep it > > > consistent in a flat structure instead, i.e. having all the members of > > > TaskManagerResourceInfo being members of TaskManagerMetricsInfo. > > > Alternatively, one could think of grouping the metrics collecting the > > > different values (i.e. max, used, committed) per metric in a JSON > object. > > > But this would apply for all the other metrics of > TaskManagerMetricsInfo > > as > > > well. > > > - metrics.resource.managedMemory and metrics.resource.networkMemory > have > > > counterparts in metrics.networkMemory[Used|Total] and > > > metrics.managedMemory[Used|Total]: Is this redundant data or do they > have > > > different semantics? > > > - Is metrics.resource.totalProcessMemory a basic sum over all provided > > > values? I see the necessity to have this member if we decide to not > > provide > > > the memory usage for all memory pools (e.g. providing Metaspace but > > leaving > > > Code Cache and Compressed Class Space as Non-Heap pools out of the > > > response). Otherwise, would it be worth it to remove this member from > the > > > response for simplicity reasons since we could sum up the memory on the > > > frontend side? > > > > > > Best, > > > Matthias > > > > > > > > > > > > -- > > > Sent from: > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ > > > > > > > > -- > > Matthias Pohl | Engineer > > Follow us @VervericaData Ververica <https://www.ververica.com/> > > -- > > Join Flink Forward <https://flink-forward.org/> - The Apache Flink > Conference > > Stream Processing | Event Driven | Real Time > > -- > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany > > -- > Ververica GmbH > Registered at Amtsgericht Charlottenburg: HRB 158244 B > Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton > Wehner > |
Free forum by Nabble | Edit this page |