Hi all
I want to start the vote for FLIP-104, which proposes to add more metrics to the JobManager.

To help everyone better understand the proposal, we spent some effort making an online POC:

previous web: http://101.132.122.69:8081/#/job-manager/config
POC web: http://101.132.122.69:8081/web/#/job-manager/metrics

The vote will last for at least 72 hours, following the consensus voting process.

FLIP wiki: https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager

Discussion thread: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html

Thanks,
Yadong
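For readers who want to look at the same JobManager numbers outside the web UI, here is a minimal sketch that builds a query against Flink's JobManager metrics REST endpoint (`GET /jobmanager/metrics?get=...`) and parses its `[{"id": ..., "value": ...}]` response. The helper names are hypothetical, this is not how the POC page is implemented, and the base URL is assumed to be a local cluster on the default REST port 8081.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen


def build_jm_metrics_url(base_url: str, metric_names: Iterable[str]) -> str:
    """Build a URL for the JobManager metrics endpoint (hypothetical helper)."""
    # The endpoint takes a comma-separated list of metric ids in `get`;
    # urlencode percent-encodes the commas, which the server decodes again.
    query = urlencode({"get": ",".join(metric_names)})
    return f"{base_url.rstrip('/')}/jobmanager/metrics?{query}"


def parse_metrics(body: str) -> Dict[str, float]:
    """Convert a response like [{"id": "...", "value": "..."}] into {id: value}."""
    return {entry["id"]: float(entry["value"]) for entry in json.loads(body)}


def fetch_jm_metrics(base_url: str, metric_names: Iterable[str]) -> Dict[str, float]:
    """Query a running cluster; this performs a real HTTP request."""
    with urlopen(build_jm_metrics_url(base_url, metric_names)) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Against a running cluster, a call such as `fetch_jm_metrics("http://localhost:8081", ["Status.JVM.Memory.Heap.Used", "Status.JVM.Memory.Heap.Max"])` would return the current heap usage and limit; the two metric ids are assumed to be Flink's standard JVM memory metrics.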
Thanks Yadong,
I think we can use different colors to distinguish the memory usage (from green to red?).

Besides, I think we should add a unit to "Garbage Collection" -> "Time"; it's hard to know what the value means. It would be better to display the value like "10ms" or "5ns".

Best,
Jark

On Thu, 20 Feb 2020 at 17:58, Yadong Xie wrote:
> I want to start the vote for FLIP-104, which proposes to add more metrics
> to the JobManager.
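The unit suggestion can be sketched as a small formatter. This assumes the raw "Time" value is the accumulated collection time in milliseconds, as reported by the JVM's `GarbageCollectorMXBean.getCollectionTime()`; because that source only has millisecond granularity, the sketch never produces nanosecond labels such as "5ns". The function name and the unit ladder are illustrative choices, not the POC's code.

```python
# Units from largest to smallest, with their size in milliseconds.
_UNITS = (("h", 3_600_000), ("min", 60_000), ("s", 1_000), ("ms", 1))


def format_gc_time(ms: float) -> str:
    """Render an accumulated GC time (assumed to be in ms) with a readable unit."""
    if ms < 0:
        raise ValueError("GC time must be non-negative")
    for unit, factor in _UNITS:
        if ms >= factor:
            # One decimal place keeps the label short; :g drops a trailing ".0".
            return f"{round(ms / factor, 1):g}{unit}"
    # Values below 1 ms (including 0) stay in milliseconds.
    return f"{round(ms, 1):g}ms"
```

Rounding to one decimal keeps the label compact; the exact value could still be shown in a tooltip if precision matters.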
FYI, there's an effort planned for 1.11 to improve the memory configuration of the Flink master process, similar to FLIP-49 but definitely less complex.

I would not consider the memory configuration improvement a blocker for this effort. As far as I can see, there's nothing in conflict. It's just that after the memory configuration improvement, we might be able to present more information on the JM metrics page that corresponds closely to the configuration options, like what we planned for the TM metrics page in FLIP-102. Therefore, it might make sense to proceed with this FLIP afterwards.

I'm neutral on this, and would leave the call to Yadong and Lining.

Thank you~
Xintong Song

On Fri, Feb 21, 2020 at 2:47 PM Jark Wu wrote:
> I think we can use different colors to distinguish the memory usage (from
> green to red?).
Hi Jark
Thanks for your suggestions.

> I think we can use different colors to distinguish the memory usage (from green to red?).

It is a good idea, but what is the boundary between red and green? Giving a magic-number boundary may mislead users. Any suggestions?

> Besides, I think we should add a unit to "Garbage Collection" -> "Time"; it's hard to know what the value means. It would be better to display the value like "10ms" or "5ns".

I will add the unit later. Thanks for your advice.

On Fri, 21 Feb 2020 at 18:02, Xintong Song wrote:
> FYI, there's an effort planned for 1.11 to improve the memory configuration
> of the Flink master process, similar to FLIP-49 but definitely less complex.
Hi Yadong,
> what is the boundary between red and green?

Yes, I think that's the point we need to discuss. My gut feeling is "<60%" => green, "60%~80%" => yellow, ">80%" => red. But I guess direct memory is always at 100%, so this may not be suitable for it? Maybe @Xintong Song has a better understanding of the memory thresholds.

Best,
Jark

On Mon, 24 Feb 2020 at 15:41, Yadong Xie wrote:
> It is a good idea, but what is the boundary between red and green? Giving a
> magic-number boundary may mislead users. Any suggestions?
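Jark's gut-feeling thresholds translate into a simple mapping from usage ratio to color. The 60% and 80% cut-offs are his tentative numbers, so the sketch exposes them as parameters rather than fixed constants; `Level` and `usage_level` are hypothetical names. As the thread goes on to discuss, a single mapping like this may not fit every memory type.

```python
from enum import Enum


class Level(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def usage_level(used: float, capacity: float,
                warn: float = 0.6, alert: float = 0.8) -> Level:
    """Map used/capacity to a color: below warn is green, up to alert is yellow, above is red."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    ratio = used / capacity
    if ratio < warn:
        return Level.GREEN
    if ratio <= alert:
        # "60%~80%" is treated as inclusive on both ends.
        return Level.YELLOW
    return Level.RED
```

For example, 50 MB used out of 100 MB maps to green, 70 MB to yellow, and 90 MB to red.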
Hi all
We have updated the POC web and added units to the GC metrics. Check it here: http://101.132.122.69:8081/web/#/job-manager/metrics

Thanks for all the responses.

On Mon, 24 Feb 2020 at 20:48, Jark Wu wrote:
> My gut feeling is "<60%" => green, "60%~80%" => yellow, ">80%" => red.
@Jark
First, let me clarify that, while this FLIP is about adding JM metrics, the discussion of using different colors to distinguish memory usage applies to both the JM and the TM.

IMO, there's no good general way to define how memory utilization should be mapped to colors.

- Direct memory
  - JM: ATM, we do not specify -XX:MaxDirectMemorySize.
  - TM: Direct memory consists of network memory and framework/task off-heap memory. The former should always be 100% used, while the latter may not be. Therefore, the utilization of direct memory really depends on the configured sizes of network memory and framework/task off-heap memory.
- Heap memory: We might observe that memory usage keeps growing until GC is triggered, so the utilization may eventually fluctuate somewhere close to 100%.

In general, a low memory utilization probably suggests that the memory size is configured too large, but a high memory utilization does not necessarily mean that the configured memory size needs to be increased. Thus, I'm not sure about rendering it in red.

Thank you~
Xintong Song

On Tue, Feb 25, 2020 at 3:13 PM Yadong Xie wrote:
> We have updated the POC web and added units to the GC metrics. Check it
> here: http://101.132.122.69:8081/web/#/job-manager/metrics
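A small worked example of Xintong's point about TM direct memory: network buffers are fully used by design, so the overall ratio is driven largely by how much network memory is configured relative to framework/task off-heap memory. The sketch treats both components as plain sizes in the same unit (hypothetical parameters, not Flink configuration keys) and returns `None` when there is no explicit limit to compare against, mirroring the JM case where `-XX:MaxDirectMemorySize` is not specified.

```python
from typing import Optional


def tm_direct_utilization(network: int, off_heap: int, off_heap_used: int) -> float:
    """Direct memory utilization of a TM, counting network memory as fully used."""
    total = network + off_heap
    if total <= 0:
        raise ValueError("configured direct memory must be positive")
    if not 0 <= off_heap_used <= off_heap:
        raise ValueError("off_heap_used must lie within [0, off_heap]")
    # Network buffers are allocated up front, so they always count as used.
    return (network + off_heap_used) / total


def ratio_or_none(used: int, limit: Optional[int]) -> Optional[float]:
    """Return used/limit, or None when there is no explicit limit to compare with."""
    if limit is None or limit <= 0:
        return None
    return used / limit
```

With 128 MiB of network memory and 128 MiB of idle off-heap memory, `tm_direct_utilization(128, 128, 0)` is 0.5; raising network memory to 896 MiB with the same idle off-heap memory gives 0.875, even though the workload did not change. A fixed color threshold would flag only the second configuration.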
Thanks Xintong for the explanation.
The FLIP looks good to me now. +1 from my side.

Best,
Jark

On Tue, 25 Feb 2020 at 15:46, Xintong Song wrote:
> IMO, there's no good general way to define how memory utilization should
> be mapped to colors.
Hi Yadong,
Thanks for creating this FLIP. I like the idea of exposing more cluster information to the user.

I share Xintong's concerns that we are about to rework the cluster entrypoint's memory management. It might make sense to wait for these changes before starting this effort. Otherwise, we risk doing some double work.

Concerning FLINK-9741, I'm not sure whether we need to fix this issue before starting this effort. The JobManagers are now running as part of the cluster entrypoint process, for which we should actually report the metrics (memory usage).

Cheers,
Till

On Tue, Feb 25, 2020 at 10:52 AM Jark Wu wrote:
> Thanks Xintong for the explanation.
>
> The FLIP looks good to me now. +1 from my side.
Hi Till,
Thanks for your reply.

> Concerning FLINK-9741, I'm not sure whether we need to fix this issue
> before starting this effort. The JobManagers are now running as part of
> the cluster entrypoint process, for which we should actually report the
> metrics (memory usage).

I have confirmed this with Zhu Zhu offline: since the dispatcher still runs together with the JobManager, it should not affect the accuracy of the metrics.

On Wed, 26 Feb 2020 at 00:04, Till Rohrmann wrote:
> I share Xintong's concerns that we are about to rework the cluster
> entrypoint's memory management.
Hi all
There have been lots of discussions since the vote started, and FLINK-9741 has been fixed. Matthias and I have updated FLIP-104 <https://cwiki.apache.org/confluence/display/FLINK/FLIP-104%3A+Add+More+Metrics+to+Jobmanager> following the suggestions and discussions.

I want to cancel this vote and start a new one.

Thanks