Hi,
Lately I was debugging some weird test failures on Travis, and without access to the machines themselves I needed to look into metrics like:
- User, System, IOWait and IRQ CPU usage (based on CPU ticks since the previous check)
- System-wide memory consumption (including making sure that swap was disabled)
- Network usage
- etc…

For this purpose I implemented a periodic daemon thread logger. Its log output looked like this:

https://gist.github.com/pnowojski/8b863abb0fb08ac75b62627feadbd2f7

I think it would be nice to add this feature to Flink itself, by extending the existing MemoryLogger. The same lack of information that I had with Travis could easily happen in production environments. The problem is that there is no easy way to obtain this kind of information without using some external library (think about cross-platform support). For that I have used:

https://github.com/oshi/oshi

It has some minimal additional dependencies; one thing worth noting is JNA, whose JAR weighs ~1 MB. We have two options for adding this feature:

1. Include the oshi dependency in flink-runtime.
2. Wrap oshi into a flink-contrib/flink-resource-logger module and make this new module an optional, dynamically loaded dependency of flink-runtime (used only if the user manually copies flink-resource-logger.jar to the class path).

I would lean toward 1., since it's a powerful tool and its dependencies are pretty minimal (except for the size of the JNA JAR). What do you think?

Piotrek
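The "based on CPU ticks since the previous check" approach can be sketched as below. This is a minimal, self-contained sketch, not the code from the gist: it assumes the cumulative tick counters arrive as a `long[]` indexed by a hypothetical `TickType` enum, loosely modelled on the per-state counters oshi exposes (e.g. via `CentralProcessor.getSystemCpuLoadTicks()`); oshi itself is not called here.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical tick categories; the declaration order defines the array index. */
enum TickType { USER, NICE, SYSTEM, IDLE, IOWAIT, IRQ, SOFTIRQ }

final class CpuUsage {
    private CpuUsage() {}

    /**
     * Computes the share (0..100) of each tick category between two snapshots
     * of cumulative tick counters taken at successive checks.
     */
    static Map<TickType, Double> percentagesSince(long[] previous, long[] current) {
        TickType[] types = TickType.values();
        if (previous.length != types.length || current.length != types.length) {
            throw new IllegalArgumentException("expected " + types.length + " counters");
        }
        long[] delta = new long[types.length];
        long total = 0;
        for (int i = 0; i < types.length; i++) {
            // Counters are cumulative, so the difference is the time spent since the last check.
            delta[i] = Math.max(0, current[i] - previous[i]);
            total += delta[i];
        }
        Map<TickType, Double> result = new EnumMap<>(TickType.class);
        for (int i = 0; i < types.length; i++) {
            // Guard against two identical snapshots (no ticks elapsed in between).
            result.put(types[i], total == 0 ? 0.0 : 100.0 * delta[i] / total);
        }
        return result;
    }
}
```

Calling this once per check with the previous and the new counters reports usage for that interval only, rather than averaged since boot, which is what makes short spikes (e.g. IOWait during a flaky test) visible.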
What if we added these as system metrics and added a way to write metrics to a (separate?) log file?
> On Oct 4, 2017, at 10:13 AM, Piotr Nowojski <[hidden email]> wrote:
> [...]
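Greg's idea of writing metrics to a separate log file fits the periodic daemon-thread approach described in the first message. Below is a minimal sketch using only the JDK; `PeriodicResourceLogger` is a hypothetical name (not Flink's `MemoryLogger` API), and the `Consumer<String>` sink is an assumption: in a real setup it could be a logger whose appender writes to its own file.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Periodically formats a snapshot of named values and hands it to a sink on a daemon thread. */
final class PeriodicResourceLogger implements AutoCloseable {
    private final ScheduledExecutorService executor;

    PeriodicResourceLogger(Supplier<Map<String, ?>> snapshot, Consumer<String> sink, long intervalMillis) {
        // A daemon thread never keeps the JVM alive on its own, so the logger cannot block shutdown.
        this.executor = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "resource-logger");
            thread.setDaemon(true);
            return thread;
        });
        executor.scheduleAtFixedRate(() -> {
            try {
                StringBuilder line = new StringBuilder();
                snapshot.get().forEach((name, value) ->
                        line.append(name).append('=').append(value).append(' '));
                sink.accept(line.toString().trim());
            } catch (RuntimeException e) {
                // An exception escaping the task would silently cancel all future runs.
                sink.accept("resource snapshot failed: " + e);
            }
        }, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Keeping the formatting and the destination separate means the same snapshot supplier could feed both a dedicated log file and, later, registered metrics.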
+1 thanks for pointing this out. It makes sense to just expand those system metrics (I was not aware of them).
> On Oct 4, 2017, at 6:07 PM, Greg Hogan <[hidden email]> wrote:
> [...]
Thanks for the proposal, Piotr. I like it a lot since it will help people to better understand their system. I would also be in favour of adding them to the system metrics. I think o.a.f.runtime.metrics.util.MetricUtils is the right place to start. Given the small dependency footprint and the compatible license, I would be in favour of option 1.

Cheers,
Till

On Thu, Oct 5, 2017 at 11:19 AM, Piotr Nowojski <[hidden email]> wrote:
> [...]
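Registering the values as system metrics, as Till suggests, could look roughly like this. It is a self-contained sketch: the `Gauge` interface below only mirrors the shape of Flink's `org.apache.flink.metrics.Gauge` (a single `getValue()`), the map stands in for a `MetricGroup` (where `MetricGroup#gauge(name, gauge)` would be used instead), and `ResourceSnapshot` and the metric names are hypothetical. The assumed design is that one periodic probe (for example backed by oshi) refreshes the snapshot, while the gauges only read cached values.

```java
import java.util.Map;

/** Stand-in for Flink's Gauge interface: a metric whose value is read on demand. */
interface Gauge<T> {
    T getValue();
}

/** Latest values, written by one periodic probe and read by any number of gauges. */
final class ResourceSnapshot {
    // volatile: the probe thread writes, reporter threads read.
    // Each value is individually fresh; the three are not read atomically as a group.
    private volatile double cpuUserPercent;
    private volatile long memoryUsedBytes;
    private volatile long swapUsedBytes;

    void update(double cpuUserPercent, long memoryUsedBytes, long swapUsedBytes) {
        this.cpuUserPercent = cpuUserPercent;
        this.memoryUsedBytes = memoryUsedBytes;
        this.swapUsedBytes = swapUsedBytes;
    }

    /** Registers one gauge per value; the map stands in for a metric group. */
    void registerGauges(Map<String, Gauge<?>> group) {
        group.put("System.CPU.User", () -> cpuUserPercent);
        group.put("System.Memory.Used", () -> memoryUsedBytes);
        group.put("System.Swap.Used", () -> swapUsedBytes);
    }
}
```

Reading cached values keeps gauge calls cheap, so reporters that poll more often than the probe runs do not trigger extra OS queries.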
System and processor info, marked as 'logged once' in the gist shared by Piotr, should still be logged rather than registered as metrics, right?

On Thu, Oct 5, 2017 at 2:38 AM, Till Rohrmann <[hidden email]> wrote:
> [...]
I have decided to drop the static 'logged once' part. That is static information that users can obtain in more conventional ways. For now I have kept the CPU, memory, swap and network interface stats.

Piotrek

> On 5 Oct 2017, at 18:45, Bowen Li <[hidden email]> wrote:
> [...]
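For the network interface stats, the counters are cumulative as well, so a per-second rate needs the previous sample and the elapsed time. A minimal sketch, assuming the caller supplies a cumulative byte counter (e.g. what oshi's `NetworkIF.getBytesRecv()` reports) together with a `System.nanoTime()` timestamp; `InterfaceRate` is a hypothetical name.

```java
/** Turns a cumulative per-interface byte counter into a bytes-per-second rate. */
final class InterfaceRate {
    private long lastBytes = -1;
    private long lastNanos;

    /**
     * Returns the rate since the previous sample, or 0.0 for the first sample
     * and after a counter reset (e.g. the interface was re-created).
     */
    synchronized double sample(long cumulativeBytes, long nowNanos) {
        double rate = 0.0;
        if (lastBytes >= 0 && cumulativeBytes >= lastBytes && nowNanos > lastNanos) {
            rate = (cumulativeBytes - lastBytes) * 1e9 / (nowNanos - lastNanos);
        }
        lastBytes = cumulativeBytes;
        lastNanos = nowNanos;
        return rate;
    }
}
```

One instance per interface and direction keeps the state simple, and reporting 0.0 after a reset avoids a misleading negative spike.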