Re: [jira] [Commented] (FLINK-964) Integrate profiling code with web interface

Ufuk Celebi-2
This GSoC proposal [1] might also be of interest.

[1]
https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Rajika-Kumarasiri


On Tue, Aug 26, 2014 at 10:12 AM, Sebastian Kruse (JIRA) <[hidden email]>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/FLINK-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110468#comment-14110468
> ]
>
> Sebastian Kruse commented on FLINK-964:
> ---------------------------------------
>
> Hey guys,
>
> I am happy to hear that you like it! :)
>
> But please also consider that this prototype was meant as a first spike
> and baseline for further discussion. There is a lot more profiling data
> available, e.g., stats per task manager and execution vertex. I propose to
> have a bit of a discussion about which of those data to include and how.
>
> Cheers,
> Sebastian
>
> > Integrate profiling code with web interface
> > -------------------------------------------
> >
> >                 Key: FLINK-964
> >                 URL: https://issues.apache.org/jira/browse/FLINK-964
> >             Project: Flink
> >          Issue Type: Improvement
> >          Components: Local Runtime, Webfrontend
> >    Affects Versions: 0.6-incubating
> >            Reporter: Stephan Ewen
> >            Assignee: Jonathan Hasenburg
> >
> > This issue is subject to discussion.
> > The profiling code currently needs to be kept in sync with the job graph
> > code, execution graph code, and runtime code.
> > Since that part of the code is undergoing quite some changes and the
> > profiling code is not used right now, I suggest to remove it, or move it to
> > an artifact repository.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>

Stephan Ewen
Very cool first prototype, I like it!

I am posting a quick summary of the status and the other ideas that have
been floating around in the context of the job profiling:

 - There is quite a bit of profiling data gathered, but I think some of it
is also a bit out of date (for example, the gate profiling no longer works
or makes sense because the internal models changed)

 - We are currently thinking of gathering data stats (byte and record counts)
from the operators as well. This could go well together with the profiling.
It would be good if the profiling code were generic in the sense that it
allows transferring arbitrary time series of metrics. It makes sense to
define scopes for these metrics, such as "global (cluster profiling)",
"single machine (machine profiling)", and "operator", so that these
metrics would be displayed in the web frontend in the respective section.

 - The memory profiling is a bit senseless right now, because the JVMs are
always roughly the same memory size once ramped up. Instead, I would
add the "managed memory" of Flink.

 - I think a lot of the machine profiling code (CPU utilization, network
throughput) currently works only on Linux.


As a side note: I think it makes sense to integrate the currently separate
profiling code communication (RPC) with the regular coordination RPCs. That
is a transparent change (probably 50 lines) once we have Till's changes
merged, which base the distributed coordination on Akka.
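
To make the idea of generic, scoped metric time series from the list above a
bit more concrete, here is a minimal sketch of what such a container could
look like. It is only an illustration under assumptions: the names
ScopedMetrics, Scope, Sample, report, get and names are hypothetical and do
not come from the Flink code base, and the metric names used in main are made
up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not Flink code: arbitrary metric time series grouped by scope.
class ScopedMetrics {

    // The three scopes mentioned in the mail: cluster-wide, per machine, per operator.
    enum Scope { GLOBAL, MACHINE, OPERATOR }

    // One data point of a time series.
    static final class Sample {
        final long timestampMillis;
        final double value;

        Sample(long timestampMillis, double value) {
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    // scope -> metric name -> samples in the order they were reported
    private final Map<Scope, Map<String, List<Sample>>> series = new EnumMap<>(Scope.class);

    // Appends one sample to the named series in the given scope.
    void report(Scope scope, String name, long timestampMillis, double value) {
        series.computeIfAbsent(scope, s -> new TreeMap<>())
              .computeIfAbsent(name, n -> new ArrayList<>())
              .add(new Sample(timestampMillis, value));
    }

    // Returns the samples of one series, or an empty list if nothing was reported.
    List<Sample> get(Scope scope, String name) {
        Map<String, List<Sample>> byName = series.get(scope);
        if (byName == null) {
            return Collections.emptyList();
        }
        List<Sample> samples = byName.get(name);
        if (samples == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(samples);
    }

    // Returns the metric names known for a scope, e.g. to fill one frontend section.
    Set<String> names(Scope scope) {
        Map<String, List<Sample>> byName = series.get(scope);
        if (byName == null) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(byName.keySet());
    }

    public static void main(String[] args) {
        ScopedMetrics metrics = new ScopedMetrics();
        // Metric names below are invented for the example.
        metrics.report(Scope.OPERATOR, "records-out", 1000L, 10.0);
        metrics.report(Scope.OPERATOR, "records-out", 2000L, 25.0);
        metrics.report(Scope.MACHINE, "cpu-utilization", 1000L, 0.5);
        for (Scope scope : Scope.values()) {
            System.out.println(scope + " -> " + metrics.names(scope));
        }
    }
}
```

Keying the series by scope first means a web frontend section (cluster,
machine, operator) could ask for exactly the metrics that belong to it,
without the container knowing anything about what a particular metric means.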



Robert Metzger
Hey Ufuk and Stephan,

you've replied on dev@ to a conversation happening on JIRA. I would suggest
re-posting your messages in JIRA (there is no automated mirroring).


-- Robert

