[PROPOSAL] Structure the Flink Open Source Development

[PROPOSAL] Structure the Flink Open Source Development

Stephan Ewen
Hi everyone!

We propose to establish some lightweight structures in the Flink open
source community and development process,
to help us better handle the increased interest in Flink (mailing list and
pull requests), while not overwhelming the
committers, and giving users and contributors a good experience.

This proposal is triggered by the observation that we are reaching the
limits of what the current community can do to support users and guide
new contributors. The proposal below is based on observations and ideas
from Till, Robert, and me.

========
Goals
========

We are trying to achieve the following:

  - Pull requests get handled in a timely fashion
  - New contributors are better integrated into the community
  - The community feels empowered on the mailing list, while questions that
    need someone with deep knowledge of a certain part of Flink still get
    that person's attention.
  - At the same time, the committers that are knowledgeable about many core
    parts do not get completely overwhelmed.
  - We don't overlook threads that report critical issues.
  - We always have a good overview of the status of each part of the
    system:
      -> Which known issues are encountered often
      -> Which features are requested most frequently


========
Problems
========

Looking into the process, there are two big issues:

(1) Up to now, we have been relying on everything just "organizing itself",
driven by best effort. That assumes that everyone feels equally responsible
for every part, question, and contribution. In the current state, this is
impossible to maintain; it overwhelms the committers and contributors.

Example: Pull requests are picked up by whoever wants to pick them up. Pull
requests that are a lot of work, have little chance of getting in, or relate
to less active components are sometimes not picked up. When contributors are
already heavily loaded, it may happen that no one ends up feeling responsible
for a pull request, and it falls through the cracks.

(2) There is no good overview of the known shortcomings, ongoing efforts, and
requested features for the different parts of the system. This information
exists in various people's heads, but is not easily accessible to new people.
The Flink JIRA is not well maintained, so it is not easy to draw insights
from it.


===========
The Proposal
===========

Since we are building a parallel system, the natural solution seems to be:
partition the workload ;-)

We propose to define a set of components for Flink. Each component is
maintained or tracked by one or more people - let's call them maintainers.
It is important to note that we do not intend "maintainer" to be an
authoritative role. Maintainers are simply committers or contributors who
visibly step up for a certain component, and mainly track and drive the
efforts pertaining to that component.

It is also important to realize that we do not want to suggest that people
get less involved with certain parts and components because they are not
the maintainers. We simply want to make sure that each pull request,
question, or contribution ultimately has one person (or a small set of
people) responsible for catching and tracking it if the proactive community
has not picked it up.

For some components, having multiple maintainers will be helpful. In that
case, one maintainer should be the "chair" or "lead"
and make sure that no issue of that component gets lost between the
multiple maintainers.


A maintainer's role is:
-----------------------------

  - Have an overview of which of the open pull requests relate to their
component
  - Drive the pull requests relating to the component to resolution
      => Moderate the decision whether the feature should be merged
      => Make sure the pull request gets a shepherd.
           In many cases, the maintainers would shepherd themselves.
      => In case the shepherd becomes inactive, the maintainers need to
find a new shepherd.

  - Have an overview of the known issues of their component
  - Have an overview of the frequently requested features of their
    component

  - Have an overview of which contributors are doing very good work in
    their component, would be candidates for committers, and should be
    mentored towards that.

  - Resolve email threads that have been brought to their attention,
because deeper
    component knowledge is required for that thread.

A maintainer's role is NOT:
----------------------------------

  - Review all pull requests of that component
  - Answer every mail with questions about that component
  - Fix all bugs and implement all features of that component


We imagine the community and the maintainers interacting as follows:
---------------------------------------------------------------------

  - Pull requests should be tagged by component. Since we cannot add labels
at this point, we need
    to rely on the following:
     => The pull request opener should name the pull request like
"[FLINK-XXX] [component] Title"
     => Components can be (re)tagged by adding special comments in the
        pull request ("==> component client")
     => With some luck, GitHub and Apache Infra will allow us to use labels
at some point

  - When pull requests are associated with a component, the maintainers
will manage them
    (decision whether to add, find shepherd, catch dropped pull requests)

  - We assume that maintainers frequently reach out to other community
members and ask them if they want
    to shepherd a pull request.

  - On the mailing list, everyone should feel equally empowered to answer
and discuss.
    If at some point in the discussion, some deep technical knowledge about
a component is required,
    the maintainer(s) should be drawn into the discussion.
    Because the mailing list infrastructure has no support for tagging
    threads, here are some simple workarounds:

    => One possibility is to put the maintainers' mail addresses on cc for
       the thread, so they get the mail not just via the mailing list
    => Another way would be to post something like "+maintainer runtime" in
       the thread; the "runtime" maintainers would have a filter/alert on
       these keywords in their mail program.

  - We assume that maintainers will reach out to community members that are
very active and helpful in
    a component, and will ask them if they want to be added as maintainers.
    That will make it visible that those people are experts for that part
of Flink.
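
As a rough illustration of how the naming and tagging conventions above
could be picked up by a small script or mail filter, here is a minimal
Python sketch. The regular expressions, function names, and the lowercase
normalization are assumptions made for this example only; none of this is
existing Flink tooling, and the exact tag syntax would still need to be
agreed upon.

```python
import re

# Assumed pattern for titles like "[FLINK-1234] [runtime] Fix something".
TITLE_RE = re.compile(r"^\[FLINK-(\d+)\]\s*\[([^\]]+)\]\s*(.+)$")
# Assumed pattern for retag comments like "==> component client".
RETAG_RE = re.compile(r"^==>\s*component\s+(.+?)\s*$", re.MULTILINE)
# Assumed pattern for mailing list mentions like "+maintainer runtime".
MENTION_RE = re.compile(r"\+maintainer\s+([\w/-]+)")


def component_from_title(title):
    """Return the component named in a pull request title, or None."""
    match = TITLE_RE.match(title.strip())
    return match.group(2).strip().lower() if match else None


def component_for_pull_request(title, comments):
    """Start from the title tag; a later retag comment overrides it."""
    component = component_from_title(title)
    for comment in comments:
        for match in RETAG_RE.finditer(comment):
            component = match.group(1).strip().lower()
    return component


def mentioned_components(mail_body):
    """Return the sorted set of components whose maintainers are pinged."""
    return sorted({name.lower() for name in MENTION_RE.findall(mail_body)})
```

A maintainer could run something like `mentioned_components` over incoming
mail and alert on their own component name, which is the filter/alert idea
described above.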


======================================
Maintainers: Committers and Contributors
======================================

It helps if maintainers are committers, since we want them to resolve pull
requests, which often involves merging them.

Components with multiple maintainers can easily have non-committer
contributors in addition to committer
contributors.


======
JIRA
======

Ideally, JIRA can be used to get an overview of the known issues of each
component and of the common feature requests. Unfortunately, the Flink JIRA
is quite unorganized right now.

A natural follow-up effort of this proposal would be to define in JIRA the
same components as we defined here, and have the maintainers keep JIRA
meaningful for their particular component. That would allow us to easily
generate some tables out of JIRA (like top known issues per component, most
requested features) and post them on the dev list once in a while as a
"state of the union" report.
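
To make the idea of such per-component tables concrete, here is a minimal
Python sketch. It assumes the issues have already been exported into plain
dictionaries; the field names ("key", "components", "votes"), the function
name, and ranking by votes are all hypothetical choices for illustration,
not a description of an actual JIRA export or API.

```python
from collections import defaultdict


def top_issues_per_component(issues, limit=3):
    """Group issues by component and keep the most-voted keys of each.

    `issues` is an iterable of dicts with the assumed fields
    "key" (str), "components" (list of str), and "votes" (int).
    """
    by_component = defaultdict(list)
    for issue in issues:
        # An issue tagged with several components is listed under each.
        for component in issue["components"]:
            by_component[component].append(issue)

    report = {}
    for component, items in by_component.items():
        # Most votes first; ties are broken by issue key for stable output.
        ranked = sorted(items, key=lambda item: (-item["votes"], item["key"]))
        report[component] = [item["key"] for item in ranked[:limit]]
    return report
```

The resulting dictionary maps each component to its top issue keys and
could be rendered as a plain-text table for the dev list.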

Initial assignment of issues to components should be made by the people
opening the issue. The maintainer of the tagged component needs to change
the tag if the component was classified incorrectly.


======================================
Initial Components and Maintainers Suggestion
======================================

Below is a suggestion of how to define components for Flink. One goal of
the division was that, for the majority of questions and contributions, it
should be obvious which component they relate to. Otherwise, if many
contributions had fuzzy component associations, we would again not solve
the issue of having clear responsibilities for who tracks their progress
and resolution.

We also looked at each component and wrote the names of some people who we
thought were natural
experts for the components, and thus natural candidates for maintainers.

**These names are only a starting point for discussion.**

Once agreed upon, the components and names of maintainers should be kept in
the wiki and updated as
components change and people step up or down.


*DataSet API* (*Fabian, Greg, Gabor*)
  - Including Hadoop compat. parts

*DataStream API* (*Aljoscha, Max, Stephan*)

*Runtime*
  - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
  - Local Runtime (Memory Management, State Backends, Tasks/Operators) (*Stephan*)
  - Network (*Ufuk*)

*Client/Optimizer* (*Fabian*)

*Type system / Type extractor* (*Timo*)

*Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)

*Libraries*
  - Gelly (*Vasia, Greg*)
  - ML (*Till, Theo*)
  - CEP (*Till*)
  - Python (*Chesnay*)

*Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)

*Streaming Connectors* (*Robert, Aljoscha*)

*Batch Connectors and Input/Output Formats* (*Chesnay*)

*Storm Compatibility Layer* (*Mathias*)

*Scala shell* (*Till*)

*Startup Shell Scripts* (*Ufuk*)

*Flink Build System, Maven Files* (*Robert*)

*Documentation* (*Ufuk*)


Please let us know what you think about this proposal.
Happy discussing!

Greetings,
Stephan

Re: [PROPOSAL] Structure the Flink Open Source Development

Gábor Gévay
Hello,

There are at least three Gábors in the Flink community,  :) so
assuming that the Gábor in the list of maintainers of the DataSet API
is referring to me, I'll be happy to do it. :)

Best,
Gábor G.



2016-05-10 11:24 GMT+02:00 Stephan Ewen <[hidden email]>:


Re: [PROPOSAL] Structure the Flink Open Source Development

Márton Balassi
+1 for the proposal
@ggevay: I do think that it refers to you. :)

On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:


Re: [PROPOSAL] Structure the Flink Open Source Development

Stephan Ewen
Yes, Gabor Gevay, that did refer to you!

Sorry for the ambiguity...

On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]>
wrote:

> +1 for the proposal
> @ggevay: I do think that it refers to you. :)
>
> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
>
> > Hello,
> >
> > There are at least three Gábors in the Flink community,  :) so
> > assuming that the Gábor in the list of maintainers of the DataSet API
> > is referring to me, I'll be happy to do it. :)
> >
> > Best,
> > Gábor G.

Re: [PROPOSAL] Structure the Flink Open Source Development

Till Rohrmann
+1 for the proposal

On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:

> Yes, Gabor Gevay, that did refer to you!
>
> Sorry for the ambiguity...
>
> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]> wrote:
>
> > +1 for the proposal
> > @ggevay: I do think that it refers to you. :)
> >
> > On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
> >
> > > Hello,
> > >
> > > There are at least three Gábors in the Flink community,  :) so
> > > assuming that the Gábor in the list of maintainers of the DataSet API
> > > is referring to me, I'll be happy to do it. :)
> > >
> > > Best,
> > > Gábor G.

Re: [PROPOSAL] Structure the Flink Open Source Development

Matthias J. Sax-2
+1 from my side.

Happy to be the maintainer for Storm-Compatibility (at least I guess
it's me, even though the correct spelling would be with two 't's :P)

-Matthias

On 05/12/2016 12:56 PM, Till Rohrmann wrote:

> +1 for the proposal
> On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:
>
>> Yes, Gabor Gevay, that did refer to you!
>>
>> Sorry for the ambiguity...
>>
>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]> wrote:
>>
>>> +1 for the proposal
>>> @ggevay: I do think that it refers to you. :)
>>>
>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
>>>
>>>> Hello,
>>>>
>>>> There are at least three Gábors in the Flink community,  :) so
>>>> assuming that the Gábor in the list of maintainers of the DataSet API
>>>> is referring to me, I'll be happy to do it. :)
>>>>
>>>> Best,
>>>> Gábor G.
>>>>
>>>>
>>>>
>>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <[hidden email]>:
>>>>> Hi everyone!
>>>>>
>>>>> We propose to establish some lightweight structures in the Flink open
>>>>> source community and development process,
>>>>> to help us better handle the increased interest in Flink (mailing list and
>>>>> pull requests), while not overwhelming the
>>>>> committers, and giving users and contributors a good experience.
>>>>>
>>>>> This proposal is triggered by the observation that we are reaching the
>>>>> limits of where the current community can support
>>>>> users and guide new contributors. The below proposal is based on
>>>>> observations and ideas from Till, Robert, and me.
>>>>>
>>>>> ========
>>>>> Goals
>>>>> ========
>>>>>
>>>>> We try to achieve the following
>>>>>
>>>>>   - Pull requests get handled in a timely fashion
>>>>>   - New contributors are better integrated into the community
>>>>>   - The community feels empowered on the mailing list.
>>>>>     But questions that need the attention of someone that has deep
>>>>> knowledge of a certain part of Flink get their attention.
>>>>>   - At the same time, the committers that are knowledgeable about many core
>>>>> parts do not get completely overwhelmed.
>>>>>   - We don't overlook threads that report critical issues.
>>>>>   - We always have a pretty good overview of what the status of certain
>>>>> parts of the system are.
>>>>>       -> What are often encountered known issues
>>>>>       -> What are the most frequently requested features
>>>>>
>>>>>
>>>>> ========
>>>>> Problems
>>>>> ========
>>>>>
>>>>> Looking into the process, there are two big issues:
>>>>>
>>>>> (1) Up to now, we have been relying on the fact that everything just
>>>>> "organizes itself", driven by best effort. That assumes
>>>>> that everyone feels equally responsible for every part, question, and
>>>>> contribution. At the current state, this is impossible
>>>>> to maintain, it overwhelms the committers and contributors.
>>>>>
>>>>> Example: Pull requests are picked up by whoever wants to pick them up. Pull
>>>>> requests that are a lot of work, have little
>>>>> chance of getting in, or relate to less active components are sometimes not
>>>>> picked up. When contributors are pretty
>>>>> loaded already, it may happen that no one eventually feels responsible to
>>>>> pick up a pull request, and it falls through the cracks.
>>>>>
>>>>> (2) There is no good overview of what are known shortcomings, efforts, and
>>>>> requested features for different parts of the system.
>>>>> This information exists in various people's heads, but is not easily
>>>>> accessible for new people. The Flink JIRA is not well
>>>>> maintained, it is not easy to draw insights from that.
>>>>>
>>>>>
>>>>> ===========
>>>>> The Proposal
>>>>> ===========
>>>>>
>>>>> Since we are building a parallel system, the natural solution seems to be:
>>>>> partition the workload ;-)
>>>>>
>>>>> We propose to define a set of components for Flink. Each component is
>>>>> maintained or tracked by one or more
>>>>> people - let's call them maintainers. It is important to note that we don't
>>>>> suggest the maintainers as an authoritative role, but
>>>>> simply as committers or contributors that visibly step up for a certain
>>>>> component, and mainly track and drive the efforts
>>>>> pertaining to that component.
>>>>>
>>>>> It is also important to realize that we do not want to suggest that people
>>>>> get less involved with certain parts and components, because
>>>>> they are not the maintainers. We simply want to make sure that each pull
>>>>> request or question or contribution has in the end
>>>>> one person (or a small set of people) responsible for catching and tracking
>>>>> it, if it was not worked on by the pro-active
>>>>> community.
>>>>>
>>>>> For some components, having multiple maintainers will be helpful. In that
>>>>> case, one maintainer should be the "chair" or "lead"
>>>>> and make sure that no issue of that component gets lost between the
>>>>> multiple maintainers.
>>>>>
>>>>>
>>>>> A maintainer's role is:
>>>>> -----------------------------
>>>>>
>>>>>   - Have an overview of which of the open pull requests relate to their
>>>>> component
>>>>>   - Drive the pull requests relating to the component to resolution
>>>>>       => Moderate the decision whether the feature should be merged
>>>>>       => Make sure the pull request gets a shepherd.
>>>>>            In many cases, the maintainers would shepherd themselves.
>>>>>       => In case the shepherd becomes inactive, the maintainers need to
>>>>> find a new shepherd.
>>>>>
>>>>>   - Have an overview of what are the known issues of their component
>>>>>   - Have an overview of what are the frequently requested features of their
>>>>> component
>>>>>
>>>>>   - Have an overview of which contributors are doing very good work in
>>>>> their component,
>>>>>     would be candidates for committers, and should be mentored towards that.
>>>>>
>>>>>   - Resolve email threads that have been brought to their attention,
>>>>> because deeper
>>>>>     component knowledge is required for that thread.
>>>>>
>>>>> A maintainer's role is NOT:
>>>>> ----------------------------------
>>>>>
>>>>>   - Review all pull requests of that component
>>>>>   - Answer every mail with questions about that component
>>>>>   - Fix all bugs and implement all features of that component
>>>>>
>>>>>
>>>>> We imagine the following way that the community and the maintainers
>>>>> interact:
>>>>> ---------------------------------------------------------------------------------------------------------
>>>>>
>>>>>   - Pull requests should be tagged by component. Since we cannot add labels
>>>>> at this point, we need
>>>>>     to rely on the following:
>>>>>      => The pull request opener should name the pull request like
>>>>> "[FLINK-XXX] [component] Title"
>>>>>      => Components can be (re) tagged by adding special comments in the
>>>>> pull request ("==> component client")
>>>>>      => With some luck, GitHub and Apache Infra will allow us to use labels
>>>>> at some point
>>>>>
>>>>>   - When pull requests are associated with a component, the maintainers
>>>>> will manage them
>>>>>     (decision whether to add, find shepherd, catch dropped pull requests)
>>>>>
>>>>>   - We assume that maintainers frequently reach out to other community
>>>>> members and ask them if they want
>>>>>     to shepherd a pull request.
>>>>>
>>>>>   - On the mailing list, everyone should feel equally empowered to answer
>>>>> and discuss.
>>>>>     If at some point in the discussion, some deep technical knowledge about
>>>>> a component is required,
>>>>>     the maintainer(s) should be drawn into the discussion.
>>>>>     Because the Mailing List infrastructure has no support to tag threads,
>>>>> here are some simple workarounds:
>>>>>
>>>>>     => One possibility is to put the maintainers' mail addresses on cc for
>>>>> the thread, so they get the mail
>>>>>           not just via the mailing list
>>>>>     => Another way would be to post something like "+maintainer runtime" in
>>>>> the thread and the "runtime"
>>>>>          maintainers would have a filter/alert on these keywords in their
>>>>> mail program.
>>>>>
>>>>>   - We assume that maintainers will reach out to community members
>> that
>>>> are
>>>>> very active and helpful in
>>>>>     a component, and will ask them if they want to be added as
>>>> maintainers.
>>>>>     That will make it visible that those people are experts for that
>>> part
>>>>> of Flink.
>>>>>
>>>>>
>>>>> ======================================
>>>>> Maintainers: Committers and Contributors
>>>>> ======================================
>>>>>
>>>>> It helps if maintainers are committers (since we want them to resolve
>>>> pull
>>>>> requests which often involves
>>>>> merging them).
>>>>>
>>>>> Components with multiple maintainers can easily have non-committer
>>>>> contributors in addition to committer
>>>>> contributors.
>>>>>
>>>>>
>>>>> ======
>>>>> JIRA
>>>>> ======
>>>>>
>>>>> Ideally, JIRA can be used to get an overview of what are the known
>>> issues
>>>>> of each component, and what are
>>>>> common feature requests. Unfortunately, the Flink JIRA is quite
>>>> unorganized
>>>>> right now.
>>>>>
>>>>> A natural followup effort of this proposal would be to define in JIRA
>>> the
>>>>> same components as we defined here,
>>>>> and have the maintainers keep JIRA meaningful for that particular
>>>>> component. That would allow us to
>>>>> easily generate some tables out of JIRA (like top known issues per
>>>>> component, most requested features)
>>>>> post them on the dev list once in a while as a "state of the union"
>>>> report.
>>>>>
>>>>> Initial assignment of issues to components should be made by those
>>> people
>>>>> opening the issue. The maintainer
>>>>> of that tagged component needs to change the tag, if the component
>> was
>>>>> classified incorrectly.
>>>>>
>>>>>
>>>>> ======================================
>>>>> Initial Components and Maintainers Suggestion
>>>>> ======================================
>>>>>
>>>>> Below is a suggestion of how to define components for Flink. One goal
>>> of
>>>>> the division was to make it
>>>>> obvious for the majority of questions and contributions to which
>>>> component
>>>>> they would relate. Otherwise,
>>>>> if many contributions had fuzzy component associations, we would
>> again
>>>> not
>>>>> solve the issue of having clear
>>>>> responsibilities for who would track the progress and resolution.
>>>>>
>>>>> We also looked at each component and wrote the names of some people
>> who
>>>> we
>>>>> thought were natural
>>>>> experts for the components, and thus natural candidates for
>>> maintainers.
>>>>>
>>>>> **These names are only a starting point for discussion.**
>>>>>
>>>>> Once agreed upon, the components and names of maintainers should be
>>> kept
>>>> in
>>>>> the wiki and updated as
>>>>> components change and people step up or down.
>>>>>
>>>>>
>>>>> *DataSet API* (*Fabian, Greg, Gabor*)
>>>>>   - Incuding Hadoop compat. parts
>>>>>
>>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
>>>>>
>>>>> *Runtime*
>>>>>   - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
>>>>>   - Local Runtime (Memory Management, State Backends,
>> Tasks/Operators)
>>> (
>>>>> *Stephan*)
>>>>>   - Network (*Ufuk*)
>>>>>
>>>>> *Client/Optimizer* (*Fabian*)
>>>>>
>>>>> *Type system / Type extractor* (Timo)
>>>>>
>>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>>>>>
>>>>> *Libraries*
>>>>>   - Gelly (*Vasia, Greg*)
>>>>>   - ML (*Till, Theo*)
>>>>>   - CEP (*Till*)
>>>>>   - Python (*Chesnay*)
>>>>>
>>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>>>>>
>>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
>>>>>
>>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>>>>>
>>>>> *Storm Compatibility Layer* (*Mathias*)
>>>>>
>>>>> *Scala shell* (*Till*)
>>>>>
>>>>> *Startup Shell Scripts* (Ufuk)
>>>>>
>>>>> *Flink Build System, Maven Files* (*Robert*)
>>>>>
>>>>> *Documentation* (Ufuk)
>>>>>
>>>>>
>>>>> Please let us know what you think about this proposal.
>>>>> Happy discussing!
>>>>>
>>>>> Greetings,
>>>>> Stephan
>>>>
>>>
>>
>



Re: [PROPOSAL] Structure the Flink Open Source Development

Kostas Tzoumas-2
Big +1 from my side, I think this will help the community grow and prosper
big time!

On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <[hidden email]> wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even though the correct spelling would be with two 't's :P)
>
> -Matthias
>
> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > +1 for the proposal
> > On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:
> >
> >> Yes, Gabor Gevay, that did refer to you!
> >>
> >> Sorry for the ambiguity...
> >>
> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]> wrote:
> >>
> >>> +1 for the proposal
> >>> @ggevay: I do think that it refers to you. :)
> >>>
> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> There are at least three Gábors in the Flink community,  :) so
> >>>> assuming that the Gábor in the list of maintainers of the DataSet API
> >>>> is referring to me, I'll be happy to do it. :)
> >>>>
> >>>> Best,
> >>>> Gábor G.

Re: [PROPOSAL] Structure the Flink Open Source Development

Stephan Ewen
In reply to this post by Matthias J. Sax-2
Yes, Matthias, that was supposed to be you.
Sorry from another guy who frequently has his name misspelled ;-)

On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <[hidden email]> wrote:

> +1 from my side.
>
> Happy to be the maintainer for Storm-Compatibility (at least I guess
> it's me, even though the correct spelling would be with two 't's :P)
>
> -Matthias

Re: [PROPOSAL] Structure the Flink Open Source Development

Ufuk Celebi-2
Hey Stephan!

Thanks to you and the others who started this. I really like the
proposal and I'm happy to see my name on some components.

So, +1.

I'd say let's wait until the end of the week/beginning of next week to
see if there is any disagreement with the proposal in the community
(doesn't look like it so far ;-)). Then we can continue to execute
this. :-)

– Ufuk



Re: [PROPOSAL] Structure the Flink Open Source Development

mxm
In reply to this post by Stephan Ewen
+1 for the initiative. With a better process, we will improve the
quality of Flink development and gain more time to focus.

Could we have another category "Infrastructure"? This would concern
things like CI, nightly deployment of snapshots/documentation, ASF
Infra communication. Robert and I could be the initial maintainers
for that.


Re: [PROPOSAL] Structure the Flink Open Source Development

Robert Metzger
tl;dr: +1

I also like the proposal a lot. Our community is growing at quite a fast
pace, and we need some structure in place to keep track of
everything going on.

I'm happy to see that the proposal mentions cleaning up our JIRA. This is
something that has been annoying me for quite a while, but it's too big to
do alone. If maintainers took care of their components, we would
already have covered a lot there.

One question regarding the "chair" or "lead" role for components: Is the
first name in the list of maintainers the lead?

I would actually suggest waiting until all proposed maintainers have agreed to
the proposal. It doesn't make sense to make somebody a maintainer of
something if they disagree or are not aware of it.




On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <[hidden email]> wrote:

> +1 for the initiative. With a better process we will improve the
> quality of the Flink development and give us more time to focus.
>
> Could we have another category "Infrastructure"? This would concern
> things like CI, nightly deployment of snapshots/documentation, ASF
> Infra communication. Robert and me could be the initial maintainers
> for that.
>
> On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <[hidden email]> wrote:
> > Yes, Matthias, that was supposed to be you.
> > Sorry from another guy who frequently has his name misspelled ;-)
> >
> > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <[hidden email]>
> wrote:
> >
> >> +1 from my side.
> >>
> >> Happy to be the maintainer for Storm-Compatibility (at least I guess
> >> it's me, even though the correct spelling would be with two 't's :P)
> >>
> >> -Matthias
> >>
> >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> >> > +1 for the proposal
> >> > On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:
> >> >
> >> >> Yes, Gabor Gevay, that did refer to you!
> >> >>

Re: [PROPOSAL] Structure the Flink Open Source Development

Stephan Ewen
All maintainer candidates are only proposals so far. There is no indication
of a lead or anything similar yet.

Let's first see whether we agree on the structure proposed here, and whether
we take the components as suggested or refine the list.
On 12.05.2016 at 17:45, "Robert Metzger" <[hidden email]> wrote:

> tl;dr: +1
>
> I also like the proposal a lot. Our community is growing at quite a fast
> pace and we need to have some structure in place to still keep track of
> everything going on.
>
> I'm happy to see that the proposal mentions cleaning up our JIRA. This is
> something that has been annoying me for quite a while, but it's too big to
> do alone. If maintainers took care of their components, we would already
> have a lot covered there.
>
> One question regarding the "chair" or "lead" role for components: Is the
> first name in the list of maintainers the lead?
>
> I would actually suggest waiting until all proposed maintainers have agreed
> to the proposal. It doesn't make sense to make somebody a maintainer of
> something if they disagree or are not aware of it.

Re: [PROPOSAL] Structure the Flink Open Source Development

Aljoscha Krettek-2
+1

The ideas seem good and the proposed number of components seems reasonable.
With this in place, we should then also clean up JIRA to make it actually usable.

On Thu, 12 May 2016 at 18:09 Stephan Ewen <[hidden email]> wrote:

> All maintainer candidates are only proposals so far. There is no indication
> of a lead or anything similar yet.
>
> Let's first see whether we agree on the structure proposed here, and whether
> we take the components as suggested or refine the list.
> > about
> > > >> >> many
> > > >> >>>> core
> > > >> >>>>> parts do not get completely overwhelmed.
> > > >> >>>>>   - We don't overlook threads that report critical issues.
> > > >> >>>>>   - We always have a pretty good overview of what the status
> of
> > > >> >> certain
> > > >> >>>>> parts of the system are.
> > > >> >>>>>       -> What are often encountered known issues
> > > >> >>>>>       -> What are the most frequently requested features
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> ========
> > > >> >>>>> Problems
> > > >> >>>>> ========
> > > >> >>>>>
> > > >> >>>>> Looking into the process, there are two big issues:
> > > >> >>>>>
> > > >> >>>>> (1) Up to now, we have been relying on the fact that
> everything
> > > just
> > > >> >>>>> "organizes itself", driven by best effort. That assumes
> > > >> >>>>> that everyone feels equally responsible for every part,
> > question,
> > > and
> > > >> >>>>> contribution. At the current state, this is impossible
> > > >> >>>>> to maintain, it overwhelms the committers and contributors.
> > > >> >>>>>
> > > >> >>>>> Example: Pull requests are picked up by whoever wants to pick
> > them
> > > >> >> up.
> > > >> >>>> Pull
> > > >> >>>>> requests that are a lot of work, have little
> > > >> >>>>> chance of getting in, or relate to less active components are
> > > >> >> sometimes
> > > >> >>>> not
> > > >> >>>>> picked up. When contributors are pretty
> > > >> >>>>> loaded already, it may happen that no one eventually feels
> > > >> >> responsible
> > > >> >>> to
> > > >> >>>>> pick up a pull request, and it falls through the cracks.
> > > >> >>>>>
> > > >> >>>>> (2) There is no good overview of what are known shortcomings,
> > > >> >> efforts,
> > > >> >>>> and
> > > >> >>>>> requested features for different parts of the system.
> > > >> >>>>> This information exists in various peoples' heads, but is not
> > > easily
> > > >> >>>>> accessible for new people. The Flink JIRA is not well
> > > >> >>>>> maintained, it is not easy to draw insights from that.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> ===========
> > > >> >>>>> The Proposal
> > > >> >>>>> ===========
> > > >> >>>>>
> > > >> >>>>> Since we are building a parallel system, the natural solution
> > > seems
> > > >> >> to
> > > >> >>>> be:
> > > >> >>>>> partition the workload ;-)
> > > >> >>>>>
> > > >> >>>>> We propose to define a set of components for Flink. Each
> > > component is
> > > >> >>>>> maintained or tracked by one or more
> > > >> >>>>> people - let's call them maintainers. It is important to note
> > > that we
> > > >> >>>> don't
> > > >> >>>>> suggest the maintainers as an authoritative role, but
> > > >> >>>>> simply as committers or contributors that visibly step up for
> a
> > > >> >> certain
> > > >> >>>>> component, and mainly track and drive the efforts
> > > >> >>>>> pertaining to that component.
> > > >> >>>>>
> > > >> >>>>> It is also important to realize that we do not want to suggest
> > > that
> > > >> >>>> people
> > > >> >>>>> get less involved with certain parts and components, because
> > > >> >>>>> they are not the maintainers. We simply want to make sure that
> > > each
> > > >> >>> pull
> > > >> >>>>> request or question or contribution has in the end
> > > >> >>>>> one person (or a small set of people) responsible for catching
> > and
> > > >> >>>> tracking
> > > >> >>>>> it, if it was not worked on by the pro-active
> > > >> >>>>> community.
> > > >> >>>>>
> > > >> >>>>> For some components, having multiple maintainers will be
> > helpful.
> > > In
> > > >> >>> that
> > > >> >>>>> case, one maintainer should be the "chair" or "lead"
> > > >> >>>>> and make sure that no issue of that component gets lost
> between
> > > the
> > > >> >>>>> multiple maintainers.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> A maintainers' role is:
> > > >> >>>>> -----------------------------
> > > >> >>>>>
> > > >> >>>>>   - Have an overview of which of the open pull requests relate
> > to
> > > >> >> their
> > > >> >>>>> component
> > > >> >>>>>   - Drive the pull requests relating to the component to
> > > resolution
> > > >> >>>>>       => Moderate the decision whether the feature should be
> > > merged
> > > >> >>>>>       => Make sure the pull request gets a shepherd.
> > > >> >>>>>            In many cases, the maintainers would shepherd
> > > themselves.
> > > >> >>>>>       => In case the shepherd becomes inactive, the
> maintainers
> > > need
> > > >> >> to
> > > >> >>>>> find a new shepherd.
> > > >> >>>>>
> > > >> >>>>>   - Have an overview of what are the known issues of their
> > > component
> > > >> >>>>>   - Have an overview of what are the frequently requested
> > > features of
> > > >> >>>> their
> > > >> >>>>> component
> > > >> >>>>>
> > > >> >>>>>   - Have an overview of which contributors are doing very good
> > > work
> > > >> >> in
> > > >> >>>>> their component,
> > > >> >>>>>     would be candidates for committers, and should be mentored
> > > >> >> towards
> > > >> >>>> that.
> > > >> >>>>>
> > > >> >>>>>   - Resolve email threads that have been brought to their
> > > attention,
> > > >> >>>>> because deeper
> > > >> >>>>>     component knowledge is required for that thread.
> > > >> >>>>>
> > > >> >>>>> A maintainers' role is NOT:
> > > >> >>>>> ----------------------------------
> > > >> >>>>>
> > > >> >>>>>   - Review all pull requests of that component
> > > >> >>>>>   - Answer every mail with questions about that component
> > > > >> >>>>>   - Fix all bugs and implement all features of that component
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> We imagine the following way that the community and the
> > > maintainers
> > > >> >>>>> interact:
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > >
> >
> ---------------------------------------------------------------------------------------------------------
> > > >> >>>>>
> > > >> >>>>>   - Pull requests should be tagged by component. Since we
> cannot
> > > add
> > > >> >>>> labels
> > > >> >>>>> at this point, we need
> > > >> >>>>>     to rely on the following:
> > > >> >>>>>      => The pull request opener should name the pull request
> > like
> > > >> >>>>> "[FLINK-XXX] [component] Title"
> > > >> >>>>>      => Components can be (re) tagged by adding special
> comments
> > > in
> > > >> >> the
> > > >> >>>>> pull request ("==> component client")
> > > >> >>>>>      => With some luck, GitHub and Apache Infra will allow us
> to
> > > use
> > > >> >>>> labels
> > > >> >>>>> at some point
> > > >> >>>>>
> > > >> >>>>>   - When pull requests are associated with a component, the
> > > >> >> maintainers
> > > >> >>>>> will manage them
> > > >> >>>>>     (decision whether to add, find shepherd, catch dropped
> pull
> > > >> >>> requests)
> > > >> >>>>>
> > > >> >>>>>   - We assume that maintainers frequently reach out to other
> > > >> >> community
> > > >> >>>>> members and ask them if they want
> > > >> >>>>>     to shepherd a pull request.
> > > >> >>>>>
> > > >> >>>>>   - On the mailing list, everyone should feel equally
> empowered
> > to
> > > >> >>> answer
> > > >> >>>>> and discuss.
> > > >> >>>>>     If at some point in the discussion, some deep technical
> > > knowledge
> > > >> >>>> about
> > > >> >>>>> a component is required,
> > > >> >>>>>     the maintainer(s) should be drawn into the discussion.
> > > >> >>>>>     Because the Mailing List infrastructure has no support to
> > tag
> > > >> >>>> threads,
> > > >> >>>>> here are some simple workarounds:
> > > >> >>>>>
> > > >> >>>>>     => One possibility is to put the maintainers' mail
> addresses
> > > on
> > > >> >> cc
> > > >> >>>> for
> > > >> >>>>> the thread, so they get the mail
> > > > >> >>>>>           not just via the mailing list
> > > >> >>>>>     => Another way would be to post something like
> "+maintainer
> > > >> >>> runtime"
> > > >> >>>> in
> > > >> >>>>> the thread and the "runtime"
> > > >> >>>>>          maintainers would have a filter/alert on these
> keywords
> > > in
> > > >> >>> their
> > > >> >>>>> mail program.
> > > >> >>>>>
> > > >> >>>>>   - We assume that maintainers will reach out to community
> > members
> > > >> >> that
> > > >> >>>> are
> > > >> >>>>> very active and helpful in
> > > >> >>>>>     a component, and will ask them if they want to be added as
> > > >> >>>> maintainers.
> > > >> >>>>>     That will make it visible that those people are experts
> for
> > > that
> > > >> >>> part
> > > >> >>>>> of Flink.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> ======================================
> > > >> >>>>> Maintainers: Committers and Contributors
> > > >> >>>>> ======================================
> > > >> >>>>>
> > > >> >>>>> It helps if maintainers are committers (since we want them to
> > > resolve
> > > >> >>>> pull
> > > >> >>>>> requests which often involves
> > > >> >>>>> merging them).
> > > >> >>>>>
> > > >> >>>>> Components with multiple maintainers can easily have
> > non-committer
> > > >> >>>>> contributors in addition to committer
> > > >> >>>>> contributors.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> ======
> > > >> >>>>> JIRA
> > > >> >>>>> ======
> > > >> >>>>>
> > > >> >>>>> Ideally, JIRA can be used to get an overview of what are the
> > known
> > > >> >>> issues
> > > >> >>>>> of each component, and what are
> > > >> >>>>> common feature requests. Unfortunately, the Flink JIRA is
> quite
> > > >> >>>> unorganized
> > > >> >>>>> right now.
> > > >> >>>>>
> > > >> >>>>> A natural followup effort of this proposal would be to define
> in
> > > JIRA
> > > >> >>> the
> > > >> >>>>> same components as we defined here,
> > > >> >>>>> and have the maintainers keep JIRA meaningful for that
> > particular
> > > >> >>>>> component. That would allow us to
> > > >> >>>>> easily generate some tables out of JIRA (like top known issues
> > per
> > > >> >>>>> component, most requested features)
> > > >> >>>>> post them on the dev list once in a while as a "state of the
> > > union"
> > > >> >>>> report.
> > > >> >>>>>
> > > >> >>>>> Initial assignment of issues to components should be made by
> > those
> > > >> >>> people
> > > >> >>>>> opening the issue. The maintainer
> > > >> >>>>> of that tagged component needs to change the tag, if the
> > component
> > > >> >> was
> > > >> >>>>> classified incorrectly.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> ======================================
> > > >> >>>>> Initial Components and Maintainers Suggestion
> > > >> >>>>> ======================================
> > > >> >>>>>
> > > >> >>>>> Below is a suggestion of how to define components for Flink.
> One
> > > goal
> > > >> >>> of
> > > >> >>>>> the division was to make it
> > > >> >>>>> obvious for the majority of questions and contributions to
> which
> > > >> >>>> component
> > > >> >>>>> they would relate. Otherwise,
> > > >> >>>>> if many contributions had fuzzy component associations, we
> would
> > > >> >> again
> > > >> >>>> not
> > > >> >>>>> solve the issue of having clear
> > > >> >>>>> responsibilities for who would track the progress and
> > resolution.
> > > >> >>>>>
> > > >> >>>>> We also looked at each component and wrote the names of some
> > > people
> > > >> >> who
> > > >> >>>> we
> > > >> >>>>> thought were natural
> > > >> >>>>> experts for the components, and thus natural candidates for
> > > >> >>> maintainers.
> > > >> >>>>>
> > > >> >>>>> **These names are only a starting point for discussion.**
> > > >> >>>>>
> > > >> >>>>> Once agreed upon, the components and names of maintainers
> should
> > > be
> > > >> >>> kept
> > > >> >>>> in
> > > >> >>>>> the wiki and updated as
> > > >> >>>>> components change and people step up or down.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> > > > >> >>>>>   - Including Hadoop compat. parts
> > > >> >>>>>
> > > >> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> > > >> >>>>>
> > > >> >>>>> *Runtime*
> > > >> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka)
> > > (*Till*)
> > > >> >>>>>   - Local Runtime (Memory Management, State Backends,
> > > >> >> Tasks/Operators)
> > > >> >>> (
> > > >> >>>>> *Stephan*)
> > > >> >>>>>   - Network (*Ufuk*)
> > > >> >>>>>
> > > >> >>>>> *Client/Optimizer* (*Fabian*)
> > > >> >>>>>
> > > >> >>>>> *Type system / Type extractor* (Timo)
> > > >> >>>>>
> > > >> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max,
> Robert*)
> > > >> >>>>>
> > > >> >>>>> *Libraries*
> > > >> >>>>>   - Gelly (*Vasia, Greg*)
> > > >> >>>>>   - ML (*Till, Theo*)
> > > >> >>>>>   - CEP (*Till*)
> > > >> >>>>>   - Python (*Chesnay*)
> > > >> >>>>>
> > > >> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > > >> >>>>>
> > > >> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> > > >> >>>>>
> > > >> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > > >> >>>>>
> > > >> >>>>> *Storm Compatibility Layer* (*Mathias*)
> > > >> >>>>>
> > > >> >>>>> *Scala shell* (*Till*)
> > > >> >>>>>
> > > >> >>>>> *Startup Shell Scripts* (Ufuk)
> > > >> >>>>>
> > > >> >>>>> *Flink Build System, Maven Files* (*Robert*)
> > > >> >>>>>
> > > >> >>>>> *Documentation* (Ufuk)
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> Please let us know what you think about this proposal.
> > > >> >>>>> Happy discussing!
> > > >> >>>>>
> > > >> >>>>> Greetings,
> > > >> >>>>> Stephan
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >> >
> > > >>
> > > >>
> > >
> >
>

Re: [PROPOSAL] Structure the Flink Open Source Development

Nick Dimiduk-2
For what it's worth, this is very close to how HBase attempts to manage the
community load. We break out components (in Jira), with a list of named
component maintainers. Actually, having components alone has given a big
bang for the buck because, when issues are properly labeled, it makes it
really easy for part-timers to channel their efforts with precision.

As a Flink user, I'm +1 for this proposal as well :)

On Thursday, May 12, 2016, Aljoscha Krettek <[hidden email]> wrote:

> +1
>
> The ideas seem good and the proposed number of components seems reasonable.
> With this, we should also then cleanup the JIRA to make it actually usable.
>
> On Thu, 12 May 2016 at 18:09 Stephan Ewen <[hidden email]>
> wrote:
>
> > All maintainer candidates are only proposals so far. No indication of
> lead
> > or anything so far.
> >
> > Let's first see if we agree on the structure proposed here, and if we
> take
> > the components as suggested here or if we refine the list.
> > On 12.05.2016 at 17:45, "Robert Metzger" <[hidden email]> wrote:
> >
> > > tl;dr: +1
> > >
> > > I also like the proposal a lot. Our community is growing at a quite
> fast
> > > pace and we need to have some structure in place to still keep track of
> > > everything going on.
> > >
> > > I'm happy to see that the proposal mentions cleaning up our JIRA. This
> is
> > > something that has been annoying me for quite a while, but it's too big
> to
> > > do it alone. If maintainers could take care of their components, we
> > should
> > > have covered already a lot there.
> > >
> > > One question regarding the "chair" or "lead" role for components: Is
> the
> > > first name in the list of maintainers the lead?
> > >
> > > I would actually suggest to wait until all proposed maintainers agreed
> to
> > > the proposal. It doesn't make sense to make somebody a maintainer of
> > > something if they disagree or are not aware of it.
> > >
> > >
> > >
> > >
> > > On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <[hidden email]>
> > > wrote:
> > >
> > > > +1 for the initiative. With a better process we will improve the
> > > > quality of the Flink development and give us more time to focus.
> > > >
> > > > Could we have another category "Infrastructure"? This would concern
> > > > things like CI, nightly deployment of snapshots/documentation, ASF
> > > > Infra communication. Robert and me could be the initial maintainers
> > > > for that.
> > > >
> > > > On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <[hidden email]> wrote:
> > > > > Yes, Matthias, that was supposed to be you.
> > > > > Sorry from another guy who frequently has his name misspelled ;-)
> > > > >
> > > > > On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <[hidden email]> wrote:
> > > > >
> > > > >> +1 from my side.
> > > > >>
> > > > >> Happy to be the maintainer for Storm-Compatibiltiy (at least I
> guess
> > > > >> it's me, even the correct spelling would be with two 't' :P)
> > > > >>
> > > > >> -Matthias
> > > > >>
> > > > >> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
> > > > >> > +1 for the proposal
> > > > > >> > On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:
> > > > >> >
> > > > >> >> Yes, Gabor Gevay, that did refer to you!
> > > > >> >>
> > > > >> >> Sorry for the ambiguity...
> > > > >> >>
> > > > > >> >> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]> wrote:
> > > > >> >>
> > > > >> >>> +1 for the proposal
> > > > >> >>> @ggevay: I do think that it refers to you. :)
> > > > >> >>>
> > > > > >> >>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
> > > > >> >>>
> > > > >> >>>> Hello,
> > > > >> >>>>
> > > > >> >>>> There are at least three Gábors in the Flink community,  :)
> so
> > > > >> >>>> assuming that the Gábor in the list of maintainers of the
> > DataSet
> > > > API
> > > > >> >>>> is referring to me, I'll be happy to do it. :)
> > > > >> >>>>
> > > > >> >>>> Best,
> > > > >> >>>> Gábor G.
> > > > >> >>>>
> > > > >> >>>>
> > > > >> >>>>
> > > > > >> >>>> 2016-05-10 11:24 GMT+02:00 Stephan Ewen <[hidden email]>:
> > > > >> >>>>> Hi everyone!
> > > > >> >>>>>
> > > > >> >>>>> We propose to establish some lightweight structures in the
> > Flink
> > > > open
> > > > >> >>>>> source community and development process,
> > > > >> >>>>> to help us better handle the increased interest in Flink
> > > (mailing
> > > > >> >> list
> > > > >> >>>> and
> > > > >> >>>>> pull requests), while not overwhelming the
> > > > >> >>>>> committers, and giving users and contributors a good
> > experience.
> > > > >> >>>>>
> > > > >> >>>>> This proposal is triggered by the observation that we are
> > > reaching
> > > > >> >> the
> > > > >> >>>>> limits of where the current community can support
> > > > >> >>>>> users and guide new contributors. The below proposal is
> based
> > on
> > > > >> >>>>> observations and ideas from Till, Robert, and me.
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Goals
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > >> >>>>> We try to achieve the following
> > > > >> >>>>>
> > > > >> >>>>>   - Pull requests get handled in a timely fashion
> > > > >> >>>>>   - New contributors are better integrated into the
> community
> > > > >> >>>>>   - The community feels empowered on the mailing list.
> > > > >> >>>>>     But questions that need the attention of someone that
> has
> > > deep
> > > > >> >>>>> knowledge of a certain part of Flink get their attention.
> > > > >> >>>>>   - At the same time, the committers that are knowledgeable
> > > about
> > > > >> >> many
> > > > >> >>>> core
> > > > >> >>>>> parts do not get completely overwhelmed.
> > > > >> >>>>>   - We don't overlook threads that report critical issues.
> > > > >> >>>>>   - We always have a pretty good overview of what the status
> > of
> > > > >> >> certain
> > > > >> >>>>> parts of the system are.
> > > > >> >>>>>       -> What are often encountered known issues
> > > > >> >>>>>       -> What are the most frequently requested features
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ========
> > > > >> >>>>> Problems
> > > > >> >>>>> ========
> > > > >> >>>>>
> > > > >> >>>>> Looking into the process, there are two big issues:
> > > > >> >>>>>
> > > > >> >>>>> (1) Up to now, we have been relying on the fact that
> > everything
> > > > just
> > > > >> >>>>> "organizes itself", driven by best effort. That assumes
> > > > >> >>>>> that everyone feels equally responsible for every part,
> > > question,
> > > > and
> > > > >> >>>>> contribution. At the current state, this is impossible
> > > > >> >>>>> to maintain, it overwhelms the committers and contributors.
> > > > >> >>>>>
> > > > >> >>>>> Example: Pull requests are picked up by whoever wants to
> pick
> > > them
> > > > >> >> up.
> > > > >> >>>> Pull
> > > > >> >>>>> requests that are a lot of work, have little
> > > > >> >>>>> chance of getting in, or relate to less active components
> are
> > > > >> >> sometimes
> > > > >> >>>> not
> > > > >> >>>>> picked up. When contributors are pretty
> > > > >> >>>>> loaded already, it may happen that no one eventually feels
> > > > >> >> responsible
> > > > >> >>> to
> > > > >> >>>>> pick up a pull request, and it falls through the cracks.
> > > > >> >>>>>
> > > > >> >>>>> (2) There is no good overview of what are known
> shortcomings,
> > > > >> >> efforts,
> > > > >> >>>> and
> > > > >> >>>>> requested features for different parts of the system.
> > > > >> >>>>> This information exists in various peoples' heads, but is
> not
> > > > easily
> > > > >> >>>>> accessible for new people. The Flink JIRA is not well
> > > > >> >>>>> maintained, it is not easy to draw insights from that.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ===========
> > > > >> >>>>> The Proposal
> > > > >> >>>>> ===========
> > > > >> >>>>>
> > > > >> >>>>> Since we are building a parallel system, the natural
> solution
> > > > seems
> > > > >> >> to
> > > > >> >>>> be:
> > > > >> >>>>> partition the workload ;-)
> > > > >> >>>>>
> > > > >> >>>>> We propose to define a set of components for Flink. Each
> > > > component is
> > > > >> >>>>> maintained or tracked by one or more
> > > > >> >>>>> people - let's call them maintainers. It is important to
> note
> > > > that we
> > > > >> >>>> don't
> > > > >> >>>>> suggest the maintainers as an authoritative role, but
> > > > >> >>>>> simply as committers or contributors that visibly step up
> for
> > a
> > > > >> >> certain
> > > > >> >>>>> component, and mainly track and drive the efforts
> > > > >> >>>>> pertaining to that component.
> > > > >> >>>>>
> > > > >> >>>>> It is also important to realize that we do not want to
> suggest
> > > > that
> > > > >> >>>> people
> > > > >> >>>>> get less involved with certain parts and components, because
> > > > >> >>>>> they are not the maintainers. We simply want to make sure
> that
> > > > each
> > > > >> >>> pull
> > > > >> >>>>> request or question or contribution has in the end
> > > > >> >>>>> one person (or a small set of people) responsible for
> catching
> > > and
> > > > >> >>>> tracking
> > > > >> >>>>> it, if it was not worked on by the pro-active
> > > > >> >>>>> community.
> > > > >> >>>>>
> > > > >> >>>>> For some components, having multiple maintainers will be
> > > helpful.
> > > > In
> > > > >> >>> that
> > > > >> >>>>> case, one maintainer should be the "chair" or "lead"
> > > > >> >>>>> and make sure that no issue of that component gets lost
> > between
> > > > the
> > > > >> >>>>> multiple maintainers.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> A maintainers' role is:
> > > > >> >>>>> -----------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of which of the open pull requests
> relate
> > > to
> > > > >> >> their
> > > > >> >>>>> component
> > > > >> >>>>>   - Drive the pull requests relating to the component to
> > > > resolution
> > > > >> >>>>>       => Moderate the decision whether the feature should be
> > > > merged
> > > > >> >>>>>       => Make sure the pull request gets a shepherd.
> > > > >> >>>>>            In many cases, the maintainers would shepherd
> > > > themselves.
> > > > >> >>>>>       => In case the shepherd becomes inactive, the
> > maintainers
> > > > need
> > > > >> >> to
> > > > >> >>>>> find a new shepherd.
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of what are the known issues of their
> > > > component
> > > > >> >>>>>   - Have an overview of what are the frequently requested
> > > > features of
> > > > >> >>>> their
> > > > >> >>>>> component
> > > > >> >>>>>
> > > > >> >>>>>   - Have an overview of which contributors are doing very
> good
> > > > work
> > > > >> >> in
> > > > >> >>>>> their component,
> > > > >> >>>>>     would be candidates for committers, and should be
> mentored
> > > > >> >> towards
> > > > >> >>>> that.
> > > > >> >>>>>
> > > > >> >>>>>   - Resolve email threads that have been brought to their
> > > > attention,
> > > > >> >>>>> because deeper
> > > > >> >>>>>     component knowledge is required for that thread.
> > > > >> >>>>>
> > > > >> >>>>> A maintainers' role is NOT:
> > > > >> >>>>> ----------------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Review all pull requests of that component
> > > > >> >>>>>   - Answer every mail with questions about that component
> > > > > >> >>>>>   - Fix all bugs and implement all features of that component
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> We imagine the following way that the community and the
> > > > maintainers
> > > > >> >>>>> interact:
> > > > >> >>>>>
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> ---------------------------------------------------------------------------------------------------------
> > > > >> >>>>>
> > > > >> >>>>>   - Pull requests should be tagged by component. Since we
> > cannot
> > > > add
> > > > >> >>>> labels
> > > > >> >>>>> at this point, we need
> > > > >> >>>>>     to rely on the following:
> > > > >> >>>>>      => The pull request opener should name the pull request
> > > like
> > > > >> >>>>> "[FLINK-XXX] [component] Title"
> > > > >> >>>>>      => Components can be (re) tagged by adding special
> > comments
> > > > in
> > > > >> >> the
> > > > >> >>>>> pull request ("==> component client")
> > > > >> >>>>>      => With some luck, GitHub and Apache Infra will allow
> us
> > to
> > > > use
> > > > >> >>>> labels
> > > > >> >>>>> at some point
> > > > >> >>>>>
> > > > >> >>>>>   - When pull requests are associated with a component, the
> > > > >> >> maintainers
> > > > >> >>>>> will manage them
> > > > >> >>>>>     (decision whether to add, find shepherd, catch dropped
> > pull
> > > > >> >>> requests)
> > > > >> >>>>>
> > > > >> >>>>>   - We assume that maintainers frequently reach out to other
> > > > >> >> community
> > > > >> >>>>> members and ask them if they want
> > > > >> >>>>>     to shepherd a pull request.
> > > > >> >>>>>
> > > > >> >>>>>   - On the mailing list, everyone should feel equally
> > empowered
> > > to
> > > > >> >>> answer
> > > > >> >>>>> and discuss.
> > > > >> >>>>>     If at some point in the discussion, some deep technical
> > > > knowledge
> > > > >> >>>> about
> > > > >> >>>>> a component is required,
> > > > >> >>>>>     the maintainer(s) should be drawn into the discussion.
> > > > >> >>>>>     Because the Mailing List infrastructure has no support
> to
> > > tag
> > > > >> >>>> threads,
> > > > >> >>>>> here are some simple workarounds:
> > > > >> >>>>>
> > > > >> >>>>>     => One possibility is to put the maintainers' mail
> > addresses
> > > > on
> > > > >> >> cc
> > > > >> >>>> for
> > > > >> >>>>> the thread, so they get the mail
> > > > > >> >>>>>           not just via the mailing list
> > > > >> >>>>>     => Another way would be to post something like
> > "+maintainer
> > > > >> >>> runtime"
> > > > >> >>>> in
> > > > >> >>>>> the thread and the "runtime"
> > > > >> >>>>>          maintainers would have a filter/alert on these
> > keywords
> > > > in
> > > > >> >>> their
> > > > >> >>>>> mail program.
> > > > >> >>>>>
> > > > >> >>>>>   - We assume that maintainers will reach out to community
> > > members
> > > > >> >> that
> > > > >> >>>> are
> > > > >> >>>>> very active and helpful in
> > > > >> >>>>>     a component, and will ask them if they want to be added
> as
> > > > >> >>>> maintainers.
> > > > >> >>>>>     That will make it visible that those people are experts
> > for
> > > > that
> > > > >> >>> part
> > > > >> >>>>> of Flink.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Maintainers: Committers and Contributors
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > >> >>>>> It helps if maintainers are committers (since we want them
> to
> > > > resolve
> > > > >> >>>> pull
> > > > >> >>>>> requests which often involves
> > > > >> >>>>> merging them).
> > > > >> >>>>>
> > > > >> >>>>> Components with multiple maintainers can easily have
> > > non-committer
> > > > >> >>>>> contributors in addition to committer
> > > > >> >>>>> contributors.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======
> > > > >> >>>>> JIRA
> > > > >> >>>>> ======
> > > > >> >>>>>
> > > > >> >>>>> Ideally, JIRA can be used to get an overview of what are the
> > > known
> > > > >> >>> issues
> > > > >> >>>>> of each component, and what are
> > > > >> >>>>> common feature requests. Unfortunately, the Flink JIRA is
> > quite
> > > > >> >>>> unorganized
> > > > >> >>>>> right now.
> > > > >> >>>>>
> > > > >> >>>>> A natural followup effort of this proposal would be to
> define
> > in
> > > > JIRA
> > > > >> >>> the
> > > > >> >>>>> same components as we defined here,
> > > > >> >>>>> and have the maintainers keep JIRA meaningful for that
> > > particular
> > > > >> >>>>> component. That would allow us to
> > > > >> >>>>> easily generate some tables out of JIRA (like top known
> issues
> > > per
> > > > >> >>>>> component, most requested features)
> > > > >> >>>>> post them on the dev list once in a while as a "state of the
> > > > union"
> > > > >> >>>> report.
> > > > >> >>>>>
> > > > >> >>>>> Initial assignment of issues to components should be made by
> > > those
> > > > >> >>> people
> > > > >> >>>>> opening the issue. The maintainer
> > > > >> >>>>> of that tagged component needs to change the tag, if the
> > > component
> > > > >> >> was
> > > > >> >>>>> classified incorrectly.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> ======================================
> > > > >> >>>>> Initial Components and Maintainers Suggestion
> > > > >> >>>>> ======================================
> > > > >> >>>>>
> > > > >> >>>>> Below is a suggestion of how to define components for Flink.
> > One
> > > > goal
> > > > >> >>> of
> > > > >> >>>>> the division was to make it
> > > > >> >>>>> obvious for the majority of questions and contributions to
> > which
> > > > >> >>>> component
> > > > >> >>>>> they would relate. Otherwise,
> > > > >> >>>>> if many contributions had fuzzy component associations, we
> > would
> > > > >> >> again
> > > > >> >>>> not
> > > > >> >>>>> solve the issue of having clear
> > > > >> >>>>> responsibilities for who would track the progress and
> > > resolution.
> > > > >> >>>>>
> > > > >> >>>>> We also looked at each component and wrote the names of some
> > > > people
> > > > >> >> who
> > > > >> >>>> we
> > > > >> >>>>> thought were natural
> > > > >> >>>>> experts for the components, and thus natural candidates for
> > > > >> >>> maintainers.
> > > > >> >>>>>
> > > > >> >>>>> **These names are only a starting point for discussion.**
> > > > >> >>>>>
> > > > >> >>>>> Once agreed upon, the components and names of maintainers
> > should
> > > > be
> > > > >> >>> kept
> > > > >> >>>> in
> > > > >> >>>>> the wiki and updated as
> > > > >> >>>>> components change and people step up or down.
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> *DataSet API* (*Fabian, Greg, Gabor*)
> > > > >> >>>>>   - Including Hadoop compat. parts
> > > > >> >>>>>
> > > > >> >>>>> *DataStream API* (*Aljoscha, Max, Stephan*)
> > > > >> >>>>>
> > > > >> >>>>> *Runtime*
> > > > >> >>>>>   - Distributed Coordination (JobManager/TaskManager, Akka)
> > > > (*Till*)
> > > > >> >>>>>   - Local Runtime (Memory Management, State Backends,
> > > > >> >> Tasks/Operators)
> > > > >> >>> (
> > > > >> >>>>> *Stephan*)
> > > > >> >>>>>   - Network (*Ufuk*)
> > > > >> >>>>>
> > > > >> >>>>> *Client/Optimizer* (*Fabian*)
> > > > >> >>>>>
> > > > >> >>>>> *Type system / Type extractor* (Timo)
> > > > >> >>>>>
> > > > >> >>>>> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max,
> > Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Libraries*
> > > > >> >>>>>   - Gelly (*Vasia, Greg*)
> > > > >> >>>>>   - ML (*Till, Theo*)
> > > > >> >>>>>   - CEP (*Till*)
> > > > >> >>>>>   - Python (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
> > > > >> >>>>>
> > > > >> >>>>> *Streaming Connectors* (*Robert*, *Aljoscha*)
> > > > >> >>>>>
> > > > >> >>>>> *Batch Connectors and Input/Output Formats* (*Chesnay*)
> > > > >> >>>>>
> > > > >> >>>>> *Storm Compatibility Layer* (*Mathias*)
> > > > >> >>>>>
> > > > >> >>>>> *Scala shell* (*Till*)
> > > > >> >>>>>
> > > > >> >>>>> *Startup Shell Scripts* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>> *Flink Build System, Maven Files* (*Robert*)
> > > > >> >>>>>
> > > > >> >>>>> *Documentation* (Ufuk)
> > > > >> >>>>>
> > > > >> >>>>>
> > > > >> >>>>> Please let us know what you think about this proposal.
> > > > >> >>>>> Happy discussing!
> > > > >> >>>>>
> > > > >> >>>>> Greetings,
> > > > >> >>>>> Stephan
> > > > >> >>>>
> > > > >> >>>
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > >
> >
>

Re: [PROPOSAL] Structure the Flink Open Source Development

Chiwan Park-2
Thanks for the great suggestion.

+1 for this proposal.

Regards,
Chiwan Park

> On May 13, 2016, at 1:44 AM, Nick Dimiduk <[hidden email]> wrote:
>
> For what it's worth, this is very close to how HBase attempts to manage the
> community load. We break out components (in Jira), with a list of named
> component maintainers. Actually, having components alone has given a Big
> Bang for the buck because when properly labeled, it makes it really easy
> for part-timers to channel their efforts with precision.
>
> As a flink user, I'm +1 for this proposal as well :)
>
> On Thursday, May 12, 2016, Aljoscha Krettek <[hidden email]> wrote:
>
>> +1
>>
>> The ideas seem good and the proposed number of components seems reasonable.
>> With this, we should then also clean up the JIRA to make it actually usable.
>>
>> On Thu, 12 May 2016 at 18:09 Stephan Ewen <[hidden email]> wrote:
>>
>>> All maintainer candidates are only proposals so far. No indication of lead
>>> or anything so far.
>>>
>>> Let's first see if we agree on the structure proposed here, and if we take
>>> the components as suggested here or if we refine the list.
>>> On 12.05.2016 at 17:45, "Robert Metzger" <[hidden email]> wrote:
>>>
>>>> tl;dr: +1
>>>>
>>>> I also like the proposal a lot. Our community is growing at quite a fast
>>>> pace and we need to have some structure in place to still keep track of
>>>> everything going on.
>>>>
>>>> I'm happy to see that the proposal mentions cleaning up our JIRA. This is
>>>> something that has been annoying me for quite a while, but it's too big to
>>>> do alone. If maintainers could take care of their components, we should
>>>> already have covered a lot there.
>>>>
>>>> One question regarding the "chair" or "lead" role for components: Is the
>>>> first name in the list of maintainers the lead?
>>>>
>>>> I would actually suggest waiting until all proposed maintainers have agreed
>>>> to the proposal. It doesn't make sense to make somebody a maintainer of
>>>> something if they disagree or are not aware of it.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, May 12, 2016 at 2:13 PM, Maximilian Michels <[hidden email]> wrote:
>>>>
>>>>> +1 for the initiative. With a better process we will improve the
>>>>> quality of the Flink development and gain more time to focus.
>>>>>
>>>>> Could we have another category "Infrastructure"? This would concern
>>>>> things like CI, nightly deployment of snapshots/documentation, ASF
>>>>> Infra communication. Robert and I could be the initial maintainers
>>>>> for that.
>>>>>
>>>>> On Thu, May 12, 2016 at 1:52 PM, Stephan Ewen <[hidden email]> wrote:
>>>>>> Yes, Matthias, that was supposed to be you.
>>>>>> Sorry from another guy who frequently has his name misspelled ;-)
>>>>>>
>>>>>> On Thu, May 12, 2016 at 1:27 PM, Matthias J. Sax <[hidden email]> wrote:
>>>>>>
>>>>>>> +1 from my side.
>>>>>>>
>>>>>>> Happy to be the maintainer for Storm-Compatibility (at least I guess
>>>>>>> it's me, even though the correct spelling would be with two 't's :P)
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 05/12/2016 12:56 PM, Till Rohrmann wrote:
>>>>>>>> +1 for the proposal
>>>>>>>> On May 12, 2016 12:13 PM, "Stephan Ewen" <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> Yes, Gabor Gevay, that did refer to you!
>>>>>>>>>
>>>>>>>>> Sorry for the ambiguity...
>>>>>>>>>
>>>>>>>>> On Thu, May 12, 2016 at 10:46 AM, Márton Balassi <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>>> +1 for the proposal
>>>>>>>>>> @ggevay: I do think that it refers to you. :)
>>>>>>>>>>
>>>>>>>>>> On Thu, May 12, 2016 at 10:40 AM, Gábor Gévay <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> There are at least three Gábors in the Flink community, :) so
>>>>>>>>>>> assuming that the Gábor in the list of maintainers of the DataSet API
>>>>>>>>>>> is referring to me, I'll be happy to do it. :)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Gábor G.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>


Re: [PROPOSAL] Structure the Flink Open Source Development

Timo Walther-2
+1 from my side too



On 13.05.2016 06:13, Chiwan Park wrote:
> +1 for this proposal



Re: [PROPOSAL] Structure the Flink Open Source Development

Kostas Tzoumas-2
Should we also add a component "Flink website and wiki" (minus the
documentation) with an associated maintainer?

On Fri, May 13, 2016 at 12:17 PM, Timo Walther <[hidden email]> wrote:

> +1 from my side too
>
>
>
> On 13.05.2016 06:13, Chiwan Park wrote:
>
>> +1 for this proposal
>>
>
>
>

Re: [PROPOSAL] Structure the Flink Open Source Development

Matthias J. Sax-2
Sounds like a good idea to me. We could include the Wikipedia article as well.

I was thinking about extending the article anyway (no time so far...).
As of Flink 1.x the system is stable in large parts, and it might be nice
to have a high-level system description on Wikipedia, too.

-Matthias


On 05/13/2016 12:20 PM, Kostas Tzoumas wrote:

> Should we also add a component "Flink website and wiki" (minus the
> documentation) with an associated maintainer?
>
> On Fri, May 13, 2016 at 12:17 PM, Timo Walther <[hidden email]> wrote:
>
>> +1 from my side too
>>
>>
>>
>> On 13.05.2016 06:13, Chiwan Park wrote:
>>
>>> +1 for this proposal
>>>
>>
>>
>>
>



Re: [PROPOSAL] Structure the Flink Open Source Development

Greg Hogan
In reply to this post by Stephan Ewen
+1 to better scaling :)

Many Jira tickets are good ideas with no current traction. Some have a pull
request (usually closed), many have comments or discussion. It seems these
old tickets tend to hang around because closing the ticket feels like
rejecting the idea. How do we track requested features without cluttering
Jira?

Greg

On Tue, May 10, 2016 at 5:24 AM, Stephan Ewen <[hidden email]> wrote:

> Hi everyone!
>
> We propose to establish some lightweight structures in the Flink open
> source community and development process,
> to help us better handle the increased interest in Flink (mailing list and
> pull requests), while not overwhelming the
> committers, and giving users and contributors a good experience.
>
> This proposal is triggered by the observation that we are reaching the
> limits of where the current community can support
> users and guide new contributors. The below proposal is based on
> observations and ideas from Till, Robert, and me.
>
> ========
> Goals
> ========
>
> We try to achieve the following
>
>   - Pull requests get handled in a timely fashion
>   - New contributors are better integrated into the community
>   - The community feels empowered on the mailing list.
>     But questions that need the attention of someone that has deep
> knowledge of a certain part of Flink get their attention.
>   - At the same time, the committers that are knowledgeable about many core
> parts do not get completely overwhelmed.
>   - We don't overlook threads that report critical issues.
>   - We always have a pretty good overview of what the status of certain
> parts of the system is.
>       -> What are often encountered known issues
>       -> What are the most frequently requested features
>
>
> ========
> Problems
> ========
>
> Looking into the process, there are two big issues:
>
> (1) Up to now, we have been relying on the fact that everything just
> "organizes itself", driven by best effort. That assumes
> that everyone feels equally responsible for every part, question, and
> contribution. At the current state, this is impossible
> to maintain, it overwhelms the committers and contributors.
>
> Example: Pull requests are picked up by whoever wants to pick them up. Pull
> requests that are a lot of work, have little
> chance of getting in, or relate to less active components are sometimes not
> picked up. When contributors are pretty
> loaded already, it may happen that no one eventually feels responsible to
> pick up a pull request, and it falls through the cracks.
>
> (2) There is no good overview of what are known shortcomings, efforts, and
> requested features for different parts of the system.
> This information exists in various peoples' heads, but is not easily
> accessible for new people. The Flink JIRA is not well
> maintained, it is not easy to draw insights from that.
>
>
> ===========
> The Proposal
> ===========
>
> Since we are building a parallel system, the natural solution seems to be:
> partition the workload ;-)
>
> We propose to define a set of components for Flink. Each component is
> maintained or tracked by one or more
> people - let's call them maintainers. It is important to note that we don't
> suggest the maintainers as an authoritative role, but
> simply as committers or contributors that visibly step up for a certain
> component, and mainly track and drive the efforts
> pertaining to that component.
>
> It is also important to realize that we do not want to suggest that people
> get less involved with certain parts and components, because
> they are not the maintainers. We simply want to make sure that each pull
> request or question or contribution has in the end
> one person (or a small set of people) responsible for catching and tracking
> it, if it was not worked on by the pro-active
> community.
>
> For some components, having multiple maintainers will be helpful. In that
> case, one maintainer should be the "chair" or "lead"
> and make sure that no issue of that component gets lost between the
> multiple maintainers.
>
>
> A maintainer's role is:
> -----------------------------
>
>   - Have an overview of which of the open pull requests relate to their
> component
>   - Drive the pull requests relating to the component to resolution
>       => Moderate the decision whether the feature should be merged
>       => Make sure the pull request gets a shepherd.
>            In many cases, the maintainers would shepherd themselves.
>       => In case the shepherd becomes inactive, the maintainers need to
> find a new shepherd.
>
>   - Have an overview of what are the known issues of their component
>   - Have an overview of what are the frequently requested features of their
> component
>
>   - Have an overview of which contributors are doing very good work in
> their component,
>     would be candidates for committers, and should be mentored towards
> that.
>
>   - Resolve email threads that have been brought to their attention,
> because deeper
>     component knowledge is required for that thread.
>
> A maintainer's role is NOT:
> ----------------------------------
>
>   - Review all pull requests of that component
>   - Answer every mail with questions about that component
>   - Fix all bugs and implement all features of that component
>
>
> We imagine the following way that the community and the maintainers
> interact:
>
> ---------------------------------------------------------------------------------------------------------
>
>   - Pull requests should be tagged by component. Since we cannot add labels
> at this point, we need
>     to rely on the following:
>      => The pull request opener should name the pull request like
> "[FLINK-XXX] [component] Title"
>      => Components can be (re) tagged by adding special comments in the
> pull request ("==> component client")
>      => With some luck, GitHub and Apache Infra will allow us to use labels
> at some point
>
>   - When pull requests are associated with a component, the maintainers
> will manage them
>     (decision whether to add, find shepherd, catch dropped pull requests)
>
>   - We assume that maintainers frequently reach out to other community
> members and ask them if they want
>     to shepherd a pull request.
>
>   - On the mailing list, everyone should feel equally empowered to answer
> and discuss.
>     If at some point in the discussion, some deep technical knowledge about
> a component is required,
>     the maintainer(s) should be drawn into the discussion.
>     Because the Mailing List infrastructure has no support to tag threads,
> here are some simple workarounds:
>
>     => One possibility is to put the maintainers' mail addresses on cc for
> the thread, so they get the mail
>           not just via the mailing list
>     => Another way would be to post something like "+maintainer runtime" in
> the thread and the "runtime"
>          maintainers would have a filter/alert on these keywords in their
> mail program.
>
>   - We assume that maintainers will reach out to community members that are
> very active and helpful in
>     a component, and will ask them if they want to be added as maintainers.
>     That will make it visible that those people are experts for that part
> of Flink.
>
>
> ======================================
> Maintainers: Committers and Contributors
> ======================================
>
> It helps if maintainers are committers (since we want them to resolve pull
> requests which often involves
> merging them).
>
> Components with multiple maintainers can easily have non-committer
> contributors in addition to committer
> contributors.
>
>
> ======
> JIRA
> ======
>
> Ideally, JIRA can be used to get an overview of what are the known issues
> of each component, and what are
> common feature requests. Unfortunately, the Flink JIRA is quite unorganized
> right now.
>
> A natural followup effort of this proposal would be to define in JIRA the
> same components as we defined here,
> and have the maintainers keep JIRA meaningful for that particular
> component. That would allow us to
> easily generate some tables out of JIRA (like top known issues per
> component, most requested features)
> and post them on the dev list once in a while as a "state of the union" report.
>
> Initial assignment of issues to components should be made by those people
> opening the issue. The maintainer
> of that tagged component needs to change the tag, if the component was
> classified incorrectly.
>
>
> ======================================
> Initial Components and Maintainers Suggestion
> ======================================
>
> Below is a suggestion of how to define components for Flink. One goal of
> the division was to make it
> obvious for the majority of questions and contributions to which component
> they would relate. Otherwise,
> if many contributions had fuzzy component associations, we would again not
> solve the issue of having clear
> responsibilities for who would track the progress and resolution.
>
> We also looked at each component and wrote the names of some people who we
> thought were natural
> experts for the components, and thus natural candidates for maintainers.
>
> **These names are only a starting point for discussion.**
>
> Once agreed upon, the components and names of maintainers should be kept in
> the wiki and updated as
> components change and people step up or down.
>
>
> *DataSet API* (*Fabian, Greg, Gabor*)
>   - Including Hadoop compat. parts
>
> *DataStream API* (*Aljoscha, Max, Stephan*)
>
> *Runtime*
>   - Distributed Coordination (JobManager/TaskManager, Akka)  (*Till*)
>   - Local Runtime (Memory Management, State Backends, Tasks/Operators) (
> *Stephan*)
>   - Network (*Ufuk*)
>
> *Client/Optimizer* (*Fabian*)
>
> *Type system / Type extractor* (Timo)
>
> *Cluster Management* (Yarn, Mesos, Docker, ...) (*Max, Robert*)
>
> *Libraries*
>   - Gelly (*Vasia, Greg*)
>   - ML (*Till, Theo*)
>   - CEP (*Till*)
>   - Python (*Chesnay*)
>
> *Table API & SQL* (*Fabian, Vasia, Timo, Chengxiang*)
>
> *Streaming Connectors* (*Robert*, *Aljoscha*)
>
> *Batch Connectors and Input/Output Formats* (*Chesnay*)
>
> *Storm Compatibility Layer* (*Mathias*)
>
> *Scala shell* (*Till*)
>
> *Startup Shell Scripts* (Ufuk)
>
> *Flink Build System, Maven Files* (*Robert*)
>
> *Documentation* (Ufuk)
>
>
> Please let us know what you think about this proposal.
> Happy discussing!
>
> Greetings,
> Stephan
>

Re: [PROPOSAL] Structure the Flink Open Source Development

Fabian Hueske-2
I like the proposal and especially the goal to improve the metadata and
descriptions of JIRA issues.

However, I would like to split Client and Optimizer into separate
components.
I can be a maintainer of the optimizer component (DataSet + SQL are fine as
well).

Cheers, Fabian



2016-05-13 17:03 GMT+02:00 Greg Hogan <[hidden email]>:

> +1 to better scaling :)
>
> Many Jira tickets are good ideas with no current traction. Some have a pull
> request (usually closed), many have comments or discussion. It seems these
> old tickets tend to hang around because closing the ticket feels like
> rejecting the idea. How do we track requested features without cluttering
> Jira?
>
> Greg
>