[DISCUSS] A more thorough Pull Request check list and template

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] A more thorough Pull Request check list and template

Stephan Ewen
Hi all!

I have reflected a bit on the pull requests and on some of the recent
changes to Flink and some of the introduced bugs / regressions that we have
fixed.

One thing that I think would have helped is to have more explicit
information about what the pull request does and how the contributor would
suggest to verify it. I have seen this when contributing to some other
project and really liked the approach.

It requires that a contributor takes a minute to reflect on what was
touched, and what would be ways to verify that the changes work properly.
Besides being a help to the reviewer, it also makes contributors aware of
what is important during the review process.


I suggest a new pull request template, as attached below, with a preview
here:
https://github.com/StephanEwen/incubator-flink/blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md

Don't be scared, it looks long, but a big part is the introductory text
(only relevant for new contributors) and the examples contents for the
description.

Filling this out for code that is in shape should be a quick thing: Remove
the into and checklist, write a few sentences on what the PR does (one
should do that anyways) and then pick some yes/no in the classification
section.

Curious to hear what you think!

Best,
Stephan


============================

Full suggested pull request template:



*Thank you very much for contributing to Apache Flink - we are happy that
you want to help us improve Flink. To help the community review you
contribution in the best possible way, please go through the checklist
below, which will get the contribution into a shape in which it can be best
reviewed.*

*Please understand that we do not do this to make contributions to Flink a
hassle. In order to uphold a high standard of quality for code
contributions, while at the same time managing a large number of
contributions, we need contributors to prepare the contributions well, and
give reviewers enough contextual information for the review. Please also
understand that contributions that do not follow this guide will take
longer to review and thus typically be picked up with lower priority by the
community.*

## Contribution Checklist

  - Make sure that the pull request corresponds to a [JIRA issue](
https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made
for typos in JavaDoc or documentation files, which need no JIRA issue.

  - Name the pull request in the form "[FLINK-1234] [component] Title of
the pull request", where *FLINK-1234* should be replaced by the actual
issue number. Skip *component* if you are unsure about which is the best
component.
  Typo fixes that have no associated JIRA issue should be named following
this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
`[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.

  - Fill out the template below to describe the changes contributed by the
pull request. That will give reviewers the context they need to do the
review.

  - Make sure that the change passes the automated tests, i.e., `mvn clean
verify`

  - Each pull request should address only one issue, not mix up code from
multiple issues.

  - Each commit in the pull request has a meaningful commit message
(including the JIRA id)

  - Once all items of the checklist are addressed, remove the above text
and this checklist, leaving only the filled out template below.


**(The sections below can be removed for hotfixes of typos)**

## What is the purpose of the change

*(For example: This pull request makes task deployment go through the blob
server, rather than through RPC. That way we avoid re-transferring them on
each deployment (during recovery).)*


## Brief change log

*(for example:)*
  - *The TaskInfo is stored in the blob store on job creation time as a
persistent artifact*
  - *Deployments RPC transmits only the blob storage reference*
  - *TaskManagers retrieve the TaskInfo from the blob cache*


## Verifying this change

*(Please pick either of the following options)*

This change is a trivial rework / code cleanup without any test coverage.

*(or)*

This change is already covered by existing tests, such as *(please describe
tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
  - *Added integration tests for end-to-end deployment with large payloads
(100MB)*
  - *Extended integration test for recovery after master (JobManager)
failure*
  - *Added test that validates that TaskInfo is transferred only once
across recoveries*
  - *Manually verified the change by running a 4 node cluser with 2
JobManagers and 4 TaskManagers, a stateful streaming program, and killing
one JobManager and to TaskManagers during the execution, verifying that
recovery happens correctly.*

## Does this pull request potentially affect one of the following parts:

  - Dependencies (does it add or upgrade a dependency): **(yes / no)**
  - The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **(yes / no)**
  - The serializers: **(yes / no / don't know)**
  - The runtime per-record code paths (performance sensitive): **(yes / no
/ don't know)**
  - Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
know)**:

## Documentation

  - Does this pull request introduce a new feature? **(yes / no)**
  - If yes, how is the feature documented? **(not applicable / docs /
JavaDocs / not documented)**
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Ufuk Celebi-2
I really like this and vote to change our current template.

The simple yes/no/... options are a really good idea. I would add to
your email that the questions will equally help reviewers to remember
to look at these things, which is just as important.

When we merge this, we should make sure to strictly follow the guide.
Ideally, in the long term we can even automate some of the yes/no/...
questions via a bot... but let's not get ahead of ourselves here ;-)


On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> I have reflected a bit on the pull requests and on some of the recent
> changes to Flink and some of the introduced bugs / regressions that we have
> fixed.
>
> One thing that I think would have helped is to have more explicit
> information about what the pull request does and how the contributor would
> suggest to verify it. I have seen this when contributing to some other
> project and really liked the approach.
>
> It requires that a contributor takes a minute to reflect on what was
> touched, and what would be ways to verify that the changes work properly.
> Besides being a help to the reviewer, it also makes contributors aware of
> what is important during the review process.
>
>
> I suggest a new pull request template, as attached below, with a preview
> here:
> https://github.com/StephanEwen/incubator-flink/blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>
> Don't be scared, it looks long, but a big part is the introductory text
> (only relevant for new contributors) and the examples contents for the
> description.
>
> Filling this out for code that is in shape should be a quick thing: Remove
> the into and checklist, write a few sentences on what the PR does (one
> should do that anyways) and then pick some yes/no in the classification
> section.
>
> Curious to hear what you think!
>
> Best,
> Stephan
>
>
> ============================
>
> Full suggested pull request template:
>
>
>
> *Thank you very much for contributing to Apache Flink - we are happy that
> you want to help us improve Flink. To help the community review you
> contribution in the best possible way, please go through the checklist
> below, which will get the contribution into a shape in which it can be best
> reviewed.*
>
> *Please understand that we do not do this to make contributions to Flink a
> hassle. In order to uphold a high standard of quality for code
> contributions, while at the same time managing a large number of
> contributions, we need contributors to prepare the contributions well, and
> give reviewers enough contextual information for the review. Please also
> understand that contributions that do not follow this guide will take
> longer to review and thus typically be picked up with lower priority by the
> community.*
>
> ## Contribution Checklist
>
>   - Make sure that the pull request corresponds to a [JIRA issue](
> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made
> for typos in JavaDoc or documentation files, which need no JIRA issue.
>
>   - Name the pull request in the form "[FLINK-1234] [component] Title of
> the pull request", where *FLINK-1234* should be replaced by the actual
> issue number. Skip *component* if you are unsure about which is the best
> component.
>   Typo fixes that have no associated JIRA issue should be named following
> this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>
>   - Fill out the template below to describe the changes contributed by the
> pull request. That will give reviewers the context they need to do the
> review.
>
>   - Make sure that the change passes the automated tests, i.e., `mvn clean
> verify`
>
>   - Each pull request should address only one issue, not mix up code from
> multiple issues.
>
>   - Each commit in the pull request has a meaningful commit message
> (including the JIRA id)
>
>   - Once all items of the checklist are addressed, remove the above text
> and this checklist, leaving only the filled out template below.
>
>
> **(The sections below can be removed for hotfixes of typos)**
>
> ## What is the purpose of the change
>
> *(For example: This pull request makes task deployment go through the blob
> server, rather than through RPC. That way we avoid re-transferring them on
> each deployment (during recovery).)*
>
>
> ## Brief change log
>
> *(for example:)*
>   - *The TaskInfo is stored in the blob store on job creation time as a
> persistent artifact*
>   - *Deployments RPC transmits only the blob storage reference*
>   - *TaskManagers retrieve the TaskInfo from the blob cache*
>
>
> ## Verifying this change
>
> *(Please pick either of the following options)*
>
> This change is a trivial rework / code cleanup without any test coverage.
>
> *(or)*
>
> This change is already covered by existing tests, such as *(please describe
> tests)*.
>
> *(or)*
>
> This change added tests and can be verified as follows:
>
> *(example:)*
>   - *Added integration tests for end-to-end deployment with large payloads
> (100MB)*
>   - *Extended integration test for recovery after master (JobManager)
> failure*
>   - *Added test that validates that TaskInfo is transferred only once
> across recoveries*
>   - *Manually verified the change by running a 4 node cluser with 2
> JobManagers and 4 TaskManagers, a stateful streaming program, and killing
> one JobManager and to TaskManagers during the execution, verifying that
> recovery happens correctly.*
>
> ## Does this pull request potentially affect one of the following parts:
>
>   - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>   - The public API, i.e., is any changed class annotated with
> `@Public(Evolving)`: **(yes / no)**
>   - The serializers: **(yes / no / don't know)**
>   - The runtime per-record code paths (performance sensitive): **(yes / no
> / don't know)**
>   - Anything that affects deployment or recovery: JobManager (and its
> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> know)**:
>
> ## Documentation
>
>   - Does this pull request introduce a new feature? **(yes / no)**
>   - If yes, how is the feature documented? **(not applicable / docs /
> JavaDocs / not documented)**
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Tzu-Li (Gordon) Tai
+1, I like this a lot.
With the previous template, it doesn’t really resonate with what we should care about, and therefore most of the time I think contributors just delete that template and write down something on their own.

I would also like to add: “Savepoint / checkpoint binary formats” to the potential affect scope check list.
I think changes to those deserves an independent yes / no check of its own.

Cheers,
Gordon

On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:

I really like this and vote to change our current template.  

The simple yes/no/... options are a really good idea. I would add to  
your email that the questions will equally help reviewers to remember  
to look at these things, which is just as important.  

When we merge this, we should make sure to strictly follow the guide.  
Ideally, in the long term we can even automate some of the yes/no/...  
questions via a bot... but let's not get ahead of ourselves here ;-)  


On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]> wrote:  

> Hi all!  
>  
> I have reflected a bit on the pull requests and on some of the recent  
> changes to Flink and some of the introduced bugs / regressions that we have  
> fixed.  
>  
> One thing that I think would have helped is to have more explicit  
> information about what the pull request does and how the contributor would  
> suggest to verify it. I have seen this when contributing to some other  
> project and really liked the approach.  
>  
> It requires that a contributor takes a minute to reflect on what was  
> touched, and what would be ways to verify that the changes work properly.  
> Besides being a help to the reviewer, it also makes contributors aware of  
> what is important during the review process.  
>  
>  
> I suggest a new pull request template, as attached below, with a preview  
> here:  
> https://github.com/StephanEwen/incubator-flink/blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md 
>  
> Don't be scared, it looks long, but a big part is the introductory text  
> (only relevant for new contributors) and the examples contents for the  
> description.  
>  
> Filling this out for code that is in shape should be a quick thing: Remove  
> the into and checklist, write a few sentences on what the PR does (one  
> should do that anyways) and then pick some yes/no in the classification  
> section.  
>  
> Curious to hear what you think!  
>  
> Best,  
> Stephan  
>  
>  
> ============================  
>  
> Full suggested pull request template:  
>  
>  
>  
> *Thank you very much for contributing to Apache Flink - we are happy that  
> you want to help us improve Flink. To help the community review you  
> contribution in the best possible way, please go through the checklist  
> below, which will get the contribution into a shape in which it can be best  
> reviewed.*  
>  
> *Please understand that we do not do this to make contributions to Flink a  
> hassle. In order to uphold a high standard of quality for code  
> contributions, while at the same time managing a large number of  
> contributions, we need contributors to prepare the contributions well, and  
> give reviewers enough contextual information for the review. Please also  
> understand that contributions that do not follow this guide will take  
> longer to review and thus typically be picked up with lower priority by the  
> community.*  
>  
> ## Contribution Checklist  
>  
> - Make sure that the pull request corresponds to a [JIRA issue](  
> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made  
> for typos in JavaDoc or documentation files, which need no JIRA issue.  
>  
> - Name the pull request in the form "[FLINK-1234] [component] Title of  
> the pull request", where *FLINK-1234* should be replaced by the actual  
> issue number. Skip *component* if you are unsure about which is the best  
> component.  
> Typo fixes that have no associated JIRA issue should be named following  
> this pattern: `[hotfix] [docs] Fix typo in event time introduction` or  
> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.  
>  
> - Fill out the template below to describe the changes contributed by the  
> pull request. That will give reviewers the context they need to do the  
> review.  
>  
> - Make sure that the change passes the automated tests, i.e., `mvn clean  
> verify`  
>  
> - Each pull request should address only one issue, not mix up code from  
> multiple issues.  
>  
> - Each commit in the pull request has a meaningful commit message  
> (including the JIRA id)  
>  
> - Once all items of the checklist are addressed, remove the above text  
> and this checklist, leaving only the filled out template below.  
>  
>  
> **(The sections below can be removed for hotfixes of typos)**  
>  
> ## What is the purpose of the change  
>  
> *(For example: This pull request makes task deployment go through the blob  
> server, rather than through RPC. That way we avoid re-transferring them on  
> each deployment (during recovery).)*  
>  
>  
> ## Brief change log  
>  
> *(for example:)*  
> - *The TaskInfo is stored in the blob store on job creation time as a  
> persistent artifact*  
> - *Deployments RPC transmits only the blob storage reference*  
> - *TaskManagers retrieve the TaskInfo from the blob cache*  
>  
>  
> ## Verifying this change  
>  
> *(Please pick either of the following options)*  
>  
> This change is a trivial rework / code cleanup without any test coverage.  
>  
> *(or)*  
>  
> This change is already covered by existing tests, such as *(please describe  
> tests)*.  
>  
> *(or)*  
>  
> This change added tests and can be verified as follows:  
>  
> *(example:)*  
> - *Added integration tests for end-to-end deployment with large payloads  
> (100MB)*  
> - *Extended integration test for recovery after master (JobManager)  
> failure*  
> - *Added test that validates that TaskInfo is transferred only once  
> across recoveries*  
> - *Manually verified the change by running a 4 node cluser with 2  
> JobManagers and 4 TaskManagers, a stateful streaming program, and killing  
> one JobManager and to TaskManagers during the execution, verifying that  
> recovery happens correctly.*  
>  
> ## Does this pull request potentially affect one of the following parts:  
>  
> - Dependencies (does it add or upgrade a dependency): **(yes / no)**  
> - The public API, i.e., is any changed class annotated with  
> `@Public(Evolving)`: **(yes / no)**  
> - The serializers: **(yes / no / don't know)**  
> - The runtime per-record code paths (performance sensitive): **(yes / no  
> / don't know)**  
> - Anything that affects deployment or recovery: JobManager (and its  
> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't  
> know)**:  
>  
> ## Documentation  
>  
> - Does this pull request introduce a new feature? **(yes / no)**  
> - If yes, how is the feature documented? **(not applicable / docs /  
> JavaDocs / not documented)**  
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Fabian Hueske-2
I like the sections about purpose, change log, and verification of the
changes.

However, I think the proposed template is too much text. This is probably
the reason why the first attempt to establish a PR template failed.
I would move most of the introduction and explanations incl. examples to
the "Contribution Guidelines" and only pass a link.
IMO, the template should be rather shorter than the current one and only
have the link, the sections to fill out, and checkboxes.

I'm also not sure how much the detailed questions will help.
For example even if the question about changed dependencies is answered
with "no", the reviewer still has to check that.

I think the questions of the current template work differently.
A question "Does the PR include tests?" suggests to the contributor that
those should be included. Same for documentation.

Cheers,
Fabian

2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:

> +1, I like this a lot.
> With the previous template, it doesn’t really resonate with what we should
> care about, and therefore most of the time I think contributors just delete
> that template and write down something on their own.
>
> I would also like to add: “Savepoint / checkpoint binary formats” to the
> potential affect scope check list.
> I think changes to those deserves an independent yes / no check of its own.
>
> Cheers,
> Gordon
>
> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>
> I really like this and vote to change our current template.
>
> The simple yes/no/... options are a really good idea. I would add to
> your email that the questions will equally help reviewers to remember
> to look at these things, which is just as important.
>
> When we merge this, we should make sure to strictly follow the guide.
> Ideally, in the long term we can even automate some of the yes/no/...
> questions via a bot... but let's not get ahead of ourselves here ;-)
>
>
> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]> wrote:
> > Hi all!
> >
> > I have reflected a bit on the pull requests and on some of the recent
> > changes to Flink and some of the introduced bugs / regressions that we
> have
> > fixed.
> >
> > One thing that I think would have helped is to have more explicit
> > information about what the pull request does and how the contributor
> would
> > suggest to verify it. I have seen this when contributing to some other
> > project and really liked the approach.
> >
> > It requires that a contributor takes a minute to reflect on what was
> > touched, and what would be ways to verify that the changes work properly.
> > Besides being a help to the reviewer, it also makes contributors aware of
> > what is important during the review process.
> >
> >
> > I suggest a new pull request template, as attached below, with a preview
> > here:
> > https://github.com/StephanEwen/incubator-flink/
> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
> >
> > Don't be scared, it looks long, but a big part is the introductory text
> > (only relevant for new contributors) and the examples contents for the
> > description.
> >
> > Filling this out for code that is in shape should be a quick thing:
> Remove
> > the into and checklist, write a few sentences on what the PR does (one
> > should do that anyways) and then pick some yes/no in the classification
> > section.
> >
> > Curious to hear what you think!
> >
> > Best,
> > Stephan
> >
> >
> > ============================
> >
> > Full suggested pull request template:
> >
> >
> >
> > *Thank you very much for contributing to Apache Flink - we are happy that
> > you want to help us improve Flink. To help the community review you
> > contribution in the best possible way, please go through the checklist
> > below, which will get the contribution into a shape in which it can be
> best
> > reviewed.*
> >
> > *Please understand that we do not do this to make contributions to Flink
> a
> > hassle. In order to uphold a high standard of quality for code
> > contributions, while at the same time managing a large number of
> > contributions, we need contributors to prepare the contributions well,
> and
> > give reviewers enough contextual information for the review. Please also
> > understand that contributions that do not follow this guide will take
> > longer to review and thus typically be picked up with lower priority by
> the
> > community.*
> >
> > ## Contribution Checklist
> >
> > - Make sure that the pull request corresponds to a [JIRA issue](
> > https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are
> made
> > for typos in JavaDoc or documentation files, which need no JIRA issue.
> >
> > - Name the pull request in the form "[FLINK-1234] [component] Title of
> > the pull request", where *FLINK-1234* should be replaced by the actual
> > issue number. Skip *component* if you are unsure about which is the best
> > component.
> > Typo fixes that have no associated JIRA issue should be named following
> > this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> > `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
> >
> > - Fill out the template below to describe the changes contributed by the
> > pull request. That will give reviewers the context they need to do the
> > review.
> >
> > - Make sure that the change passes the automated tests, i.e., `mvn clean
> > verify`
> >
> > - Each pull request should address only one issue, not mix up code from
> > multiple issues.
> >
> > - Each commit in the pull request has a meaningful commit message
> > (including the JIRA id)
> >
> > - Once all items of the checklist are addressed, remove the above text
> > and this checklist, leaving only the filled out template below.
> >
> >
> > **(The sections below can be removed for hotfixes of typos)**
> >
> > ## What is the purpose of the change
> >
> > *(For example: This pull request makes task deployment go through the
> blob
> > server, rather than through RPC. That way we avoid re-transferring them
> on
> > each deployment (during recovery).)*
> >
> >
> > ## Brief change log
> >
> > *(for example:)*
> > - *The TaskInfo is stored in the blob store on job creation time as a
> > persistent artifact*
> > - *Deployments RPC transmits only the blob storage reference*
> > - *TaskManagers retrieve the TaskInfo from the blob cache*
> >
> >
> > ## Verifying this change
> >
> > *(Please pick either of the following options)*
> >
> > This change is a trivial rework / code cleanup without any test coverage.
> >
> > *(or)*
> >
> > This change is already covered by existing tests, such as *(please
> describe
> > tests)*.
> >
> > *(or)*
> >
> > This change added tests and can be verified as follows:
> >
> > *(example:)*
> > - *Added integration tests for end-to-end deployment with large payloads
> > (100MB)*
> > - *Extended integration test for recovery after master (JobManager)
> > failure*
> > - *Added test that validates that TaskInfo is transferred only once
> > across recoveries*
> > - *Manually verified the change by running a 4 node cluser with 2
> > JobManagers and 4 TaskManagers, a stateful streaming program, and killing
> > one JobManager and to TaskManagers during the execution, verifying that
> > recovery happens correctly.*
> >
> > ## Does this pull request potentially affect one of the following parts:
> >
> > - Dependencies (does it add or upgrade a dependency): **(yes / no)**
> > - The public API, i.e., is any changed class annotated with
> > `@Public(Evolving)`: **(yes / no)**
> > - The serializers: **(yes / no / don't know)**
> > - The runtime per-record code paths (performance sensitive): **(yes / no
> > / don't know)**
> > - Anything that affects deployment or recovery: JobManager (and its
> > components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> > know)**:
> >
> > ## Documentation
> >
> > - Does this pull request introduce a new feature? **(yes / no)**
> > - If yes, how is the feature documented? **(not applicable / docs /
> > JavaDocs / not documented)**
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Chesnay Schepler-3
I fully agree with Fabian.

Multiple-choice questions provide little value to the reviewer, since the
validity has to be verified in any case. While text answers have to be
validated as well,
they give some hint to the reviewer as to how it can be verified and
which steps the
contributor did to do so.

I also agree that it is too long; IMO this is really intimidating to new
contributors to be greeted with this.

Ideally we only link to the contributors guide and ask 3 questions:

  * What is the problem?
  * How was it fixed?
  * How can the fix be verified?


On 18.07.2017 10:47, Fabian Hueske wrote:

> I like the sections about purpose, change log, and verification of the
> changes.
>
> However, I think the proposed template is too much text. This is probably
> the reason why the first attempt to establish a PR template failed.
> I would move most of the introduction and explanations incl. examples to
> the "Contribution Guidelines" and only pass a link.
> IMO, the template should be rather shorter than the current one and only
> have the link, the sections to fill out, and checkboxes.
>
> I'm also not sure how much the detailed questions will help.
> For example even if the question about changed dependencies is answered
> with "no", the reviewer still has to check that.
>
> I think the questions of the current template work differently.
> A question "Does the PR include tests?" suggests to the contributor that
> those should be included. Same for documentation.
>
> Cheers,
> Fabian
>
> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
>
>> +1, I like this a lot.
>> With the previous template, it doesn’t really resonate with what we should
>> care about, and therefore most of the time I think contributors just delete
>> that template and write down something on their own.
>>
>> I would also like to add: “Savepoint / checkpoint binary formats” to the
>> potential affect scope check list.
>> I think changes to those deserves an independent yes / no check of its own.
>>
>> Cheers,
>> Gordon
>>
>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>
>> I really like this and vote to change our current template.
>>
>> The simple yes/no/... options are a really good idea. I would add to
>> your email that the questions will equally help reviewers to remember
>> to look at these things, which is just as important.
>>
>> When we merge this, we should make sure to strictly follow the guide.
>> Ideally, in the long term we can even automate some of the yes/no/...
>> questions via a bot... but let's not get ahead of ourselves here ;-)
>>
>>
>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]> wrote:
>>> Hi all!
>>>
>>> I have reflected a bit on the pull requests and on some of the recent
>>> changes to Flink and some of the introduced bugs / regressions that we
>> have
>>> fixed.
>>>
>>> One thing that I think would have helped is to have more explicit
>>> information about what the pull request does and how the contributor
>> would
>>> suggest to verify it. I have seen this when contributing to some other
>>> project and really liked the approach.
>>>
>>> It requires that a contributor takes a minute to reflect on what was
>>> touched, and what would be ways to verify that the changes work properly.
>>> Besides being a help to the reviewer, it also makes contributors aware of
>>> what is important during the review process.
>>>
>>>
>>> I suggest a new pull request template, as attached below, with a preview
>>> here:
>>> https://github.com/StephanEwen/incubator-flink/
>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>> Don't be scared, it looks long, but a big part is the introductory text
>>> (only relevant for new contributors) and the examples contents for the
>>> description.
>>>
>>> Filling this out for code that is in shape should be a quick thing:
>> Remove
>>> the into and checklist, write a few sentences on what the PR does (one
>>> should do that anyways) and then pick some yes/no in the classification
>>> section.
>>>
>>> Curious to hear what you think!
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> ============================
>>>
>>> Full suggested pull request template:
>>>
>>>
>>>
>>> *Thank you very much for contributing to Apache Flink - we are happy that
>>> you want to help us improve Flink. To help the community review you
>>> contribution in the best possible way, please go through the checklist
>>> below, which will get the contribution into a shape in which it can be
>> best
>>> reviewed.*
>>>
>>> *Please understand that we do not do this to make contributions to Flink
>> a
>>> hassle. In order to uphold a high standard of quality for code
>>> contributions, while at the same time managing a large number of
>>> contributions, we need contributors to prepare the contributions well,
>> and
>>> give reviewers enough contextual information for the review. Please also
>>> understand that contributions that do not follow this guide will take
>>> longer to review and thus typically be picked up with lower priority by
>> the
>>> community.*
>>>
>>> ## Contribution Checklist
>>>
>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are
>> made
>>> for typos in JavaDoc or documentation files, which need no JIRA issue.
>>>
>>> - Name the pull request in the form "[FLINK-1234] [component] Title of
>>> the pull request", where *FLINK-1234* should be replaced by the actual
>>> issue number. Skip *component* if you are unsure about which is the best
>>> component.
>>> Typo fixes that have no associated JIRA issue should be named following
>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>
>>> - Fill out the template below to describe the changes contributed by the
>>> pull request. That will give reviewers the context they need to do the
>>> review.
>>>
>>> - Make sure that the change passes the automated tests, i.e., `mvn clean
>>> verify`
>>>
>>> - Each pull request should address only one issue, not mix up code from
>>> multiple issues.
>>>
>>> - Each commit in the pull request has a meaningful commit message
>>> (including the JIRA id)
>>>
>>> - Once all items of the checklist are addressed, remove the above text
>>> and this checklist, leaving only the filled out template below.
>>>
>>>
>>> **(The sections below can be removed for hotfixes of typos)**
>>>
>>> ## What is the purpose of the change
>>>
>>> *(For example: This pull request makes task deployment go through the
>> blob
>>> server, rather than through RPC. That way we avoid re-transferring them
>> on
>>> each deployment (during recovery).)*
>>>
>>>
>>> ## Brief change log
>>>
>>> *(for example:)*
>>> - *The TaskInfo is stored in the blob store on job creation time as a
>>> persistent artifact*
>>> - *Deployments RPC transmits only the blob storage reference*
>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>
>>>
>>> ## Verifying this change
>>>
>>> *(Please pick either of the following options)*
>>>
>>> This change is a trivial rework / code cleanup without any test coverage.
>>>
>>> *(or)*
>>>
>>> This change is already covered by existing tests, such as *(please
>> describe
>>> tests)*.
>>>
>>> *(or)*
>>>
>>> This change added tests and can be verified as follows:
>>>
>>> *(example:)*
>>> - *Added integration tests for end-to-end deployment with large payloads
>>> (100MB)*
>>> - *Extended integration test for recovery after master (JobManager)
>>> failure*
>>> - *Added test that validates that TaskInfo is transferred only once
>>> across recoveries*
>>> - *Manually verified the change by running a 4 node cluser with 2
>>> JobManagers and 4 TaskManagers, a stateful streaming program, and killing
>>> one JobManager and to TaskManagers during the execution, verifying that
>>> recovery happens correctly.*
>>>
>>> ## Does this pull request potentially affect one of the following parts:
>>>
>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>> - The public API, i.e., is any changed class annotated with
>>> `@Public(Evolving)`: **(yes / no)**
>>> - The serializers: **(yes / no / don't know)**
>>> - The runtime per-record code paths (performance sensitive): **(yes / no
>>> / don't know)**
>>> - Anything that affects deployment or recovery: JobManager (and its
>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
>>> know)**:
>>>
>>> ## Documentation
>>>
>>> - Does this pull request introduce a new feature? **(yes / no)**
>>> - If yes, how is the feature documented? **(not applicable / docs /
>>> JavaDocs / not documented)**


Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Ufuk Celebi-2
In reply to this post by Fabian Hueske-2
On Tue, Jul 18, 2017 at 10:47 AM, Fabian Hueske <[hidden email]> wrote:
> For example even if the question about changed dependencies is answered
> with "no", the reviewer still has to check that.

But having it as a required option/text in the PR descriptions helps
reviewers to actually remember to check that. I think we should be
more realistic here and assume that reviewers will also overlook
things etc.

To me, keeping the questions is more important than the intro text.
Therefore, I would be OK with moving the text to the contrib guide,
but I would definitely keep the detailed yes/nos and not go with high
level questions that everyone will answer differently.

– Ufuk
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Stephan Ewen
My thinking was exactly as echoed by Gordon and Ufuk:

  - The yes/no sections are also for reviewers a reminder of what to watch
out for.
    Let's face it, probably half of the committers are not aware that these
things need to be checked implicitly against every change.
    A good part of the recent issues came from exactly that. Changes get
merged (because the pull request lingered or the number of open PRs is
high) and these implications are not thought through.

  - This is to me a tradeoff between requiring explicit +1s from certain
people (maintainers) for certain components, and getting an awareness into
everybody's mind.

  - It also makes all users aware that these things are considered and
implicitly manages expectations in how fast can things get merged.


Concerning the long text: I think it is fine to play the ball a bit more to
the contributors.
Making it easy, yes. But also making it correct and well. We need to make
contributors aware of what it means to contribute to a system to runs
highly available critical infrastructure. There is quite often still the
mindset of "hey, cool, open source, let me throw something out there".

My take is that anyone who is serious about contributing and serious about
quality is not put off by this template.

Concerning the introductory text: I bet that rarely anyone reads the "how
to contribute" guide. Before the template, virtually no new pull request
had even the required naming.
That text needs to be in the template, or we might as well not have it
anywhere at all.



Just for reference: Below is the introductory text of the JDK ;-)

5. Know what to expect

Only the best patches submitted will actually make it all the way into a
JDK code base. The goal is not to take in the maximum number of
contributions possible, but rather to accept only the highest-quality
contributions. The JDK is used daily by millions of people and thousands of
businesses, often in mission-critical applications, and so we can't afford
to accept anything less than the very best.

If you're relatively new to the Java platform then we recommend that you
gain more experience writing Java applications before you attempt to work
on the JDK itself. The purpose of the sponsored-contribution process is to
bring developers who already have the skills required to work on the JDK
into the existing development community. The members of that community have
neither the time nor the patience required to teach basic Java programming
skills or platform implementation techniques.





On Tue, Jul 18, 2017 at 12:15 PM, Ufuk Celebi <[hidden email]> wrote:

> On Tue, Jul 18, 2017 at 10:47 AM, Fabian Hueske <[hidden email]> wrote:
> > For example even if the question about changed dependencies is answered
> > with "no", the reviewer still has to check that.
>
> But having it as a required option/text in the PR descriptions helps
> reviewers to actually remember to check that. I think we should be
> more realistic here and assume that reviewers will also overlook
> things etc.
>
> To me, keeping the questions is more important than the intro text.
> Therefore, I would be OK with moving the text to the contrib guide,
> but I would definitely keep the detailed yes/nos and not go with high
> level questions that everyone will answer differently.
>
> – Ufuk
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Nico Kruber
In reply to this post by Chesnay Schepler-3
I like the new template but also agree with the text being too long and would
move the intro to the contributors guide with a link in the PR template.

Regarding the questions to fill out - I'd like the headings to be short and
have the affected components last so that documentation is not lost (although
being more important than this checklist), e.g.:

* Purpose of the change
* Brief change log
* Verifying the change
* Documentation
* Affected components

The verification options in the original template look a bit too large but it
stresses what tests should be added, especially for bigger changes. Can't
think of a way to make it shorter though.


Nico

On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:

> I fully agree with Fabian.
>
> Multiple-choice questions provide little value to the reviewer, since the
> validity has to be verified in any case. While text answers have to be
> validated as well,
> they give some hint to the reviewer as to how it can be verified and
> which steps the
> contributor did to do so.
>
> I also agree that it is too long; IMO this is really intimidating to new
> contributors to be greeted with this.
>
> Ideally we only link to the contributors guide and ask 3 questions:
>
>   * What is the problem?
>   * How was it fixed?
>   * How can the fix be verified?
>
> On 18.07.2017 10:47, Fabian Hueske wrote:
> > I like the sections about purpose, change log, and verification of the
> > changes.
> >
> > However, I think the proposed template is too much text. This is probably
> > the reason why the first attempt to establish a PR template failed.
> > I would move most of the introduction and explanations incl. examples to
> > the "Contribution Guidelines" and only pass a link.
> > IMO, the template should be rather shorter than the current one and only
> > have the link, the sections to fill out, and checkboxes.
> >
> > I'm also not sure how much the detailed questions will help.
> > For example even if the question about changed dependencies is answered
> > with "no", the reviewer still has to check that.
> >
> > I think the questions of the current template work differently.
> > A question "Does the PR include tests?" suggests to the contributor that
> > those should be included. Same for documentation.
> >
> > Cheers,
> > Fabian
> >
> > 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
> >> +1, I like this a lot.
> >> With the previous template, it doesn’t really resonate with what we
> >> should
> >> care about, and therefore most of the time I think contributors just
> >> delete
> >> that template and write down something on their own.
> >>
> >> I would also like to add: “Savepoint / checkpoint binary formats” to the
> >> potential affect scope check list.
> >> I think changes to those deserves an independent yes / no check of its
> >> own.
> >>
> >> Cheers,
> >> Gordon
> >>
> >> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
> >>
> >> I really like this and vote to change our current template.
> >>
> >> The simple yes/no/... options are a really good idea. I would add to
> >> your email that the questions will equally help reviewers to remember
> >> to look at these things, which is just as important.
> >>
> >> When we merge this, we should make sure to strictly follow the guide.
> >> Ideally, in the long term we can even automate some of the yes/no/...
> >> questions via a bot... but let's not get ahead of ourselves here ;-)
> >>
> >> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]> wrote:
> >>> Hi all!
> >>>
> >>> I have reflected a bit on the pull requests and on some of the recent
> >>> changes to Flink and some of the introduced bugs / regressions that we
> >>
> >> have
> >>
> >>> fixed.
> >>>
> >>> One thing that I think would have helped is to have more explicit
> >>> information about what the pull request does and how the contributor
> >>
> >> would
> >>
> >>> suggest to verify it. I have seen this when contributing to some other
> >>> project and really liked the approach.
> >>>
> >>> It requires that a contributor takes a minute to reflect on what was
> >>> touched, and what would be ways to verify that the changes work
> >>> properly.
> >>> Besides being a help to the reviewer, it also makes contributors aware
> >>> of
> >>> what is important during the review process.
> >>>
> >>>
> >>> I suggest a new pull request template, as attached below, with a preview
> >>> here:
> >>> https://github.com/StephanEwen/incubator-flink/
> >>
> >> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
> >>
> >>> Don't be scared, it looks long, but a big part is the introductory text
> >>> (only relevant for new contributors) and the examples contents for the
> >>> description.
> >>
> >>> Filling this out for code that is in shape should be a quick thing:
> >> Remove
> >>
> >>> the into and checklist, write a few sentences on what the PR does (one
> >>> should do that anyways) and then pick some yes/no in the classification
> >>> section.
> >>>
> >>> Curious to hear what you think!
> >>>
> >>> Best,
> >>> Stephan
> >>>
> >>>
> >>> ============================
> >>>
> >>> Full suggested pull request template:
> >>>
> >>>
> >>>
> >>> *Thank you very much for contributing to Apache Flink - we are happy
> >>> that
> >>> you want to help us improve Flink. To help the community review you
> >>> contribution in the best possible way, please go through the checklist
> >>> below, which will get the contribution into a shape in which it can be
> >>
> >> best
> >>
> >>> reviewed.*
> >>>
> >>> *Please understand that we do not do this to make contributions to Flink
> >>
> >> a
> >>
> >>> hassle. In order to uphold a high standard of quality for code
> >>> contributions, while at the same time managing a large number of
> >>> contributions, we need contributors to prepare the contributions well,
> >>
> >> and
> >>
> >>> give reviewers enough contextual information for the review. Please also
> >>> understand that contributions that do not follow this guide will take
> >>> longer to review and thus typically be picked up with lower priority by
> >>
> >> the
> >>
> >>> community.*
> >>>
> >>> ## Contribution Checklist
> >>>
> >>> - Make sure that the pull request corresponds to a [JIRA issue](
> >>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are
> >>
> >> made
> >>
> >>> for typos in JavaDoc or documentation files, which need no JIRA issue.
> >>>
> >>> - Name the pull request in the form "[FLINK-1234] [component] Title of
> >>> the pull request", where *FLINK-1234* should be replaced by the actual
> >>> issue number. Skip *component* if you are unsure about which is the best
> >>> component.
> >>> Typo fixes that have no associated JIRA issue should be named following
> >>> this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> >>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
> >>>
> >>> - Fill out the template below to describe the changes contributed by the
> >>> pull request. That will give reviewers the context they need to do the
> >>> review.
> >>>
> >>> - Make sure that the change passes the automated tests, i.e., `mvn clean
> >>> verify`
> >>>
> >>> - Each pull request should address only one issue, not mix up code from
> >>> multiple issues.
> >>>
> >>> - Each commit in the pull request has a meaningful commit message
> >>> (including the JIRA id)
> >>>
> >>> - Once all items of the checklist are addressed, remove the above text
> >>> and this checklist, leaving only the filled out template below.
> >>>
> >>>
> >>> **(The sections below can be removed for hotfixes of typos)**
> >>>
> >>> ## What is the purpose of the change
> >>>
> >>> *(For example: This pull request makes task deployment go through the
> >>
> >> blob
> >>
> >>> server, rather than through RPC. That way we avoid re-transferring them
> >>
> >> on
> >>
> >>> each deployment (during recovery).)*
> >>>
> >>>
> >>> ## Brief change log
> >>>
> >>> *(for example:)*
> >>> - *The TaskInfo is stored in the blob store on job creation time as a
> >>> persistent artifact*
> >>> - *Deployments RPC transmits only the blob storage reference*
> >>> - *TaskManagers retrieve the TaskInfo from the blob cache*
> >>>
> >>>
> >>> ## Verifying this change
> >>>
> >>> *(Please pick either of the following options)*
> >>>
> >>> This change is a trivial rework / code cleanup without any test
> >>> coverage.
> >>>
> >>> *(or)*
> >>>
> >>> This change is already covered by existing tests, such as *(please
> >>
> >> describe
> >>
> >>> tests)*.
> >>>
> >>> *(or)*
> >>>
> >>> This change added tests and can be verified as follows:
> >>>
> >>> *(example:)*
> >>> - *Added integration tests for end-to-end deployment with large payloads
> >>> (100MB)*
> >>> - *Extended integration test for recovery after master (JobManager)
> >>> failure*
> >>> - *Added test that validates that TaskInfo is transferred only once
> >>> across recoveries*
> >>> - *Manually verified the change by running a 4 node cluser with 2
> >>> JobManagers and 4 TaskManagers, a stateful streaming program, and
> >>> killing
> >>> one JobManager and to TaskManagers during the execution, verifying that
> >>> recovery happens correctly.*
> >>>
> >>> ## Does this pull request potentially affect one of the following parts:
> >>>
> >>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
> >>> - The public API, i.e., is any changed class annotated with
> >>> `@Public(Evolving)`: **(yes / no)**
> >>> - The serializers: **(yes / no / don't know)**
> >>> - The runtime per-record code paths (performance sensitive): **(yes / no
> >>> / don't know)**
> >>> - Anything that affects deployment or recovery: JobManager (and its
> >>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> >>> know)**:
> >>>
> >>> ## Documentation
> >>>
> >>> - Does this pull request introduce a new feature? **(yes / no)**
> >>> - If yes, how is the feature documented? **(not applicable / docs /
> >>> JavaDocs / not documented)**


signature.asc (201 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Greg Hogan
In reply to this post by Stephan Ewen
Thanks for leading this discussion, Stephan. I don’t disagree with anything that has been said but am slightly concerned that the improvements in documenting pull requests won’t translate into the git commit messages. Acceptance of a higher standard will be swift as long as reviewers set the example and expectation. Perhaps a new template could be trialled as opt-in for a few weeks by several or more committers.

Greg


> On Jul 18, 2017, at 10:58 AM, Stephan Ewen <[hidden email]> wrote:
>
> My thinking was exactly as echoed by Gordon and Ufuk:
>
>  - The yes/no sections are also for reviewers a reminder of what to watch
> out for.
>    Let's face it, probably half of the committers are not aware that these
> things need to be checked implicitly against every change.
>    A good part of the recent issues came from exactly that. Changes get
> merged (because the pull request lingered or the number of open PRs is
> high) and these implications are not thought through.
>
>  - This is to me a tradeoff between requiring explicit +1s from certain
> people (maintainers) for certain components, and getting an awareness into
> everybody's mind.
>
>  - It also makes all users aware that these things are considered and
> implicitly manages expectations in how fast can things get merged.
>
>
> Concerning the long text: I think it is fine to play the ball a bit more to
> the contributors.
> Making it easy, yes. But also making it correct and well. We need to make
> contributors aware of what it means to contribute to a system to runs
> highly available critical infrastructure. There is quite often still the
> mindset of "hey, cool, open source, let me throw something out there".
>
> My take is that anyone who is serious about contributing and serious about
> quality is not put off by this template.
>
> Concerning the introductory text: I bet that rarely anyone reads the "how
> to contribute" guide. Before the template, virtually no new pull request
> had even the required naming.
> That text needs to be in the template, or we might as well not have it
> anywhere at all.
>
>
>
> Just for reference: Below is the introductory text of the JDK ;-)
>
> 5. Know what to expect
>
> Only the best patches submitted will actually make it all the way into a
> JDK code base. The goal is not to take in the maximum number of
> contributions possible, but rather to accept only the highest-quality
> contributions. The JDK is used daily by millions of people and thousands of
> businesses, often in mission-critical applications, and so we can't afford
> to accept anything less than the very best.
>
> If you're relatively new to the Java platform then we recommend that you
> gain more experience writing Java applications before you attempt to work
> on the JDK itself. The purpose of the sponsored-contribution process is to
> bring developers who already have the skills required to work on the JDK
> into the existing development community. The members of that community have
> neither the time nor the patience required to teach basic Java programming
> skills or platform implementation techniques.
>
>
>
>
>
> On Tue, Jul 18, 2017 at 12:15 PM, Ufuk Celebi <[hidden email]> wrote:
>
>> On Tue, Jul 18, 2017 at 10:47 AM, Fabian Hueske <[hidden email]> wrote:
>>> For example even if the question about changed dependencies is answered
>>> with "no", the reviewer still has to check that.
>>
>> But having it as a required option/text in the PR descriptions helps
>> reviewers to actually remember to check that. I think we should be
>> more realistic here and assume that reviewers will also overlook
>> things etc.
>>
>> To me, keeping the questions is more important than the intro text.
>> Therefore, I would be OK with moving the text to the contrib guide,
>> but I would definitely keep the detailed yes/nos and not go with high
>> level questions that everyone will answer differently.
>>
>> – Ufuk
>>

Reply | Threaded
Open this post in threaded view
|

回复:[DISCUSS] A more thorough Pull Request check list and template

wangzhijiang
In reply to this post by Ufuk Celebi-2
From my side, I also like the new template which contains more useful information, and both the reviewer and contributor may get benefits from it.
For my previous pull requests as a contributor, I always listed the purpose of change and change log, but rarely mentioned how to verify the change and affected components.These are really necessary and shoule not be ignored in the pull request.  So the template can indeed help contributor think more when submitting pull request.
If I review other pull requests, I also want to see these sections before code review.
zhijiang------------------------------------------------------------------发件人:Stephan Ewen <[hidden email]>发送时间:2017年7月18日(星期二) 22:59收件人:[hidden email] <[hidden email]>主 题:Re: [DISCUSS] A more thorough Pull Request check list and template
My thinking was exactly as echoed by Gordon and Ufuk:

  - The yes/no sections are also for reviewers a reminder of what to watch
out for.
    Let's face it, probably half of the committers are not aware that these
things need to be checked implicitly against every change.
    A good part of the recent issues came from exactly that. Changes get
merged (because the pull request lingered or the number of open PRs is
high) and these implications are not thought through.

  - This is to me a tradeoff between requiring explicit +1s from certain
people (maintainers) for certain components, and getting an awareness into
everybody's mind.

  - It also makes all users aware that these things are considered and
implicitly manages expectations in how fast can things get merged.


Concerning the long text: I think it is fine to play the ball a bit more to
the contributors.
Making it easy, yes. But also making it correct and well. We need to make
contributors aware of what it means to contribute to a system to runs
highly available critical infrastructure. There is quite often still the
mindset of "hey, cool, open source, let me throw something out there".

My take is that anyone who is serious about contributing and serious about
quality is not put off by this template.

Concerning the introductory text: I bet that rarely anyone reads the "how
to contribute" guide. Before the template, virtually no new pull request
had even the required naming.
That text needs to be in the template, or we might as well not have it
anywhere at all.



Just for reference: Below is the introductory text of the JDK ;-)

5. Know what to expect

Only the best patches submitted will actually make it all the way into a
JDK code base. The goal is not to take in the maximum number of
contributions possible, but rather to accept only the highest-quality
contributions. The JDK is used daily by millions of people and thousands of
businesses, often in mission-critical applications, and so we can't afford
to accept anything less than the very best.

If you're relatively new to the Java platform then we recommend that you
gain more experience writing Java applications before you attempt to work
on the JDK itself. The purpose of the sponsored-contribution process is to
bring developers who already have the skills required to work on the JDK
into the existing development community. The members of that community have
neither the time nor the patience required to teach basic Java programming
skills or platform implementation techniques.





On Tue, Jul 18, 2017 at 12:15 PM, Ufuk Celebi <[hidden email]> wrote:

> On Tue, Jul 18, 2017 at 10:47 AM, Fabian Hueske <[hidden email]> wrote:
> > For example even if the question about changed dependencies is answered
> > with "no", the reviewer still has to check that.
>
> But having it as a required option/text in the PR descriptions helps
> reviewers to actually remember to check that. I think we should be
> more realistic here and assume that reviewers will also overlook
> things etc.
>
> To me, keeping the questions is more important than the intro text.
> Therefore, I would be OK with moving the text to the contrib guide,
> but I would definitely keep the detailed yes/nos and not go with high
> level questions that everyone will answer differently.
>
> – Ufuk
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Stephan Ewen
In reply to this post by Nico Kruber
Concerning moving text to the contributors guide:

I can only say it again: I believe the contribution guide is almost dead
text. Very few people read it.
Before the current template was introduced, new contributors rarely gave
the pull request a name with Jira number. That is a good indicator about
how many read this guide.
Putting the test in the template is a way that every one reads it.


I am also wondering what the concern is.
A new contributor should clearly read through a bit of text, to learn what
we look for in contributions.
A recurring contributor will not have to read it again, simply remove the
text from the pull request message and go on.

Where is the disadvantage?


On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]> wrote:

> I like the new template but also agree with the text being too long and
> would
> move the intro to the contributors guide with a link in the PR template.
>
> Regarding the questions to fill out - I'd like the headings to be short and
> have the affected components last so that documentation is not lost
> (although
> being more important than this checklist), e.g.:
>
> * Purpose of the change
> * Brief change log
> * Verifying the change
> * Documentation
> * Affected components
>
> The verification options in the original template look a bit too large but
> it
> stresses what tests should be added, especially for bigger changes. Can't
> think of a way to make it shorter though.
>
>
> Nico
>
> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
> > I fully agree with Fabian.
> >
> > Multiple-choice questions provide little value to the reviewer, since the
> > validity has to be verified in any case. While text answers have to be
> > validated as well,
> > they give some hint to the reviewer as to how it can be verified and
> > which steps the
> > contributor did to do so.
> >
> > I also agree that it is too long; IMO this is really intimidating to new
> > contributors to be greeted with this.
> >
> > Ideally we only link to the contributors guide and ask 3 questions:
> >
> >   * What is the problem?
> >   * How was it fixed?
> >   * How can the fix be verified?
> >
> > On 18.07.2017 10:47, Fabian Hueske wrote:
> > > I like the sections about purpose, change log, and verification of the
> > > changes.
> > >
> > > However, I think the proposed template is too much text. This is
> probably
> > > the reason why the first attempt to establish a PR template failed.
> > > I would move most of the introduction and explanations incl. examples
> to
> > > the "Contribution Guidelines" and only pass a link.
> > > IMO, the template should be rather shorter than the current one and
> only
> > > have the link, the sections to fill out, and checkboxes.
> > >
> > > I'm also not sure how much the detailed questions will help.
> > > For example even if the question about changed dependencies is answered
> > > with "no", the reviewer still has to check that.
> > >
> > > I think the questions of the current template work differently.
> > > A question "Does the PR include tests?" suggests to the contributor
> that
> > > those should be included. Same for documentation.
> > >
> > > Cheers,
> > > Fabian
> > >
> > > 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
> > >> +1, I like this a lot.
> > >> With the previous template, it doesn’t really resonate with what we
> > >> should
> > >> care about, and therefore most of the time I think contributors just
> > >> delete
> > >> that template and write down something on their own.
> > >>
> > >> I would also like to add: “Savepoint / checkpoint binary formats” to
> the
> > >> potential affect scope check list.
> > >> I think changes to those deserves an independent yes / no check of its
> > >> own.
> > >>
> > >> Cheers,
> > >> Gordon
> > >>
> > >> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
> > >>
> > >> I really like this and vote to change our current template.
> > >>
> > >> The simple yes/no/... options are a really good idea. I would add to
> > >> your email that the questions will equally help reviewers to remember
> > >> to look at these things, which is just as important.
> > >>
> > >> When we merge this, we should make sure to strictly follow the guide.
> > >> Ideally, in the long term we can even automate some of the yes/no/...
> > >> questions via a bot... but let's not get ahead of ourselves here ;-)
> > >>
> > >> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
> wrote:
> > >>> Hi all!
> > >>>
> > >>> I have reflected a bit on the pull requests and on some of the recent
> > >>> changes to Flink and some of the introduced bugs / regressions that
> we
> > >>
> > >> have
> > >>
> > >>> fixed.
> > >>>
> > >>> One thing that I think would have helped is to have more explicit
> > >>> information about what the pull request does and how the contributor
> > >>
> > >> would
> > >>
> > >>> suggest to verify it. I have seen this when contributing to some
> other
> > >>> project and really liked the approach.
> > >>>
> > >>> It requires that a contributor takes a minute to reflect on what was
> > >>> touched, and what would be ways to verify that the changes work
> > >>> properly.
> > >>> Besides being a help to the reviewer, it also makes contributors
> aware
> > >>> of
> > >>> what is important during the review process.
> > >>>
> > >>>
> > >>> I suggest a new pull request template, as attached below, with a
> preview
> > >>> here:
> > >>> https://github.com/StephanEwen/incubator-flink/
> > >>
> > >> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
> > >>
> > >>> Don't be scared, it looks long, but a big part is the introductory
> text
> > >>> (only relevant for new contributors) and the examples contents for
> the
> > >>> description.
> > >>
> > >>> Filling this out for code that is in shape should be a quick thing:
> > >> Remove
> > >>
> > >>> the into and checklist, write a few sentences on what the PR does
> (one
> > >>> should do that anyways) and then pick some yes/no in the
> classification
> > >>> section.
> > >>>
> > >>> Curious to hear what you think!
> > >>>
> > >>> Best,
> > >>> Stephan
> > >>>
> > >>>
> > >>> ============================
> > >>>
> > >>> Full suggested pull request template:
> > >>>
> > >>>
> > >>>
> > >>> *Thank you very much for contributing to Apache Flink - we are happy
> > >>> that
> > >>> you want to help us improve Flink. To help the community review you
> > >>> contribution in the best possible way, please go through the
> checklist
> > >>> below, which will get the contribution into a shape in which it can
> be
> > >>
> > >> best
> > >>
> > >>> reviewed.*
> > >>>
> > >>> *Please understand that we do not do this to make contributions to
> Flink
> > >>
> > >> a
> > >>
> > >>> hassle. In order to uphold a high standard of quality for code
> > >>> contributions, while at the same time managing a large number of
> > >>> contributions, we need contributors to prepare the contributions
> well,
> > >>
> > >> and
> > >>
> > >>> give reviewers enough contextual information for the review. Please
> also
> > >>> understand that contributions that do not follow this guide will take
> > >>> longer to review and thus typically be picked up with lower priority
> by
> > >>
> > >> the
> > >>
> > >>> community.*
> > >>>
> > >>> ## Contribution Checklist
> > >>>
> > >>> - Make sure that the pull request corresponds to a [JIRA issue](
> > >>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
> are
> > >>
> > >> made
> > >>
> > >>> for typos in JavaDoc or documentation files, which need no JIRA
> issue.
> > >>>
> > >>> - Name the pull request in the form "[FLINK-1234] [component] Title
> of
> > >>> the pull request", where *FLINK-1234* should be replaced by the
> actual
> > >>> issue number. Skip *component* if you are unsure about which is the
> best
> > >>> component.
> > >>> Typo fixes that have no associated JIRA issue should be named
> following
> > >>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
> or
> > >>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
> > >>>
> > >>> - Fill out the template below to describe the changes contributed by
> the
> > >>> pull request. That will give reviewers the context they need to do
> the
> > >>> review.
> > >>>
> > >>> - Make sure that the change passes the automated tests, i.e., `mvn
> clean
> > >>> verify`
> > >>>
> > >>> - Each pull request should address only one issue, not mix up code
> from
> > >>> multiple issues.
> > >>>
> > >>> - Each commit in the pull request has a meaningful commit message
> > >>> (including the JIRA id)
> > >>>
> > >>> - Once all items of the checklist are addressed, remove the above
> text
> > >>> and this checklist, leaving only the filled out template below.
> > >>>
> > >>>
> > >>> **(The sections below can be removed for hotfixes of typos)**
> > >>>
> > >>> ## What is the purpose of the change
> > >>>
> > >>> *(For example: This pull request makes task deployment go through the
> > >>
> > >> blob
> > >>
> > >>> server, rather than through RPC. That way we avoid re-transferring
> them
> > >>
> > >> on
> > >>
> > >>> each deployment (during recovery).)*
> > >>>
> > >>>
> > >>> ## Brief change log
> > >>>
> > >>> *(for example:)*
> > >>> - *The TaskInfo is stored in the blob store on job creation time as a
> > >>> persistent artifact*
> > >>> - *Deployments RPC transmits only the blob storage reference*
> > >>> - *TaskManagers retrieve the TaskInfo from the blob cache*
> > >>>
> > >>>
> > >>> ## Verifying this change
> > >>>
> > >>> *(Please pick either of the following options)*
> > >>>
> > >>> This change is a trivial rework / code cleanup without any test
> > >>> coverage.
> > >>>
> > >>> *(or)*
> > >>>
> > >>> This change is already covered by existing tests, such as *(please
> > >>
> > >> describe
> > >>
> > >>> tests)*.
> > >>>
> > >>> *(or)*
> > >>>
> > >>> This change added tests and can be verified as follows:
> > >>>
> > >>> *(example:)*
> > >>> - *Added integration tests for end-to-end deployment with large
> payloads
> > >>> (100MB)*
> > >>> - *Extended integration test for recovery after master (JobManager)
> > >>> failure*
> > >>> - *Added test that validates that TaskInfo is transferred only once
> > >>> across recoveries*
> > >>> - *Manually verified the change by running a 4 node cluser with 2
> > >>> JobManagers and 4 TaskManagers, a stateful streaming program, and
> > >>> killing
> > >>> one JobManager and to TaskManagers during the execution, verifying
> that
> > >>> recovery happens correctly.*
> > >>>
> > >>> ## Does this pull request potentially affect one of the following
> parts:
> > >>>
> > >>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
> > >>> - The public API, i.e., is any changed class annotated with
> > >>> `@Public(Evolving)`: **(yes / no)**
> > >>> - The serializers: **(yes / no / don't know)**
> > >>> - The runtime per-record code paths (performance sensitive): **(yes
> / no
> > >>> / don't know)**
> > >>> - Anything that affects deployment or recovery: JobManager (and its
> > >>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
> don't
> > >>> know)**:
> > >>>
> > >>> ## Documentation
> > >>>
> > >>> - Does this pull request introduce a new feature? **(yes / no)**
> > >>> - If yes, how is the feature documented? **(not applicable / docs /
> > >>> JavaDocs / not documented)**
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Chesnay Schepler-3
I'm sorry but i can't follow your logic.

Put text into template => contributor will definitely read it
Put link to text into template => contributor will completely ignore the
link

The advantage of the link is we don't duplicate the contribution guide
in the docs and in the template.
Furthermore, you don't even need to remove something from the template,
since it's just a single line.

On 18.07.2017 19:25, Stephan Ewen wrote:

> Concerning moving text to the contributors guide:
>
> I can only say it again: I believe the contribution guide is almost dead
> text. Very few people read it.
> Before the current template was introduced, new contributors rarely gave
> the pull request a name with Jira number. That is a good indicator about
> how many read this guide.
> Putting the test in the template is a way that every one reads it.
>
>
> I am also wondering what the concern is.
> A new contributor should clearly read through a bit of text, to learn what
> we look for in contributions.
> A recurring contributor will not have to read it again, simply remove the
> text from the pull request message and go on.
>
> Where is the disadvantage?
>
>
> On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]> wrote:
>
>> I like the new template but also agree with the text being too long and
>> would
>> move the intro to the contributors guide with a link in the PR template.
>>
>> Regarding the questions to fill out - I'd like the headings to be short and
>> have the affected components last so that documentation is not lost
>> (although
>> being more important than this checklist), e.g.:
>>
>> * Purpose of the change
>> * Brief change log
>> * Verifying the change
>> * Documentation
>> * Affected components
>>
>> The verification options in the original template look a bit too large but
>> it
>> stresses what tests should be added, especially for bigger changes. Can't
>> think of a way to make it shorter though.
>>
>>
>> Nico
>>
>> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
>>> I fully agree with Fabian.
>>>
>>> Multiple-choice questions provide little value to the reviewer, since the
>>> validity has to be verified in any case. While text answers have to be
>>> validated as well,
>>> they give some hint to the reviewer as to how it can be verified and
>>> which steps the
>>> contributor did to do so.
>>>
>>> I also agree that it is too long; IMO this is really intimidating to new
>>> contributors to be greeted with this.
>>>
>>> Ideally we only link to the contributors guide and ask 3 questions:
>>>
>>>    * What is the problem?
>>>    * How was it fixed?
>>>    * How can the fix be verified?
>>>
>>> On 18.07.2017 10:47, Fabian Hueske wrote:
>>>> I like the sections about purpose, change log, and verification of the
>>>> changes.
>>>>
>>>> However, I think the proposed template is too much text. This is
>> probably
>>>> the reason why the first attempt to establish a PR template failed.
>>>> I would move most of the introduction and explanations incl. examples
>> to
>>>> the "Contribution Guidelines" and only pass a link.
>>>> IMO, the template should be rather shorter than the current one and
>> only
>>>> have the link, the sections to fill out, and checkboxes.
>>>>
>>>> I'm also not sure how much the detailed questions will help.
>>>> For example even if the question about changed dependencies is answered
>>>> with "no", the reviewer still has to check that.
>>>>
>>>> I think the questions of the current template work differently.
>>>> A question "Does the PR include tests?" suggests to the contributor
>> that
>>>> those should be included. Same for documentation.
>>>>
>>>> Cheers,
>>>> Fabian
>>>>
>>>> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
>>>>> +1, I like this a lot.
>>>>> With the previous template, it doesn’t really resonate with what we
>>>>> should
>>>>> care about, and therefore most of the time I think contributors just
>>>>> delete
>>>>> that template and write down something on their own.
>>>>>
>>>>> I would also like to add: “Savepoint / checkpoint binary formats” to
>> the
>>>>> potential affect scope check list.
>>>>> I think changes to those deserves an independent yes / no check of its
>>>>> own.
>>>>>
>>>>> Cheers,
>>>>> Gordon
>>>>>
>>>>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>>>>
>>>>> I really like this and vote to change our current template.
>>>>>
>>>>> The simple yes/no/... options are a really good idea. I would add to
>>>>> your email that the questions will equally help reviewers to remember
>>>>> to look at these things, which is just as important.
>>>>>
>>>>> When we merge this, we should make sure to strictly follow the guide.
>>>>> Ideally, in the long term we can even automate some of the yes/no/...
>>>>> questions via a bot... but let's not get ahead of ourselves here ;-)
>>>>>
>>>>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
>> wrote:
>>>>>> Hi all!
>>>>>>
>>>>>> I have reflected a bit on the pull requests and on some of the recent
>>>>>> changes to Flink and some of the introduced bugs / regressions that
>> we
>>>>> have
>>>>>
>>>>>> fixed.
>>>>>>
>>>>>> One thing that I think would have helped is to have more explicit
>>>>>> information about what the pull request does and how the contributor
>>>>> would
>>>>>
>>>>>> suggest to verify it. I have seen this when contributing to some
>> other
>>>>>> project and really liked the approach.
>>>>>>
>>>>>> It requires that a contributor takes a minute to reflect on what was
>>>>>> touched, and what would be ways to verify that the changes work
>>>>>> properly.
>>>>>> Besides being a help to the reviewer, it also makes contributors
>> aware
>>>>>> of
>>>>>> what is important during the review process.
>>>>>>
>>>>>>
>>>>>> I suggest a new pull request template, as attached below, with a
>> preview
>>>>>> here:
>>>>>> https://github.com/StephanEwen/incubator-flink/
>>>>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>>>>
>>>>>> Don't be scared, it looks long, but a big part is the introductory
>> text
>>>>>> (only relevant for new contributors) and the examples contents for
>> the
>>>>>> description.
>>>>>> Filling this out for code that is in shape should be a quick thing:
>>>>> Remove
>>>>>
>>>>>> the into and checklist, write a few sentences on what the PR does
>> (one
>>>>>> should do that anyways) and then pick some yes/no in the
>> classification
>>>>>> section.
>>>>>>
>>>>>> Curious to hear what you think!
>>>>>>
>>>>>> Best,
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>> ============================
>>>>>>
>>>>>> Full suggested pull request template:
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Thank you very much for contributing to Apache Flink - we are happy
>>>>>> that
>>>>>> you want to help us improve Flink. To help the community review you
>>>>>> contribution in the best possible way, please go through the
>> checklist
>>>>>> below, which will get the contribution into a shape in which it can
>> be
>>>>> best
>>>>>
>>>>>> reviewed.*
>>>>>>
>>>>>> *Please understand that we do not do this to make contributions to
>> Flink
>>>>> a
>>>>>
>>>>>> hassle. In order to uphold a high standard of quality for code
>>>>>> contributions, while at the same time managing a large number of
>>>>>> contributions, we need contributors to prepare the contributions
>> well,
>>>>> and
>>>>>
>>>>>> give reviewers enough contextual information for the review. Please
>> also
>>>>>> understand that contributions that do not follow this guide will take
>>>>>> longer to review and thus typically be picked up with lower priority
>> by
>>>>> the
>>>>>
>>>>>> community.*
>>>>>>
>>>>>> ## Contribution Checklist
>>>>>>
>>>>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>>>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
>> are
>>>>> made
>>>>>
>>>>>> for typos in JavaDoc or documentation files, which need no JIRA
>> issue.
>>>>>> - Name the pull request in the form "[FLINK-1234] [component] Title
>> of
>>>>>> the pull request", where *FLINK-1234* should be replaced by the
>> actual
>>>>>> issue number. Skip *component* if you are unsure about which is the
>> best
>>>>>> component.
>>>>>> Typo fixes that have no associated JIRA issue should be named
>> following
>>>>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
>> or
>>>>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>>>>
>>>>>> - Fill out the template below to describe the changes contributed by
>> the
>>>>>> pull request. That will give reviewers the context they need to do
>> the
>>>>>> review.
>>>>>>
>>>>>> - Make sure that the change passes the automated tests, i.e., `mvn
>> clean
>>>>>> verify`
>>>>>>
>>>>>> - Each pull request should address only one issue, not mix up code
>> from
>>>>>> multiple issues.
>>>>>>
>>>>>> - Each commit in the pull request has a meaningful commit message
>>>>>> (including the JIRA id)
>>>>>>
>>>>>> - Once all items of the checklist are addressed, remove the above
>> text
>>>>>> and this checklist, leaving only the filled out template below.
>>>>>>
>>>>>>
>>>>>> **(The sections below can be removed for hotfixes of typos)**
>>>>>>
>>>>>> ## What is the purpose of the change
>>>>>>
>>>>>> *(For example: This pull request makes task deployment go through the
>>>>> blob
>>>>>
>>>>>> server, rather than through RPC. That way we avoid re-transferring
>> them
>>>>> on
>>>>>
>>>>>> each deployment (during recovery).)*
>>>>>>
>>>>>>
>>>>>> ## Brief change log
>>>>>>
>>>>>> *(for example:)*
>>>>>> - *The TaskInfo is stored in the blob store on job creation time as a
>>>>>> persistent artifact*
>>>>>> - *Deployments RPC transmits only the blob storage reference*
>>>>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>>>>
>>>>>>
>>>>>> ## Verifying this change
>>>>>>
>>>>>> *(Please pick either of the following options)*
>>>>>>
>>>>>> This change is a trivial rework / code cleanup without any test
>>>>>> coverage.
>>>>>>
>>>>>> *(or)*
>>>>>>
>>>>>> This change is already covered by existing tests, such as *(please
>>>>> describe
>>>>>
>>>>>> tests)*.
>>>>>>
>>>>>> *(or)*
>>>>>>
>>>>>> This change added tests and can be verified as follows:
>>>>>>
>>>>>> *(example:)*
>>>>>> - *Added integration tests for end-to-end deployment with large
>> payloads
>>>>>> (100MB)*
>>>>>> - *Extended integration test for recovery after master (JobManager)
>>>>>> failure*
>>>>>> - *Added test that validates that TaskInfo is transferred only once
>>>>>> across recoveries*
>>>>>> - *Manually verified the change by running a 4 node cluser with 2
>>>>>> JobManagers and 4 TaskManagers, a stateful streaming program, and
>>>>>> killing
>>>>>> one JobManager and to TaskManagers during the execution, verifying
>> that
>>>>>> recovery happens correctly.*
>>>>>>
>>>>>> ## Does this pull request potentially affect one of the following
>> parts:
>>>>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>>>>> - The public API, i.e., is any changed class annotated with
>>>>>> `@Public(Evolving)`: **(yes / no)**
>>>>>> - The serializers: **(yes / no / don't know)**
>>>>>> - The runtime per-record code paths (performance sensitive): **(yes
>> / no
>>>>>> / don't know)**
>>>>>> - Anything that affects deployment or recovery: JobManager (and its
>>>>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
>> don't
>>>>>> know)**:
>>>>>>
>>>>>> ## Documentation
>>>>>>
>>>>>> - Does this pull request introduce a new feature? **(yes / no)**
>>>>>> - If yes, how is the feature documented? **(not applicable / docs /
>>>>>> JavaDocs / not documented)**
>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Ufuk Celebi-2
On Tue, Jul 18, 2017 at 11:03 PM, Chesnay Schepler <[hidden email]> wrote:
> I'm sorry but i can't follow your logic.

I understand where you're coming from Chesnay, but I agree with
Stephan that a link is easier to ignore than a text. I think Stephan
gave good pointers to why he thinks that people don't read the
contribution guide. I'm +1 to add the text as proposed by Stephan.

Regarding duplication: We can also do it the other way around and link
to the contribution guide from the webpage (Markdown is very readable
on GitHub).

We won't be able to guarantee what people read or don't read, but we
should make the desired thing (read it ;)) as straight forward as
possible.
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Stephan Ewen
In reply to this post by Chesnay Schepler-3
@Chesnay:

Put text into template => contributor will have to read it
Put link to text into template => most contributors will ignore the link

Yes, that is pretty much what my observation from the past is.



On Tue, Jul 18, 2017 at 11:03 PM, Chesnay Schepler <[hidden email]>
wrote:

> I'm sorry but i can't follow your logic.
>
> Put text into template => contributor will definitely read it
> Put link to text into template => contributor will completely ignore the
> link
>
> The advantage of the link is we don't duplicate the contribution guide in
> the docs and in the template.
> Furthermore, you don't even need to remove something from the template,
> since it's just a single line.
>
>
> On 18.07.2017 19:25, Stephan Ewen wrote:
>
>> Concerning moving text to the contributors guide:
>>
>> I can only say it again: I believe the contribution guide is almost dead
>> text. Very few people read it.
>> Before the current template was introduced, new contributors rarely gave
>> the pull request a name with Jira number. That is a good indicator about
>> how many read this guide.
>> Putting the test in the template is a way that every one reads it.
>>
>>
>> I am also wondering what the concern is.
>> A new contributor should clearly read through a bit of text, to learn what
>> we look for in contributions.
>> A recurring contributor will not have to read it again, simply remove the
>> text from the pull request message and go on.
>>
>> Where is the disadvantage?
>>
>>
>> On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]>
>> wrote:
>>
>> I like the new template but also agree with the text being too long and
>>> would
>>> move the intro to the contributors guide with a link in the PR template.
>>>
>>> Regarding the questions to fill out - I'd like the headings to be short
>>> and
>>> have the affected components last so that documentation is not lost
>>> (although
>>> being more important than this checklist), e.g.:
>>>
>>> * Purpose of the change
>>> * Brief change log
>>> * Verifying the change
>>> * Documentation
>>> * Affected components
>>>
>>> The verification options in the original template look a bit too large
>>> but
>>> it
>>> stresses what tests should be added, especially for bigger changes. Can't
>>> think of a way to make it shorter though.
>>>
>>>
>>> Nico
>>>
>>> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
>>>
>>>> I fully agree with Fabian.
>>>>
>>>> Multiple-choice questions provide little value to the reviewer, since
>>>> the
>>>> validity has to be verified in any case. While text answers have to be
>>>> validated as well,
>>>> they give some hint to the reviewer as to how it can be verified and
>>>> which steps the
>>>> contributor did to do so.
>>>>
>>>> I also agree that it is too long; IMO this is really intimidating to new
>>>> contributors to be greeted with this.
>>>>
>>>> Ideally we only link to the contributors guide and ask 3 questions:
>>>>
>>>>    * What is the problem?
>>>>    * How was it fixed?
>>>>    * How can the fix be verified?
>>>>
>>>> On 18.07.2017 10:47, Fabian Hueske wrote:
>>>>
>>>>> I like the sections about purpose, change log, and verification of the
>>>>> changes.
>>>>>
>>>>> However, I think the proposed template is too much text. This is
>>>>>
>>>> probably
>>>
>>>> the reason why the first attempt to establish a PR template failed.
>>>>> I would move most of the introduction and explanations incl. examples
>>>>>
>>>> to
>>>
>>>> the "Contribution Guidelines" and only pass a link.
>>>>> IMO, the template should be rather shorter than the current one and
>>>>>
>>>> only
>>>
>>>> have the link, the sections to fill out, and checkboxes.
>>>>>
>>>>> I'm also not sure how much the detailed questions will help.
>>>>> For example even if the question about changed dependencies is answered
>>>>> with "no", the reviewer still has to check that.
>>>>>
>>>>> I think the questions of the current template work differently.
>>>>> A question "Does the PR include tests?" suggests to the contributor
>>>>>
>>>> that
>>>
>>>> those should be included. Same for documentation.
>>>>>
>>>>> Cheers,
>>>>> Fabian
>>>>>
>>>>> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
>>>>>
>>>>>> +1, I like this a lot.
>>>>>> With the previous template, it doesn’t really resonate with what we
>>>>>> should
>>>>>> care about, and therefore most of the time I think contributors just
>>>>>> delete
>>>>>> that template and write down something on their own.
>>>>>>
>>>>>> I would also like to add: “Savepoint / checkpoint binary formats” to
>>>>>>
>>>>> the
>>>
>>>> potential affect scope check list.
>>>>>> I think changes to those deserves an independent yes / no check of its
>>>>>> own.
>>>>>>
>>>>>> Cheers,
>>>>>> Gordon
>>>>>>
>>>>>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>>>>>
>>>>>> I really like this and vote to change our current template.
>>>>>>
>>>>>> The simple yes/no/... options are a really good idea. I would add to
>>>>>> your email that the questions will equally help reviewers to remember
>>>>>> to look at these things, which is just as important.
>>>>>>
>>>>>> When we merge this, we should make sure to strictly follow the guide.
>>>>>> Ideally, in the long term we can even automate some of the yes/no/...
>>>>>> questions via a bot... but let's not get ahead of ourselves here ;-)
>>>>>>
>>>>>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
>>>>>>
>>>>> wrote:
>>>
>>>> Hi all!
>>>>>>>
>>>>>>> I have reflected a bit on the pull requests and on some of the recent
>>>>>>> changes to Flink and some of the introduced bugs / regressions that
>>>>>>>
>>>>>> we
>>>
>>>> have
>>>>>>
>>>>>> fixed.
>>>>>>>
>>>>>>> One thing that I think would have helped is to have more explicit
>>>>>>> information about what the pull request does and how the contributor
>>>>>>>
>>>>>> would
>>>>>>
>>>>>> suggest to verify it. I have seen this when contributing to some
>>>>>>>
>>>>>> other
>>>
>>>> project and really liked the approach.
>>>>>>>
>>>>>>> It requires that a contributor takes a minute to reflect on what was
>>>>>>> touched, and what would be ways to verify that the changes work
>>>>>>> properly.
>>>>>>> Besides being a help to the reviewer, it also makes contributors
>>>>>>>
>>>>>> aware
>>>
>>>> of
>>>>>>> what is important during the review process.
>>>>>>>
>>>>>>>
>>>>>>> I suggest a new pull request template, as attached below, with a
>>>>>>>
>>>>>> preview
>>>
>>>> here:
>>>>>>> https://github.com/StephanEwen/incubator-flink/
>>>>>>>
>>>>>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>>>>>
>>>>>> Don't be scared, it looks long, but a big part is the introductory
>>>>>>>
>>>>>> text
>>>
>>>> (only relevant for new contributors) and the examples contents for
>>>>>>>
>>>>>> the
>>>
>>>> description.
>>>>>>> Filling this out for code that is in shape should be a quick thing:
>>>>>>>
>>>>>> Remove
>>>>>>
>>>>>> the into and checklist, write a few sentences on what the PR does
>>>>>>>
>>>>>> (one
>>>
>>>> should do that anyways) and then pick some yes/no in the
>>>>>>>
>>>>>> classification
>>>
>>>> section.
>>>>>>>
>>>>>>> Curious to hear what you think!
>>>>>>>
>>>>>>> Best,
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>> ============================
>>>>>>>
>>>>>>> Full suggested pull request template:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Thank you very much for contributing to Apache Flink - we are happy
>>>>>>> that
>>>>>>> you want to help us improve Flink. To help the community review you
>>>>>>> contribution in the best possible way, please go through the
>>>>>>>
>>>>>> checklist
>>>
>>>> below, which will get the contribution into a shape in which it can
>>>>>>>
>>>>>> be
>>>
>>>> best
>>>>>>
>>>>>> reviewed.*
>>>>>>>
>>>>>>> *Please understand that we do not do this to make contributions to
>>>>>>>
>>>>>> Flink
>>>
>>>> a
>>>>>>
>>>>>> hassle. In order to uphold a high standard of quality for code
>>>>>>> contributions, while at the same time managing a large number of
>>>>>>> contributions, we need contributors to prepare the contributions
>>>>>>>
>>>>>> well,
>>>
>>>> and
>>>>>>
>>>>>> give reviewers enough contextual information for the review. Please
>>>>>>>
>>>>>> also
>>>
>>>> understand that contributions that do not follow this guide will take
>>>>>>> longer to review and thus typically be picked up with lower priority
>>>>>>>
>>>>>> by
>>>
>>>> the
>>>>>>
>>>>>> community.*
>>>>>>>
>>>>>>> ## Contribution Checklist
>>>>>>>
>>>>>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>>>>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
>>>>>>>
>>>>>> are
>>>
>>>> made
>>>>>>
>>>>>> for typos in JavaDoc or documentation files, which need no JIRA
>>>>>>>
>>>>>> issue.
>>>
>>>> - Name the pull request in the form "[FLINK-1234] [component] Title
>>>>>>>
>>>>>> of
>>>
>>>> the pull request", where *FLINK-1234* should be replaced by the
>>>>>>>
>>>>>> actual
>>>
>>>> issue number. Skip *component* if you are unsure about which is the
>>>>>>>
>>>>>> best
>>>
>>>> component.
>>>>>>> Typo fixes that have no associated JIRA issue should be named
>>>>>>>
>>>>>> following
>>>
>>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
>>>>>>>
>>>>>> or
>>>
>>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>>>>>
>>>>>>> - Fill out the template below to describe the changes contributed by
>>>>>>>
>>>>>> the
>>>
>>>> pull request. That will give reviewers the context they need to do
>>>>>>>
>>>>>> the
>>>
>>>> review.
>>>>>>>
>>>>>>> - Make sure that the change passes the automated tests, i.e., `mvn
>>>>>>>
>>>>>> clean
>>>
>>>> verify`
>>>>>>>
>>>>>>> - Each pull request should address only one issue, not mix up code
>>>>>>>
>>>>>> from
>>>
>>>> multiple issues.
>>>>>>>
>>>>>>> - Each commit in the pull request has a meaningful commit message
>>>>>>> (including the JIRA id)
>>>>>>>
>>>>>>> - Once all items of the checklist are addressed, remove the above
>>>>>>>
>>>>>> text
>>>
>>>> and this checklist, leaving only the filled out template below.
>>>>>>>
>>>>>>>
>>>>>>> **(The sections below can be removed for hotfixes of typos)**
>>>>>>>
>>>>>>> ## What is the purpose of the change
>>>>>>>
>>>>>>> *(For example: This pull request makes task deployment go through the
>>>>>>>
>>>>>> blob
>>>>>>
>>>>>> server, rather than through RPC. That way we avoid re-transferring
>>>>>>>
>>>>>> them
>>>
>>>> on
>>>>>>
>>>>>> each deployment (during recovery).)*
>>>>>>>
>>>>>>>
>>>>>>> ## Brief change log
>>>>>>>
>>>>>>> *(for example:)*
>>>>>>> - *The TaskInfo is stored in the blob store on job creation time as a
>>>>>>> persistent artifact*
>>>>>>> - *Deployments RPC transmits only the blob storage reference*
>>>>>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>>>>>
>>>>>>>
>>>>>>> ## Verifying this change
>>>>>>>
>>>>>>> *(Please pick either of the following options)*
>>>>>>>
>>>>>>> This change is a trivial rework / code cleanup without any test
>>>>>>> coverage.
>>>>>>>
>>>>>>> *(or)*
>>>>>>>
>>>>>>> This change is already covered by existing tests, such as *(please
>>>>>>>
>>>>>> describe
>>>>>>
>>>>>> tests)*.
>>>>>>>
>>>>>>> *(or)*
>>>>>>>
>>>>>>> This change added tests and can be verified as follows:
>>>>>>>
>>>>>>> *(example:)*
>>>>>>> - *Added integration tests for end-to-end deployment with large
>>>>>>>
>>>>>> payloads
>>>
>>>> (100MB)*
>>>>>>> - *Extended integration test for recovery after master (JobManager)
>>>>>>> failure*
>>>>>>> - *Added test that validates that TaskInfo is transferred only once
>>>>>>> across recoveries*
>>>>>>> - *Manually verified the change by running a 4 node cluser with 2
>>>>>>> JobManagers and 4 TaskManagers, a stateful streaming program, and
>>>>>>> killing
>>>>>>> one JobManager and to TaskManagers during the execution, verifying
>>>>>>>
>>>>>> that
>>>
>>>> recovery happens correctly.*
>>>>>>>
>>>>>>> ## Does this pull request potentially affect one of the following
>>>>>>>
>>>>>> parts:
>>>
>>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>>>>>> - The public API, i.e., is any changed class annotated with
>>>>>>> `@Public(Evolving)`: **(yes / no)**
>>>>>>> - The serializers: **(yes / no / don't know)**
>>>>>>> - The runtime per-record code paths (performance sensitive): **(yes
>>>>>>>
>>>>>> / no
>>>
>>>> / don't know)**
>>>>>>> - Anything that affects deployment or recovery: JobManager (and its
>>>>>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
>>>>>>>
>>>>>> don't
>>>
>>>> know)**:
>>>>>>>
>>>>>>> ## Documentation
>>>>>>>
>>>>>>> - Does this pull request introduce a new feature? **(yes / no)**
>>>>>>> - If yes, how is the feature documented? **(not applicable / docs /
>>>>>>> JavaDocs / not documented)**
>>>>>>>
>>>>>>
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Ufuk Celebi-2
What's the conclusion of last weeks discussion here?

Fabian and Chesnay raised concerns about the introductory text. Are
you still concerned?

On Wed, Jul 19, 2017 at 10:04 AM, Stephan Ewen <[hidden email]> wrote:

> @Chesnay:
>
> Put text into template => contributor will have to read it
> Put link to text into template => most contributors will ignore the link
>
> Yes, that is pretty much what my observation from the past is.
>
>
>
> On Tue, Jul 18, 2017 at 11:03 PM, Chesnay Schepler <[hidden email]>
> wrote:
>
>> I'm sorry but i can't follow your logic.
>>
>> Put text into template => contributor will definitely read it
>> Put link to text into template => contributor will completely ignore the
>> link
>>
>> The advantage of the link is we don't duplicate the contribution guide in
>> the docs and in the template.
>> Furthermore, you don't even need to remove something from the template,
>> since it's just a single line.
>>
>>
>> On 18.07.2017 19:25, Stephan Ewen wrote:
>>
>>> Concerning moving text to the contributors guide:
>>>
>>> I can only say it again: I believe the contribution guide is almost dead
>>> text. Very few people read it.
>>> Before the current template was introduced, new contributors rarely gave
>>> the pull request a name with Jira number. That is a good indicator about
>>> how many read this guide.
>>> Putting the test in the template is a way that every one reads it.
>>>
>>>
>>> I am also wondering what the concern is.
>>> A new contributor should clearly read through a bit of text, to learn what
>>> we look for in contributions.
>>> A recurring contributor will not have to read it again, simply remove the
>>> text from the pull request message and go on.
>>>
>>> Where is the disadvantage?
>>>
>>>
>>> On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]>
>>> wrote:
>>>
>>> I like the new template but also agree with the text being too long and
>>>> would
>>>> move the intro to the contributors guide with a link in the PR template.
>>>>
>>>> Regarding the questions to fill out - I'd like the headings to be short
>>>> and
>>>> have the affected components last so that documentation is not lost
>>>> (although
>>>> being more important than this checklist), e.g.:
>>>>
>>>> * Purpose of the change
>>>> * Brief change log
>>>> * Verifying the change
>>>> * Documentation
>>>> * Affected components
>>>>
>>>> The verification options in the original template look a bit too large
>>>> but
>>>> it
>>>> stresses what tests should be added, especially for bigger changes. Can't
>>>> think of a way to make it shorter though.
>>>>
>>>>
>>>> Nico
>>>>
>>>> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
>>>>
>>>>> I fully agree with Fabian.
>>>>>
>>>>> Multiple-choice questions provide little value to the reviewer, since
>>>>> the
>>>>> validity has to be verified in any case. While text answers have to be
>>>>> validated as well,
>>>>> they give some hint to the reviewer as to how it can be verified and
>>>>> which steps the
>>>>> contributor did to do so.
>>>>>
>>>>> I also agree that it is too long; IMO this is really intimidating to new
>>>>> contributors to be greeted with this.
>>>>>
>>>>> Ideally we only link to the contributors guide and ask 3 questions:
>>>>>
>>>>>    * What is the problem?
>>>>>    * How was it fixed?
>>>>>    * How can the fix be verified?
>>>>>
>>>>> On 18.07.2017 10:47, Fabian Hueske wrote:
>>>>>
>>>>>> I like the sections about purpose, change log, and verification of the
>>>>>> changes.
>>>>>>
>>>>>> However, I think the proposed template is too much text. This is
>>>>>>
>>>>> probably
>>>>
>>>>> the reason why the first attempt to establish a PR template failed.
>>>>>> I would move most of the introduction and explanations incl. examples
>>>>>>
>>>>> to
>>>>
>>>>> the "Contribution Guidelines" and only pass a link.
>>>>>> IMO, the template should be rather shorter than the current one and
>>>>>>
>>>>> only
>>>>
>>>>> have the link, the sections to fill out, and checkboxes.
>>>>>>
>>>>>> I'm also not sure how much the detailed questions will help.
>>>>>> For example even if the question about changed dependencies is answered
>>>>>> with "no", the reviewer still has to check that.
>>>>>>
>>>>>> I think the questions of the current template work differently.
>>>>>> A question "Does the PR include tests?" suggests to the contributor
>>>>>>
>>>>> that
>>>>
>>>>> those should be included. Same for documentation.
>>>>>>
>>>>>> Cheers,
>>>>>> Fabian
>>>>>>
>>>>>> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
>>>>>>
>>>>>>> +1, I like this a lot.
>>>>>>> With the previous template, it doesn’t really resonate with what we
>>>>>>> should
>>>>>>> care about, and therefore most of the time I think contributors just
>>>>>>> delete
>>>>>>> that template and write down something on their own.
>>>>>>>
>>>>>>> I would also like to add: “Savepoint / checkpoint binary formats” to
>>>>>>>
>>>>>> the
>>>>
>>>>> potential affect scope check list.
>>>>>>> I think changes to those deserves an independent yes / no check of its
>>>>>>> own.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Gordon
>>>>>>>
>>>>>>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>>>>>>
>>>>>>> I really like this and vote to change our current template.
>>>>>>>
>>>>>>> The simple yes/no/... options are a really good idea. I would add to
>>>>>>> your email that the questions will equally help reviewers to remember
>>>>>>> to look at these things, which is just as important.
>>>>>>>
>>>>>>> When we merge this, we should make sure to strictly follow the guide.
>>>>>>> Ideally, in the long term we can even automate some of the yes/no/...
>>>>>>> questions via a bot... but let's not get ahead of ourselves here ;-)
>>>>>>>
>>>>>>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
>>>>>>>
>>>>>> wrote:
>>>>
>>>>> Hi all!
>>>>>>>>
>>>>>>>> I have reflected a bit on the pull requests and on some of the recent
>>>>>>>> changes to Flink and some of the introduced bugs / regressions that
>>>>>>>>
>>>>>>> we
>>>>
>>>>> have
>>>>>>>
>>>>>>> fixed.
>>>>>>>>
>>>>>>>> One thing that I think would have helped is to have more explicit
>>>>>>>> information about what the pull request does and how the contributor
>>>>>>>>
>>>>>>> would
>>>>>>>
>>>>>>> suggest to verify it. I have seen this when contributing to some
>>>>>>>>
>>>>>>> other
>>>>
>>>>> project and really liked the approach.
>>>>>>>>
>>>>>>>> It requires that a contributor takes a minute to reflect on what was
>>>>>>>> touched, and what would be ways to verify that the changes work
>>>>>>>> properly.
>>>>>>>> Besides being a help to the reviewer, it also makes contributors
>>>>>>>>
>>>>>>> aware
>>>>
>>>>> of
>>>>>>>> what is important during the review process.
>>>>>>>>
>>>>>>>>
>>>>>>>> I suggest a new pull request template, as attached below, with a
>>>>>>>>
>>>>>>> preview
>>>>
>>>>> here:
>>>>>>>> https://github.com/StephanEwen/incubator-flink/
>>>>>>>>
>>>>>>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>>>>>>
>>>>>>> Don't be scared, it looks long, but a big part is the introductory
>>>>>>>>
>>>>>>> text
>>>>
>>>>> (only relevant for new contributors) and the examples contents for
>>>>>>>>
>>>>>>> the
>>>>
>>>>> description.
>>>>>>>> Filling this out for code that is in shape should be a quick thing:
>>>>>>>>
>>>>>>> Remove
>>>>>>>
>>>>>>> the into and checklist, write a few sentences on what the PR does
>>>>>>>>
>>>>>>> (one
>>>>
>>>>> should do that anyways) and then pick some yes/no in the
>>>>>>>>
>>>>>>> classification
>>>>
>>>>> section.
>>>>>>>>
>>>>>>>> Curious to hear what you think!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>> ============================
>>>>>>>>
>>>>>>>> Full suggested pull request template:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Thank you very much for contributing to Apache Flink - we are happy
>>>>>>>> that
>>>>>>>> you want to help us improve Flink. To help the community review you
>>>>>>>> contribution in the best possible way, please go through the
>>>>>>>>
>>>>>>> checklist
>>>>
>>>>> below, which will get the contribution into a shape in which it can
>>>>>>>>
>>>>>>> be
>>>>
>>>>> best
>>>>>>>
>>>>>>> reviewed.*
>>>>>>>>
>>>>>>>> *Please understand that we do not do this to make contributions to
>>>>>>>>
>>>>>>> Flink
>>>>
>>>>> a
>>>>>>>
>>>>>>> hassle. In order to uphold a high standard of quality for code
>>>>>>>> contributions, while at the same time managing a large number of
>>>>>>>> contributions, we need contributors to prepare the contributions
>>>>>>>>
>>>>>>> well,
>>>>
>>>>> and
>>>>>>>
>>>>>>> give reviewers enough contextual information for the review. Please
>>>>>>>>
>>>>>>> also
>>>>
>>>>> understand that contributions that do not follow this guide will take
>>>>>>>> longer to review and thus typically be picked up with lower priority
>>>>>>>>
>>>>>>> by
>>>>
>>>>> the
>>>>>>>
>>>>>>> community.*
>>>>>>>>
>>>>>>>> ## Contribution Checklist
>>>>>>>>
>>>>>>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>>>>>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
>>>>>>>>
>>>>>>> are
>>>>
>>>>> made
>>>>>>>
>>>>>>> for typos in JavaDoc or documentation files, which need no JIRA
>>>>>>>>
>>>>>>> issue.
>>>>
>>>>> - Name the pull request in the form "[FLINK-1234] [component] Title
>>>>>>>>
>>>>>>> of
>>>>
>>>>> the pull request", where *FLINK-1234* should be replaced by the
>>>>>>>>
>>>>>>> actual
>>>>
>>>>> issue number. Skip *component* if you are unsure about which is the
>>>>>>>>
>>>>>>> best
>>>>
>>>>> component.
>>>>>>>> Typo fixes that have no associated JIRA issue should be named
>>>>>>>>
>>>>>>> following
>>>>
>>>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
>>>>>>>>
>>>>>>> or
>>>>
>>>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>>>>>>
>>>>>>>> - Fill out the template below to describe the changes contributed by
>>>>>>>>
>>>>>>> the
>>>>
>>>>> pull request. That will give reviewers the context they need to do
>>>>>>>>
>>>>>>> the
>>>>
>>>>> review.
>>>>>>>>
>>>>>>>> - Make sure that the change passes the automated tests, i.e., `mvn
>>>>>>>>
>>>>>>> clean
>>>>
>>>>> verify`
>>>>>>>>
>>>>>>>> - Each pull request should address only one issue, not mix up code
>>>>>>>>
>>>>>>> from
>>>>
>>>>> multiple issues.
>>>>>>>>
>>>>>>>> - Each commit in the pull request has a meaningful commit message
>>>>>>>> (including the JIRA id)
>>>>>>>>
>>>>>>>> - Once all items of the checklist are addressed, remove the above
>>>>>>>>
>>>>>>> text
>>>>
>>>>> and this checklist, leaving only the filled out template below.
>>>>>>>>
>>>>>>>>
>>>>>>>> **(The sections below can be removed for hotfixes of typos)**
>>>>>>>>
>>>>>>>> ## What is the purpose of the change
>>>>>>>>
>>>>>>>> *(For example: This pull request makes task deployment go through the
>>>>>>>>
>>>>>>> blob
>>>>>>>
>>>>>>> server, rather than through RPC. That way we avoid re-transferring
>>>>>>>>
>>>>>>> them
>>>>
>>>>> on
>>>>>>>
>>>>>>> each deployment (during recovery).)*
>>>>>>>>
>>>>>>>>
>>>>>>>> ## Brief change log
>>>>>>>>
>>>>>>>> *(for example:)*
>>>>>>>> - *The TaskInfo is stored in the blob store on job creation time as a
>>>>>>>> persistent artifact*
>>>>>>>> - *Deployments RPC transmits only the blob storage reference*
>>>>>>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>>>>>>
>>>>>>>>
>>>>>>>> ## Verifying this change
>>>>>>>>
>>>>>>>> *(Please pick either of the following options)*
>>>>>>>>
>>>>>>>> This change is a trivial rework / code cleanup without any test
>>>>>>>> coverage.
>>>>>>>>
>>>>>>>> *(or)*
>>>>>>>>
>>>>>>>> This change is already covered by existing tests, such as *(please
>>>>>>>>
>>>>>>> describe
>>>>>>>
>>>>>>> tests)*.
>>>>>>>>
>>>>>>>> *(or)*
>>>>>>>>
>>>>>>>> This change added tests and can be verified as follows:
>>>>>>>>
>>>>>>>> *(example:)*
>>>>>>>> - *Added integration tests for end-to-end deployment with large
>>>>>>>>
>>>>>>> payloads
>>>>
>>>>> (100MB)*
>>>>>>>> - *Extended integration test for recovery after master (JobManager)
>>>>>>>> failure*
>>>>>>>> - *Added test that validates that TaskInfo is transferred only once
>>>>>>>> across recoveries*
>>>>>>>> - *Manually verified the change by running a 4 node cluser with 2
>>>>>>>> JobManagers and 4 TaskManagers, a stateful streaming program, and
>>>>>>>> killing
>>>>>>>> one JobManager and to TaskManagers during the execution, verifying
>>>>>>>>
>>>>>>> that
>>>>
>>>>> recovery happens correctly.*
>>>>>>>>
>>>>>>>> ## Does this pull request potentially affect one of the following
>>>>>>>>
>>>>>>> parts:
>>>>
>>>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>>>>>>> - The public API, i.e., is any changed class annotated with
>>>>>>>> `@Public(Evolving)`: **(yes / no)**
>>>>>>>> - The serializers: **(yes / no / don't know)**
>>>>>>>> - The runtime per-record code paths (performance sensitive): **(yes
>>>>>>>>
>>>>>>> / no
>>>>
>>>>> / don't know)**
>>>>>>>> - Anything that affects deployment or recovery: JobManager (and its
>>>>>>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
>>>>>>>>
>>>>>>> don't
>>>>
>>>>> know)**:
>>>>>>>>
>>>>>>>> ## Documentation
>>>>>>>>
>>>>>>>> - Does this pull request introduce a new feature? **(yes / no)**
>>>>>>>> - If yes, how is the feature documented? **(not applicable / docs /
>>>>>>>> JavaDocs / not documented)**
>>>>>>>>
>>>>>>>
>>>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Eron Wright-2
In reply to this post by Stephan Ewen
This seems like a good step in establishing a better PR process.  I believe
the process could be improved to ensure timely and targeted review by
component experts and committers.

On Mon, Jul 17, 2017 at 9:36 AM, Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> I have reflected a bit on the pull requests and on some of the recent
> changes to Flink and some of the introduced bugs / regressions that we have
> fixed.
>
> One thing that I think would have helped is to have more explicit
> information about what the pull request does and how the contributor would
> suggest to verify it. I have seen this when contributing to some other
> project and really liked the approach.
>
> It requires that a contributor takes a minute to reflect on what was
> touched, and what would be ways to verify that the changes work properly.
> Besides being a help to the reviewer, it also makes contributors aware of
> what is important during the review process.
>
>
> I suggest a new pull request template, as attached below, with a preview
> here:
> https://github.com/StephanEwen/incubator-flink/
> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>
> Don't be scared, it looks long, but a big part is the introductory text
> (only relevant for new contributors) and the examples contents for the
> description.
>
> Filling this out for code that is in shape should be a quick thing: Remove
> the into and checklist, write a few sentences on what the PR does (one
> should do that anyways) and then pick some yes/no in the classification
> section.
>
> Curious to hear what you think!
>
> Best,
> Stephan
>
>
> ============================
>
> Full suggested pull request template:
>
>
>
> *Thank you very much for contributing to Apache Flink - we are happy that
> you want to help us improve Flink. To help the community review you
> contribution in the best possible way, please go through the checklist
> below, which will get the contribution into a shape in which it can be best
> reviewed.*
>
> *Please understand that we do not do this to make contributions to Flink a
> hassle. In order to uphold a high standard of quality for code
> contributions, while at the same time managing a large number of
> contributions, we need contributors to prepare the contributions well, and
> give reviewers enough contextual information for the review. Please also
> understand that contributions that do not follow this guide will take
> longer to review and thus typically be picked up with lower priority by the
> community.*
>
> ## Contribution Checklist
>
>   - Make sure that the pull request corresponds to a [JIRA issue](
> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made
> for typos in JavaDoc or documentation files, which need no JIRA issue.
>
>   - Name the pull request in the form "[FLINK-1234] [component] Title of
> the pull request", where *FLINK-1234* should be replaced by the actual
> issue number. Skip *component* if you are unsure about which is the best
> component.
>   Typo fixes that have no associated JIRA issue should be named following
> this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>
>   - Fill out the template below to describe the changes contributed by the
> pull request. That will give reviewers the context they need to do the
> review.
>
>   - Make sure that the change passes the automated tests, i.e., `mvn clean
> verify`
>
>   - Each pull request should address only one issue, not mix up code from
> multiple issues.
>
>   - Each commit in the pull request has a meaningful commit message
> (including the JIRA id)
>
>   - Once all items of the checklist are addressed, remove the above text
> and this checklist, leaving only the filled out template below.
>
>
> **(The sections below can be removed for hotfixes of typos)**
>
> ## What is the purpose of the change
>
> *(For example: This pull request makes task deployment go through the blob
> server, rather than through RPC. That way we avoid re-transferring them on
> each deployment (during recovery).)*
>
>
> ## Brief change log
>
> *(for example:)*
>   - *The TaskInfo is stored in the blob store on job creation time as a
> persistent artifact*
>   - *Deployments RPC transmits only the blob storage reference*
>   - *TaskManagers retrieve the TaskInfo from the blob cache*
>
>
> ## Verifying this change
>
> *(Please pick either of the following options)*
>
> This change is a trivial rework / code cleanup without any test coverage.
>
> *(or)*
>
> This change is already covered by existing tests, such as *(please describe
> tests)*.
>
> *(or)*
>
> This change added tests and can be verified as follows:
>
> *(example:)*
>   - *Added integration tests for end-to-end deployment with large payloads
> (100MB)*
>   - *Extended integration test for recovery after master (JobManager)
> failure*
>   - *Added test that validates that TaskInfo is transferred only once
> across recoveries*
>   - *Manually verified the change by running a 4 node cluser with 2
> JobManagers and 4 TaskManagers, a stateful streaming program, and killing
> one JobManager and to TaskManagers during the execution, verifying that
> recovery happens correctly.*
>
> ## Does this pull request potentially affect one of the following parts:
>
>   - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>   - The public API, i.e., is any changed class annotated with
> `@Public(Evolving)`: **(yes / no)**
>   - The serializers: **(yes / no / don't know)**
>   - The runtime per-record code paths (performance sensitive): **(yes / no
> / don't know)**
>   - Anything that affects deployment or recovery: JobManager (and its
> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> know)**:
>
> ## Documentation
>
>   - Does this pull request introduce a new feature? **(yes / no)**
>   - If yes, how is the feature documented? **(not applicable / docs /
> JavaDocs / not documented)**
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Stephan Ewen
@Eron Review timeliness would be great to improve.

Some observation from the past year:
There were periods where some components in Flink were making slow progress
because all committers knowledgeable in those components were busy handling
pull requests that were opened against those components, but were not in
good shape, were adding not discussed designs, etc.

I think the only way to ensure timely handling of pull requests is to be
very strict in the handling. For example any non-trivial change needs prior
discussion, agreement that this should be fixed now, and an agreed upon
design doc. Otherwise the PR is not considered and simply rejected. Same
for presence of docs, proper tests, ...

But, I fear that introducing such strictness will scare off many in the
community. So I would be very reluctant to do this.
After all, many pull requests do bring in a good piece of perspective, at
least, even if the code is not immediately suited for contribution...


On Mon, Jul 24, 2017 at 8:18 PM, Eron Wright <[hidden email]> wrote:

> This seems like a good step in establishing a better PR process.  I believe
> the process could be improved to ensure timely and targeted review by
> component experts and committers.
>
> On Mon, Jul 17, 2017 at 9:36 AM, Stephan Ewen <[hidden email]> wrote:
>
> > Hi all!
> >
> > I have reflected a bit on the pull requests and on some of the recent
> > changes to Flink and some of the introduced bugs / regressions that we
> have
> > fixed.
> >
> > One thing that I think would have helped is to have more explicit
> > information about what the pull request does and how the contributor
> would
> > suggest to verify it. I have seen this when contributing to some other
> > project and really liked the approach.
> >
> > It requires that a contributor takes a minute to reflect on what was
> > touched, and what would be ways to verify that the changes work properly.
> > Besides being a help to the reviewer, it also makes contributors aware of
> > what is important during the review process.
> >
> >
> > I suggest a new pull request template, as attached below, with a preview
> > here:
> > https://github.com/StephanEwen/incubator-flink/
> > blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
> >
> > Don't be scared, it looks long, but a big part is the introductory text
> > (only relevant for new contributors) and the examples contents for the
> > description.
> >
> > Filling this out for code that is in shape should be a quick thing:
> Remove
> > the into and checklist, write a few sentences on what the PR does (one
> > should do that anyways) and then pick some yes/no in the classification
> > section.
> >
> > Curious to hear what you think!
> >
> > Best,
> > Stephan
> >
> >
> > ============================
> >
> > Full suggested pull request template:
> >
> >
> >
> > *Thank you very much for contributing to Apache Flink - we are happy that
> > you want to help us improve Flink. To help the community review you
> > contribution in the best possible way, please go through the checklist
> > below, which will get the contribution into a shape in which it can be
> best
> > reviewed.*
> >
> > *Please understand that we do not do this to make contributions to Flink
> a
> > hassle. In order to uphold a high standard of quality for code
> > contributions, while at the same time managing a large number of
> > contributions, we need contributors to prepare the contributions well,
> and
> > give reviewers enough contextual information for the review. Please also
> > understand that contributions that do not follow this guide will take
> > longer to review and thus typically be picked up with lower priority by
> the
> > community.*
> >
> > ## Contribution Checklist
> >
> >   - Make sure that the pull request corresponds to a [JIRA issue](
> > https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are
> made
> > for typos in JavaDoc or documentation files, which need no JIRA issue.
> >
> >   - Name the pull request in the form "[FLINK-1234] [component] Title of
> > the pull request", where *FLINK-1234* should be replaced by the actual
> > issue number. Skip *component* if you are unsure about which is the best
> > component.
> >   Typo fixes that have no associated JIRA issue should be named following
> > this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> > `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
> >
> >   - Fill out the template below to describe the changes contributed by
> the
> > pull request. That will give reviewers the context they need to do the
> > review.
> >
> >   - Make sure that the change passes the automated tests, i.e., `mvn
> clean
> > verify`
> >
> >   - Each pull request should address only one issue, not mix up code from
> > multiple issues.
> >
> >   - Each commit in the pull request has a meaningful commit message
> > (including the JIRA id)
> >
> >   - Once all items of the checklist are addressed, remove the above text
> > and this checklist, leaving only the filled out template below.
> >
> >
> > **(The sections below can be removed for hotfixes of typos)**
> >
> > ## What is the purpose of the change
> >
> > *(For example: This pull request makes task deployment go through the
> blob
> > server, rather than through RPC. That way we avoid re-transferring them
> on
> > each deployment (during recovery).)*
> >
> >
> > ## Brief change log
> >
> > *(for example:)*
> >   - *The TaskInfo is stored in the blob store on job creation time as a
> > persistent artifact*
> >   - *Deployments RPC transmits only the blob storage reference*
> >   - *TaskManagers retrieve the TaskInfo from the blob cache*
> >
> >
> > ## Verifying this change
> >
> > *(Please pick either of the following options)*
> >
> > This change is a trivial rework / code cleanup without any test coverage.
> >
> > *(or)*
> >
> > This change is already covered by existing tests, such as *(please
> describe
> > tests)*.
> >
> > *(or)*
> >
> > This change added tests and can be verified as follows:
> >
> > *(example:)*
> >   - *Added integration tests for end-to-end deployment with large
> payloads
> > (100MB)*
> >   - *Extended integration test for recovery after master (JobManager)
> > failure*
> >   - *Added test that validates that TaskInfo is transferred only once
> > across recoveries*
> >   - *Manually verified the change by running a 4 node cluser with 2
> > JobManagers and 4 TaskManagers, a stateful streaming program, and killing
> > one JobManager and to TaskManagers during the execution, verifying that
> > recovery happens correctly.*
> >
> > ## Does this pull request potentially affect one of the following parts:
> >
> >   - Dependencies (does it add or upgrade a dependency): **(yes / no)**
> >   - The public API, i.e., is any changed class annotated with
> > `@Public(Evolving)`: **(yes / no)**
> >   - The serializers: **(yes / no / don't know)**
> >   - The runtime per-record code paths (performance sensitive): **(yes /
> no
> > / don't know)**
> >   - Anything that affects deployment or recovery: JobManager (and its
> > components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> > know)**:
> >
> > ## Documentation
> >
> >   - Does this pull request introduce a new feature? **(yes / no)**
> >   - If yes, how is the feature documented? **(not applicable / docs /
> > JavaDocs / not documented)**
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Eron Wright-2
I think a combination of techniques would be effective:
- identifying focus areas for the next release (e.g. see Robert's thread
about 1.4)
- emphasizing design discussion in advance of a PR
- assigning reviewers and a steward in a structured way
- using a label or assignment field to 'pass the baton'
- closing rejected PRs decisively

I don't mean to co-opt this thread for a broader process question, but
figured that the PR template could provide additional process guidance.

On Mon, Jul 24, 2017 at 11:55 AM, Stephan Ewen <[hidden email]> wrote:

> @Eron Review timeliness would be great to improve.
>
> Some observation from the past year:
> There were periods where some components in Flink were making slow progress
> because all committers knowledgeable in those components were busy handling
> pull requests that were opened against those components, but were not in
> good shape, were adding not discussed designs, etc.
>
> I think the only way to ensure timely handling of pull requests is to be
> very strict in the handling. For example any non-trivial change needs prior
> discussion, agreement that this should be fixed now, and an agreed upon
> design doc. Otherwise the PR is not considered and simply rejected. Same
> for presence of docs, proper tests, ...
>
> But, I fear that introducing such strictness will scare off many in the
> community. So I would be very reluctant to do this.
> After all, many pull requests do bring in a good piece of perspective, at
> least, even if the code is not immediately suited for contribution...
>
>
> On Mon, Jul 24, 2017 at 8:18 PM, Eron Wright <[hidden email]> wrote:
>
> > This seems like a good step in establishing a better PR process.  I
> believe
> > the process could be improved to ensure timely and targeted review by
> > component experts and committers.
> >
> > On Mon, Jul 17, 2017 at 9:36 AM, Stephan Ewen <[hidden email]> wrote:
> >
> > > Hi all!
> > >
> > > I have reflected a bit on the pull requests and on some of the recent
> > > changes to Flink and some of the introduced bugs / regressions that we
> > have
> > > fixed.
> > >
> > > One thing that I think would have helped is to have more explicit
> > > information about what the pull request does and how the contributor
> > would
> > > suggest to verify it. I have seen this when contributing to some other
> > > project and really liked the approach.
> > >
> > > It requires that a contributor takes a minute to reflect on what was
> > > touched, and what would be ways to verify that the changes work
> properly.
> > > Besides being a help to the reviewer, it also makes contributors aware
> of
> > > what is important during the review process.
> > >
> > >
> > > I suggest a new pull request template, as attached below, with a
> preview
> > > here:
> > > https://github.com/StephanEwen/incubator-flink/
> > > blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
> > >
> > > Don't be scared, it looks long, but a big part is the introductory text
> > > (only relevant for new contributors) and the examples contents for the
> > > description.
> > >
> > > Filling this out for code that is in shape should be a quick thing:
> > Remove
> > > the into and checklist, write a few sentences on what the PR does (one
> > > should do that anyways) and then pick some yes/no in the classification
> > > section.
> > >
> > > Curious to hear what you think!
> > >
> > > Best,
> > > Stephan
> > >
> > >
> > > ============================
> > >
> > > Full suggested pull request template:
> > >
> > >
> > >
> > > *Thank you very much for contributing to Apache Flink - we are happy
> that
> > > you want to help us improve Flink. To help the community review you
> > > contribution in the best possible way, please go through the checklist
> > > below, which will get the contribution into a shape in which it can be
> > best
> > > reviewed.*
> > >
> > > *Please understand that we do not do this to make contributions to
> Flink
> > a
> > > hassle. In order to uphold a high standard of quality for code
> > > contributions, while at the same time managing a large number of
> > > contributions, we need contributors to prepare the contributions well,
> > and
> > > give reviewers enough contextual information for the review. Please
> also
> > > understand that contributions that do not follow this guide will take
> > > longer to review and thus typically be picked up with lower priority by
> > the
> > > community.*
> > >
> > > ## Contribution Checklist
> > >
> > >   - Make sure that the pull request corresponds to a [JIRA issue](
> > > https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are
> > made
> > > for typos in JavaDoc or documentation files, which need no JIRA issue.
> > >
> > >   - Name the pull request in the form "[FLINK-1234] [component] Title
> of
> > > the pull request", where *FLINK-1234* should be replaced by the actual
> > > issue number. Skip *component* if you are unsure about which is the
> best
> > > component.
> > >   Typo fixes that have no associated JIRA issue should be named
> following
> > > this pattern: `[hotfix] [docs] Fix typo in event time introduction` or
> > > `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
> > >
> > >   - Fill out the template below to describe the changes contributed by
> > the
> > > pull request. That will give reviewers the context they need to do the
> > > review.
> > >
> > >   - Make sure that the change passes the automated tests, i.e., `mvn
> > clean
> > > verify`
> > >
> > >   - Each pull request should address only one issue, not mix up code
> from
> > > multiple issues.
> > >
> > >   - Each commit in the pull request has a meaningful commit message
> > > (including the JIRA id)
> > >
> > >   - Once all items of the checklist are addressed, remove the above
> text
> > > and this checklist, leaving only the filled out template below.
> > >
> > >
> > > **(The sections below can be removed for hotfixes of typos)**
> > >
> > > ## What is the purpose of the change
> > >
> > > *(For example: This pull request makes task deployment go through the
> > blob
> > > server, rather than through RPC. That way we avoid re-transferring them
> > on
> > > each deployment (during recovery).)*
> > >
> > >
> > > ## Brief change log
> > >
> > > *(for example:)*
> > >   - *The TaskInfo is stored in the blob store on job creation time as a
> > > persistent artifact*
> > >   - *Deployments RPC transmits only the blob storage reference*
> > >   - *TaskManagers retrieve the TaskInfo from the blob cache*
> > >
> > >
> > > ## Verifying this change
> > >
> > > *(Please pick either of the following options)*
> > >
> > > This change is a trivial rework / code cleanup without any test
> coverage.
> > >
> > > *(or)*
> > >
> > > This change is already covered by existing tests, such as *(please
> > describe
> > > tests)*.
> > >
> > > *(or)*
> > >
> > > This change added tests and can be verified as follows:
> > >
> > > *(example:)*
> > >   - *Added integration tests for end-to-end deployment with large
> > payloads
> > > (100MB)*
> > >   - *Extended integration test for recovery after master (JobManager)
> > > failure*
> > >   - *Added test that validates that TaskInfo is transferred only once
> > > across recoveries*
> > >   - *Manually verified the change by running a 4 node cluser with 2
> > > JobManagers and 4 TaskManagers, a stateful streaming program, and
> killing
> > > one JobManager and to TaskManagers during the execution, verifying that
> > > recovery happens correctly.*
> > >
> > > ## Does this pull request potentially affect one of the following
> parts:
> > >
> > >   - Dependencies (does it add or upgrade a dependency): **(yes / no)**
> > >   - The public API, i.e., is any changed class annotated with
> > > `@Public(Evolving)`: **(yes / no)**
> > >   - The serializers: **(yes / no / don't know)**
> > >   - The runtime per-record code paths (performance sensitive): **(yes /
> > no
> > > / don't know)**
> > >   - Anything that affects deployment or recovery: JobManager (and its
> > > components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no / don't
> > > know)**:
> > >
> > > ## Documentation
> > >
> > >   - Does this pull request introduce a new feature? **(yes / no)**
> > >   - If yes, how is the feature documented? **(not applicable / docs /
> > > JavaDocs / not documented)**
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Chesnay Schepler-3
In reply to this post by Ufuk Celebi-2
I'm still apprehensive about it, but don't mind trying it out. It's not
like it can break something.

On 24.07.2017 18:52, Ufuk Celebi wrote:

> What's the conclusion of last weeks discussion here?
>
> Fabian and Chesnay raised concerns about the introductory text. Are
> you still concerned?
>
> On Wed, Jul 19, 2017 at 10:04 AM, Stephan Ewen <[hidden email]> wrote:
>> @Chesnay:
>>
>> Put text into template => contributor will have to read it
>> Put link to text into template => most contributors will ignore the link
>>
>> Yes, that is pretty much what my observation from the past is.
>>
>>
>>
>> On Tue, Jul 18, 2017 at 11:03 PM, Chesnay Schepler <[hidden email]>
>> wrote:
>>
>>> I'm sorry but i can't follow your logic.
>>>
>>> Put text into template => contributor will definitely read it
>>> Put link to text into template => contributor will completely ignore the
>>> link
>>>
>>> The advantage of the link is we don't duplicate the contribution guide in
>>> the docs and in the template.
>>> Furthermore, you don't even need to remove something from the template,
>>> since it's just a single line.
>>>
>>>
>>> On 18.07.2017 19:25, Stephan Ewen wrote:
>>>
>>>> Concerning moving text to the contributors guide:
>>>>
>>>> I can only say it again: I believe the contribution guide is almost dead
>>>> text. Very few people read it.
>>>> Before the current template was introduced, new contributors rarely gave
>>>> the pull request a name with Jira number. That is a good indicator about
>>>> how many read this guide.
>>>> Putting the test in the template is a way that every one reads it.
>>>>
>>>>
>>>> I am also wondering what the concern is.
>>>> A new contributor should clearly read through a bit of text, to learn what
>>>> we look for in contributions.
>>>> A recurring contributor will not have to read it again, simply remove the
>>>> text from the pull request message and go on.
>>>>
>>>> Where is the disadvantage?
>>>>
>>>>
>>>> On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]>
>>>> wrote:
>>>>
>>>> I like the new template but also agree with the text being too long and
>>>>> would
>>>>> move the intro to the contributors guide with a link in the PR template.
>>>>>
>>>>> Regarding the questions to fill out - I'd like the headings to be short
>>>>> and
>>>>> have the affected components last so that documentation is not lost
>>>>> (although
>>>>> being more important than this checklist), e.g.:
>>>>>
>>>>> * Purpose of the change
>>>>> * Brief change log
>>>>> * Verifying the change
>>>>> * Documentation
>>>>> * Affected components
>>>>>
>>>>> The verification options in the original template look a bit too large
>>>>> but
>>>>> it
>>>>> stresses what tests should be added, especially for bigger changes. Can't
>>>>> think of a way to make it shorter though.
>>>>>
>>>>>
>>>>> Nico
>>>>>
>>>>> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
>>>>>
>>>>>> I fully agree with Fabian.
>>>>>>
>>>>>> Multiple-choice questions provide little value to the reviewer, since
>>>>>> the
>>>>>> validity has to be verified in any case. While text answers have to be
>>>>>> validated as well,
>>>>>> they give some hint to the reviewer as to how it can be verified and
>>>>>> which steps the
>>>>>> contributor did to do so.
>>>>>>
>>>>>> I also agree that it is too long; IMO this is really intimidating to new
>>>>>> contributors to be greeted with this.
>>>>>>
>>>>>> Ideally we only link to the contributors guide and ask 3 questions:
>>>>>>
>>>>>>     * What is the problem?
>>>>>>     * How was it fixed?
>>>>>>     * How can the fix be verified?
>>>>>>
>>>>>> On 18.07.2017 10:47, Fabian Hueske wrote:
>>>>>>
>>>>>>> I like the sections about purpose, change log, and verification of the
>>>>>>> changes.
>>>>>>>
>>>>>>> However, I think the proposed template is too much text. This is
>>>>>>>
>>>>>> probably
>>>>>> the reason why the first attempt to establish a PR template failed.
>>>>>>> I would move most of the introduction and explanations incl. examples
>>>>>>>
>>>>>> to
>>>>>> the "Contribution Guidelines" and only pass a link.
>>>>>>> IMO, the template should be rather shorter than the current one and
>>>>>>>
>>>>>> only
>>>>>> have the link, the sections to fill out, and checkboxes.
>>>>>>> I'm also not sure how much the detailed questions will help.
>>>>>>> For example even if the question about changed dependencies is answered
>>>>>>> with "no", the reviewer still has to check that.
>>>>>>>
>>>>>>> I think the questions of the current template work differently.
>>>>>>> A question "Does the PR include tests?" suggests to the contributor
>>>>>>>
>>>>>> that
>>>>>> those should be included. Same for documentation.
>>>>>>> Cheers,
>>>>>>> Fabian
>>>>>>>
>>>>>>> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai <[hidden email]>:
>>>>>>>
>>>>>>>> +1, I like this a lot.
>>>>>>>> With the previous template, it doesn’t really resonate with what we
>>>>>>>> should
>>>>>>>> care about, and therefore most of the time I think contributors just
>>>>>>>> delete
>>>>>>>> that template and write down something on their own.
>>>>>>>>
>>>>>>>> I would also like to add: “Savepoint / checkpoint binary formats” to
>>>>>>>>
>>>>>>> the
>>>>>> potential affect scope check list.
>>>>>>>> I think changes to those deserves an independent yes / no check of its
>>>>>>>> own.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Gordon
>>>>>>>>
>>>>>>>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>>>>>>>
>>>>>>>> I really like this and vote to change our current template.
>>>>>>>>
>>>>>>>> The simple yes/no/... options are a really good idea. I would add to
>>>>>>>> your email that the questions will equally help reviewers to remember
>>>>>>>> to look at these things, which is just as important.
>>>>>>>>
>>>>>>>> When we merge this, we should make sure to strictly follow the guide.
>>>>>>>> Ideally, in the long term we can even automate some of the yes/no/...
>>>>>>>> questions via a bot... but let's not get ahead of ourselves here ;-)
>>>>>>>>
>>>>>>>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
>>>>>>>>
>>>>>>> wrote:
>>>>>> Hi all!
>>>>>>>>> I have reflected a bit on the pull requests and on some of the recent
>>>>>>>>> changes to Flink and some of the introduced bugs / regressions that
>>>>>>>>>
>>>>>>>> we
>>>>>> have
>>>>>>>> fixed.
>>>>>>>>> One thing that I think would have helped is to have more explicit
>>>>>>>>> information about what the pull request does and how the contributor
>>>>>>>>>
>>>>>>>> would
>>>>>>>>
>>>>>>>> suggest to verify it. I have seen this when contributing to some
>>>>>>>> other
>>>>>> project and really liked the approach.
>>>>>>>>> It requires that a contributor takes a minute to reflect on what was
>>>>>>>>> touched, and what would be ways to verify that the changes work
>>>>>>>>> properly.
>>>>>>>>> Besides being a help to the reviewer, it also makes contributors
>>>>>>>>>
>>>>>>>> aware
>>>>>> of
>>>>>>>>> what is important during the review process.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I suggest a new pull request template, as attached below, with a
>>>>>>>>>
>>>>>>>> preview
>>>>>> here:
>>>>>>>>> https://github.com/StephanEwen/incubator-flink/
>>>>>>>>>
>>>>>>>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>>>>>>>
>>>>>>>> Don't be scared, it looks long, but a big part is the introductory
>>>>>>>> text
>>>>>> (only relevant for new contributors) and the examples contents for
>>>>>>>> the
>>>>>> description.
>>>>>>>>> Filling this out for code that is in shape should be a quick thing:
>>>>>>>>>
>>>>>>>> Remove
>>>>>>>>
>>>>>>>> the into and checklist, write a few sentences on what the PR does
>>>>>>>> (one
>>>>>> should do that anyways) and then pick some yes/no in the
>>>>>>>> classification
>>>>>> section.
>>>>>>>>> Curious to hear what you think!
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ============================
>>>>>>>>>
>>>>>>>>> Full suggested pull request template:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *Thank you very much for contributing to Apache Flink - we are happy
>>>>>>>>> that
>>>>>>>>> you want to help us improve Flink. To help the community review you
>>>>>>>>> contribution in the best possible way, please go through the
>>>>>>>>>
>>>>>>>> checklist
>>>>>> below, which will get the contribution into a shape in which it can
>>>>>>>> be
>>>>>> best
>>>>>>>> reviewed.*
>>>>>>>>> *Please understand that we do not do this to make contributions to
>>>>>>>>>
>>>>>>>> Flink
>>>>>> a
>>>>>>>> hassle. In order to uphold a high standard of quality for code
>>>>>>>>> contributions, while at the same time managing a large number of
>>>>>>>>> contributions, we need contributors to prepare the contributions
>>>>>>>>>
>>>>>>>> well,
>>>>>> and
>>>>>>>> give reviewers enough contextual information for the review. Please
>>>>>>>> also
>>>>>> understand that contributions that do not follow this guide will take
>>>>>>>>> longer to review and thus typically be picked up with lower priority
>>>>>>>>>
>>>>>>>> by
>>>>>> the
>>>>>>>> community.*
>>>>>>>>> ## Contribution Checklist
>>>>>>>>>
>>>>>>>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>>>>>>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
>>>>>>>>>
>>>>>>>> are
>>>>>> made
>>>>>>>> for typos in JavaDoc or documentation files, which need no JIRA
>>>>>>>> issue.
>>>>>> - Name the pull request in the form "[FLINK-1234] [component] Title
>>>>>>>> of
>>>>>> the pull request", where *FLINK-1234* should be replaced by the
>>>>>>>> actual
>>>>>> issue number. Skip *component* if you are unsure about which is the
>>>>>>>> best
>>>>>> component.
>>>>>>>>> Typo fixes that have no associated JIRA issue should be named
>>>>>>>>>
>>>>>>>> following
>>>>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
>>>>>>>> or
>>>>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>>>>>>> - Fill out the template below to describe the changes contributed by
>>>>>>>>>
>>>>>>>> the
>>>>>> pull request. That will give reviewers the context they need to do
>>>>>>>> the
>>>>>> review.
>>>>>>>>> - Make sure that the change passes the automated tests, i.e., `mvn
>>>>>>>>>
>>>>>>>> clean
>>>>>> verify`
>>>>>>>>> - Each pull request should address only one issue, not mix up code
>>>>>>>>>
>>>>>>>> from
>>>>>> multiple issues.
>>>>>>>>> - Each commit in the pull request has a meaningful commit message
>>>>>>>>> (including the JIRA id)
>>>>>>>>>
>>>>>>>>> - Once all items of the checklist are addressed, remove the above
>>>>>>>>>
>>>>>>>> text
>>>>>> and this checklist, leaving only the filled out template below.
>>>>>>>>>
>>>>>>>>> **(The sections below can be removed for hotfixes of typos)**
>>>>>>>>>
>>>>>>>>> ## What is the purpose of the change
>>>>>>>>>
>>>>>>>>> *(For example: This pull request makes task deployment go through the
>>>>>>>>>
>>>>>>>> blob
>>>>>>>>
>>>>>>>> server, rather than through RPC. That way we avoid re-transferring
>>>>>>>> them
>>>>>> on
>>>>>>>> each deployment (during recovery).)*
>>>>>>>>>
>>>>>>>>> ## Brief change log
>>>>>>>>>
>>>>>>>>> *(for example:)*
>>>>>>>>> - *The TaskInfo is stored in the blob store on job creation time as a
>>>>>>>>> persistent artifact*
>>>>>>>>> - *Deployments RPC transmits only the blob storage reference*
>>>>>>>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ## Verifying this change
>>>>>>>>>
>>>>>>>>> *(Please pick either of the following options)*
>>>>>>>>>
>>>>>>>>> This change is a trivial rework / code cleanup without any test
>>>>>>>>> coverage.
>>>>>>>>>
>>>>>>>>> *(or)*
>>>>>>>>>
>>>>>>>>> This change is already covered by existing tests, such as *(please
>>>>>>>>>
>>>>>>>> describe
>>>>>>>>
>>>>>>>> tests)*.
>>>>>>>>> *(or)*
>>>>>>>>>
>>>>>>>>> This change added tests and can be verified as follows:
>>>>>>>>>
>>>>>>>>> *(example:)*
>>>>>>>>> - *Added integration tests for end-to-end deployment with large
>>>>>>>>>
>>>>>>>> payloads
>>>>>> (100MB)*
>>>>>>>>> - *Extended integration test for recovery after master (JobManager)
>>>>>>>>> failure*
>>>>>>>>> - *Added test that validates that TaskInfo is transferred only once
>>>>>>>>> across recoveries*
>>>>>>>>> - *Manually verified the change by running a 4 node cluser with 2
>>>>>>>>> JobManagers and 4 TaskManagers, a stateful streaming program, and
>>>>>>>>> killing
>>>>>>>>> one JobManager and to TaskManagers during the execution, verifying
>>>>>>>>>
>>>>>>>> that
>>>>>> recovery happens correctly.*
>>>>>>>>> ## Does this pull request potentially affect one of the following
>>>>>>>>>
>>>>>>>> parts:
>>>>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>>>>>>>> - The public API, i.e., is any changed class annotated with
>>>>>>>>> `@Public(Evolving)`: **(yes / no)**
>>>>>>>>> - The serializers: **(yes / no / don't know)**
>>>>>>>>> - The runtime per-record code paths (performance sensitive): **(yes
>>>>>>>>>
>>>>>>>> / no
>>>>>> / don't know)**
>>>>>>>>> - Anything that affects deployment or recovery: JobManager (and its
>>>>>>>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
>>>>>>>>>
>>>>>>>> don't
>>>>>> know)**:
>>>>>>>>> ## Documentation
>>>>>>>>>
>>>>>>>>> - Does this pull request introduce a new feature? **(yes / no)**
>>>>>>>>> - If yes, how is the feature documented? **(not applicable / docs /
>>>>>>>>> JavaDocs / not documented)**
>>>>>>>>>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] A more thorough Pull Request check list and template

Ufuk Celebi-2
Great! :-) If Fabian is also OK with trying it out, I would ask
Stephan to open a PR for this against Flink.


On Tue, Jul 25, 2017 at 8:50 AM, Chesnay Schepler <[hidden email]> wrote:

> I'm still apprehensive about it, but don't mind trying it out. It's not like
> it can break something.
>
>
> On 24.07.2017 18:52, Ufuk Celebi wrote:
>>
>> What's the conclusion of last weeks discussion here?
>>
>> Fabian and Chesnay raised concerns about the introductory text. Are
>> you still concerned?
>>
>> On Wed, Jul 19, 2017 at 10:04 AM, Stephan Ewen <[hidden email]> wrote:
>>>
>>> @Chesnay:
>>>
>>> Put text into template => contributor will have to read it
>>> Put link to text into template => most contributors will ignore the link
>>>
>>> Yes, that is pretty much what my observation from the past is.
>>>
>>>
>>>
>>> On Tue, Jul 18, 2017 at 11:03 PM, Chesnay Schepler <[hidden email]>
>>> wrote:
>>>
>>>> I'm sorry but i can't follow your logic.
>>>>
>>>> Put text into template => contributor will definitely read it
>>>> Put link to text into template => contributor will completely ignore the
>>>> link
>>>>
>>>> The advantage of the link is we don't duplicate the contribution guide
>>>> in
>>>> the docs and in the template.
>>>> Furthermore, you don't even need to remove something from the template,
>>>> since it's just a single line.
>>>>
>>>>
>>>> On 18.07.2017 19:25, Stephan Ewen wrote:
>>>>
>>>>> Concerning moving text to the contributors guide:
>>>>>
>>>>> I can only say it again: I believe the contribution guide is almost
>>>>> dead
>>>>> text. Very few people read it.
>>>>> Before the current template was introduced, new contributors rarely
>>>>> gave
>>>>> the pull request a name with Jira number. That is a good indicator
>>>>> about
>>>>> how many read this guide.
>>>>> Putting the test in the template is a way that every one reads it.
>>>>>
>>>>>
>>>>> I am also wondering what the concern is.
>>>>> A new contributor should clearly read through a bit of text, to learn
>>>>> what
>>>>> we look for in contributions.
>>>>> A recurring contributor will not have to read it again, simply remove
>>>>> the
>>>>> text from the pull request message and go on.
>>>>>
>>>>> Where is the disadvantage?
>>>>>
>>>>>
>>>>> On Tue, Jul 18, 2017 at 5:35 PM, Nico Kruber <[hidden email]>
>>>>> wrote:
>>>>>
>>>>> I like the new template but also agree with the text being too long and
>>>>>>
>>>>>> would
>>>>>> move the intro to the contributors guide with a link in the PR
>>>>>> template.
>>>>>>
>>>>>> Regarding the questions to fill out - I'd like the headings to be
>>>>>> short
>>>>>> and
>>>>>> have the affected components last so that documentation is not lost
>>>>>> (although
>>>>>> being more important than this checklist), e.g.:
>>>>>>
>>>>>> * Purpose of the change
>>>>>> * Brief change log
>>>>>> * Verifying the change
>>>>>> * Documentation
>>>>>> * Affected components
>>>>>>
>>>>>> The verification options in the original template look a bit too large
>>>>>> but
>>>>>> it
>>>>>> stresses what tests should be added, especially for bigger changes.
>>>>>> Can't
>>>>>> think of a way to make it shorter though.
>>>>>>
>>>>>>
>>>>>> Nico
>>>>>>
>>>>>> On Tuesday, 18 July 2017 11:20:41 CEST Chesnay Schepler wrote:
>>>>>>
>>>>>>> I fully agree with Fabian.
>>>>>>>
>>>>>>> Multiple-choice questions provide little value to the reviewer, since
>>>>>>> the
>>>>>>> validity has to be verified in any case. While text answers have to
>>>>>>> be
>>>>>>> validated as well,
>>>>>>> they give some hint to the reviewer as to how it can be verified and
>>>>>>> which steps the
>>>>>>> contributor did to do so.
>>>>>>>
>>>>>>> I also agree that it is too long; IMO this is really intimidating to
>>>>>>> new
>>>>>>> contributors to be greeted with this.
>>>>>>>
>>>>>>> Ideally we only link to the contributors guide and ask 3 questions:
>>>>>>>
>>>>>>>     * What is the problem?
>>>>>>>     * How was it fixed?
>>>>>>>     * How can the fix be verified?
>>>>>>>
>>>>>>> On 18.07.2017 10:47, Fabian Hueske wrote:
>>>>>>>
>>>>>>>> I like the sections about purpose, change log, and verification of
>>>>>>>> the
>>>>>>>> changes.
>>>>>>>>
>>>>>>>> However, I think the proposed template is too much text. This is
>>>>>>>>
>>>>>>> probably
>>>>>>> the reason why the first attempt to establish a PR template failed.
>>>>>>>>
>>>>>>>> I would move most of the introduction and explanations incl.
>>>>>>>> examples
>>>>>>>>
>>>>>>> to
>>>>>>> the "Contribution Guidelines" and only pass a link.
>>>>>>>>
>>>>>>>> IMO, the template should be rather shorter than the current one and
>>>>>>>>
>>>>>>> only
>>>>>>> have the link, the sections to fill out, and checkboxes.
>>>>>>>>
>>>>>>>> I'm also not sure how much the detailed questions will help.
>>>>>>>> For example even if the question about changed dependencies is
>>>>>>>> answered
>>>>>>>> with "no", the reviewer still has to check that.
>>>>>>>>
>>>>>>>> I think the questions of the current template work differently.
>>>>>>>> A question "Does the PR include tests?" suggests to the contributor
>>>>>>>>
>>>>>>> that
>>>>>>> those should be included. Same for documentation.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Fabian
>>>>>>>>
>>>>>>>> 2017-07-18 10:05 GMT+02:00 Tzu-Li (Gordon) Tai
>>>>>>>> <[hidden email]>:
>>>>>>>>
>>>>>>>>> +1, I like this a lot.
>>>>>>>>> With the previous template, it doesn’t really resonate with what we
>>>>>>>>> should
>>>>>>>>> care about, and therefore most of the time I think contributors
>>>>>>>>> just
>>>>>>>>> delete
>>>>>>>>> that template and write down something on their own.
>>>>>>>>>
>>>>>>>>> I would also like to add: “Savepoint / checkpoint binary formats”
>>>>>>>>> to
>>>>>>>>>
>>>>>>>> the
>>>>>>>
>>>>>>> potential affect scope check list.
>>>>>>>>>
>>>>>>>>> I think changes to those deserves an independent yes / no check of
>>>>>>>>> its
>>>>>>>>> own.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Gordon
>>>>>>>>>
>>>>>>>>> On 18 July 2017 at 3:49:42 PM, Ufuk Celebi ([hidden email]) wrote:
>>>>>>>>>
>>>>>>>>> I really like this and vote to change our current template.
>>>>>>>>>
>>>>>>>>> The simple yes/no/... options are a really good idea. I would add
>>>>>>>>> to
>>>>>>>>> your email that the questions will equally help reviewers to
>>>>>>>>> remember
>>>>>>>>> to look at these things, which is just as important.
>>>>>>>>>
>>>>>>>>> When we merge this, we should make sure to strictly follow the
>>>>>>>>> guide.
>>>>>>>>> Ideally, in the long term we can even automate some of the
>>>>>>>>> yes/no/...
>>>>>>>>> questions via a bot... but let's not get ahead of ourselves here
>>>>>>>>> ;-)
>>>>>>>>>
>>>>>>>>> On Mon, Jul 17, 2017 at 6:36 PM, Stephan Ewen <[hidden email]>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all!
>>>>>>>>>>
>>>>>>>>>> I have reflected a bit on the pull requests and on some of the
>>>>>>>>>> recent
>>>>>>>>>> changes to Flink and some of the introduced bugs / regressions
>>>>>>>>>> that
>>>>>>>>>>
>>>>>>>>> we
>>>>>>>
>>>>>>> have
>>>>>>>>>
>>>>>>>>> fixed.
>>>>>>>>>>
>>>>>>>>>> One thing that I think would have helped is to have more explicit
>>>>>>>>>> information about what the pull request does and how the
>>>>>>>>>> contributor
>>>>>>>>>>
>>>>>>>>> would
>>>>>>>>>
>>>>>>>>> suggest to verify it. I have seen this when contributing to some
>>>>>>>>> other
>>>>>>>
>>>>>>> project and really liked the approach.
>>>>>>>>>>
>>>>>>>>>> It requires that a contributor takes a minute to reflect on what
>>>>>>>>>> was
>>>>>>>>>> touched, and what would be ways to verify that the changes work
>>>>>>>>>> properly.
>>>>>>>>>> Besides being a help to the reviewer, it also makes contributors
>>>>>>>>>>
>>>>>>>>> aware
>>>>>>>
>>>>>>> of
>>>>>>>>>>
>>>>>>>>>> what is important during the review process.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I suggest a new pull request template, as attached below, with a
>>>>>>>>>>
>>>>>>>>> preview
>>>>>>>
>>>>>>> here:
>>>>>>>>>>
>>>>>>>>>> https://github.com/StephanEwen/incubator-flink/
>>>>>>>>>>
>>>>>>>>> blob/pr_template/.github/PULL_REQUEST_TEMPLATE.md
>>>>>>>>>
>>>>>>>>> Don't be scared, it looks long, but a big part is the introductory
>>>>>>>>> text
>>>>>>>
>>>>>>> (only relevant for new contributors) and the examples contents for
>>>>>>>>>
>>>>>>>>> the
>>>>>>>
>>>>>>> description.
>>>>>>>>>>
>>>>>>>>>> Filling this out for code that is in shape should be a quick
>>>>>>>>>> thing:
>>>>>>>>>>
>>>>>>>>> Remove
>>>>>>>>>
>>>>>>>>> the into and checklist, write a few sentences on what the PR does
>>>>>>>>> (one
>>>>>>>
>>>>>>> should do that anyways) and then pick some yes/no in the
>>>>>>>>>
>>>>>>>>> classification
>>>>>>>
>>>>>>> section.
>>>>>>>>>>
>>>>>>>>>> Curious to hear what you think!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ============================
>>>>>>>>>>
>>>>>>>>>> Full suggested pull request template:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Thank you very much for contributing to Apache Flink - we are
>>>>>>>>>> happy
>>>>>>>>>> that
>>>>>>>>>> you want to help us improve Flink. To help the community review
>>>>>>>>>> you
>>>>>>>>>> contribution in the best possible way, please go through the
>>>>>>>>>>
>>>>>>>>> checklist
>>>>>>>
>>>>>>> below, which will get the contribution into a shape in which it can
>>>>>>>>>
>>>>>>>>> be
>>>>>>>
>>>>>>> best
>>>>>>>>>
>>>>>>>>> reviewed.*
>>>>>>>>>>
>>>>>>>>>> *Please understand that we do not do this to make contributions to
>>>>>>>>>>
>>>>>>>>> Flink
>>>>>>>
>>>>>>> a
>>>>>>>>>
>>>>>>>>> hassle. In order to uphold a high standard of quality for code
>>>>>>>>>>
>>>>>>>>>> contributions, while at the same time managing a large number of
>>>>>>>>>> contributions, we need contributors to prepare the contributions
>>>>>>>>>>
>>>>>>>>> well,
>>>>>>>
>>>>>>> and
>>>>>>>>>
>>>>>>>>> give reviewers enough contextual information for the review. Please
>>>>>>>>> also
>>>>>>>
>>>>>>> understand that contributions that do not follow this guide will take
>>>>>>>>>>
>>>>>>>>>> longer to review and thus typically be picked up with lower
>>>>>>>>>> priority
>>>>>>>>>>
>>>>>>>>> by
>>>>>>>
>>>>>>> the
>>>>>>>>>
>>>>>>>>> community.*
>>>>>>>>>>
>>>>>>>>>> ## Contribution Checklist
>>>>>>>>>>
>>>>>>>>>> - Make sure that the pull request corresponds to a [JIRA issue](
>>>>>>>>>> https://issues.apache.org/jira/projects/FLINK/issues). Exceptions
>>>>>>>>>>
>>>>>>>>> are
>>>>>>>
>>>>>>> made
>>>>>>>>>
>>>>>>>>> for typos in JavaDoc or documentation files, which need no JIRA
>>>>>>>>> issue.
>>>>>>>
>>>>>>> - Name the pull request in the form "[FLINK-1234] [component] Title
>>>>>>>>>
>>>>>>>>> of
>>>>>>>
>>>>>>> the pull request", where *FLINK-1234* should be replaced by the
>>>>>>>>>
>>>>>>>>> actual
>>>>>>>
>>>>>>> issue number. Skip *component* if you are unsure about which is the
>>>>>>>>>
>>>>>>>>> best
>>>>>>>
>>>>>>> component.
>>>>>>>>>>
>>>>>>>>>> Typo fixes that have no associated JIRA issue should be named
>>>>>>>>>>
>>>>>>>>> following
>>>>>>>
>>>>>>> this pattern: `[hotfix] [docs] Fix typo in event time introduction`
>>>>>>>>>
>>>>>>>>> or
>>>>>>>
>>>>>>> `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`.
>>>>>>>>>>
>>>>>>>>>> - Fill out the template below to describe the changes contributed
>>>>>>>>>> by
>>>>>>>>>>
>>>>>>>>> the
>>>>>>>
>>>>>>> pull request. That will give reviewers the context they need to do
>>>>>>>>>
>>>>>>>>> the
>>>>>>>
>>>>>>> review.
>>>>>>>>>>
>>>>>>>>>> - Make sure that the change passes the automated tests, i.e., `mvn
>>>>>>>>>>
>>>>>>>>> clean
>>>>>>>
>>>>>>> verify`
>>>>>>>>>>
>>>>>>>>>> - Each pull request should address only one issue, not mix up code
>>>>>>>>>>
>>>>>>>>> from
>>>>>>>
>>>>>>> multiple issues.
>>>>>>>>>>
>>>>>>>>>> - Each commit in the pull request has a meaningful commit message
>>>>>>>>>> (including the JIRA id)
>>>>>>>>>>
>>>>>>>>>> - Once all items of the checklist are addressed, remove the above
>>>>>>>>>>
>>>>>>>>> text
>>>>>>>
>>>>>>> and this checklist, leaving only the filled out template below.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> **(The sections below can be removed for hotfixes of typos)**
>>>>>>>>>>
>>>>>>>>>> ## What is the purpose of the change
>>>>>>>>>>
>>>>>>>>>> *(For example: This pull request makes task deployment go through
>>>>>>>>>> the
>>>>>>>>>>
>>>>>>>>> blob
>>>>>>>>>
>>>>>>>>> server, rather than through RPC. That way we avoid re-transferring
>>>>>>>>> them
>>>>>>>
>>>>>>> on
>>>>>>>>>
>>>>>>>>> each deployment (during recovery).)*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ## Brief change log
>>>>>>>>>>
>>>>>>>>>> *(for example:)*
>>>>>>>>>> - *The TaskInfo is stored in the blob store on job creation time
>>>>>>>>>> as a
>>>>>>>>>> persistent artifact*
>>>>>>>>>> - *Deployments RPC transmits only the blob storage reference*
>>>>>>>>>> - *TaskManagers retrieve the TaskInfo from the blob cache*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ## Verifying this change
>>>>>>>>>>
>>>>>>>>>> *(Please pick either of the following options)*
>>>>>>>>>>
>>>>>>>>>> This change is a trivial rework / code cleanup without any test
>>>>>>>>>> coverage.
>>>>>>>>>>
>>>>>>>>>> *(or)*
>>>>>>>>>>
>>>>>>>>>> This change is already covered by existing tests, such as *(please
>>>>>>>>>>
>>>>>>>>> describe
>>>>>>>>>
>>>>>>>>> tests)*.
>>>>>>>>>>
>>>>>>>>>> *(or)*
>>>>>>>>>>
>>>>>>>>>> This change added tests and can be verified as follows:
>>>>>>>>>>
>>>>>>>>>> *(example:)*
>>>>>>>>>> - *Added integration tests for end-to-end deployment with large
>>>>>>>>>>
>>>>>>>>> payloads
>>>>>>>
>>>>>>> (100MB)*
>>>>>>>>>>
>>>>>>>>>> - *Extended integration test for recovery after master
>>>>>>>>>> (JobManager)
>>>>>>>>>> failure*
>>>>>>>>>> - *Added test that validates that TaskInfo is transferred only
>>>>>>>>>> once
>>>>>>>>>> across recoveries*
>>>>>>>>>> - *Manually verified the change by running a 4 node cluser with 2
>>>>>>>>>> JobManagers and 4 TaskManagers, a stateful streaming program, and
>>>>>>>>>> killing
>>>>>>>>>> one JobManager and to TaskManagers during the execution, verifying
>>>>>>>>>>
>>>>>>>>> that
>>>>>>>
>>>>>>> recovery happens correctly.*
>>>>>>>>>>
>>>>>>>>>> ## Does this pull request potentially affect one of the following
>>>>>>>>>>
>>>>>>>>> parts:
>>>>>>>
>>>>>>> - Dependencies (does it add or upgrade a dependency): **(yes / no)**
>>>>>>>>>>
>>>>>>>>>> - The public API, i.e., is any changed class annotated with
>>>>>>>>>> `@Public(Evolving)`: **(yes / no)**
>>>>>>>>>> - The serializers: **(yes / no / don't know)**
>>>>>>>>>> - The runtime per-record code paths (performance sensitive):
>>>>>>>>>> **(yes
>>>>>>>>>>
>>>>>>>>> / no
>>>>>>>
>>>>>>> / don't know)**
>>>>>>>>>>
>>>>>>>>>> - Anything that affects deployment or recovery: JobManager (and
>>>>>>>>>> its
>>>>>>>>>> components), Checkpointing, Yarn/Mesos, ZooKeeper: **(yes / no /
>>>>>>>>>>
>>>>>>>>> don't
>>>>>>>
>>>>>>> know)**:
>>>>>>>>>>
>>>>>>>>>> ## Documentation
>>>>>>>>>>
>>>>>>>>>> - Does this pull request introduce a new feature? **(yes / no)**
>>>>>>>>>> - If yes, how is the feature documented? **(not applicable / docs
>>>>>>>>>> /
>>>>>>>>>> JavaDocs / not documented)**
>>>>>>>>>>
>
12