(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Repository split

Chesnay Schepler-3

[DISCUSS] Repository split

Hello everyone,

The Flink project sees an ever-increasing amount of dev activity, both
in terms of reworked and new features.

This is of course an excellent situation to be in, but we are getting to
a point where the associated downsides are becoming increasingly troublesome.

The ever-increasing build times, in addition to unstable tests,
significantly slow down the development process.
Additionally, pull requests for smaller features frequently slip through
the cracks as they are being buried under a mountain of other pull requests.

As a result I'd like to start a discussion on splitting the Flink
repository.

In this mail I will outline the core idea, and what problems I currently
envision.

I'd specifically like to encourage those who were part of similar
initiatives in other projects to share their experiences and ideas.

General Idea

For starters, the idea is to create a new repository for "flink-connectors".
For the remainder of this mail, the current Flink repository is referred
to as "flink-main".

There are also other candidates that we could discuss in the future,
like flink-libraries (the next top-priority repo to ease flink-ml
development), metric reporters, filesystems and flink-formats.

Moving out flink-connectors provides the most benefits, as we straight
away save at least an hour of testing time, and the connectors not being
included in the binary distribution simplifies a few things.

Problems to solve

To make this a reality there are a number of questions we have to discuss;
some in the short-term, others in the long-term.

1) Git history

We have to decide whether we want to rewrite the history of sub
repositories to only contain diffs/commits related to this part of
Flink, or whether we just fork from some commit in flink-main and
add a commit to the connector repo that "transforms" it from
flink-main to flink-connectors (i.e., remove everything unrelated to
connectors + update module structure etc.).

The latter option would have the advantage that our commit
bookkeeping in JIRA would still be correct, but it would create a
significant divide between the current and past state of the repository.

2) Maven

We should look into whether there's a way to share dependency/plugin
configurations and similar, so we don't have to keep them in sync
manually across multiple repositories.

A new parent Flink pom that all repositories define as their parent
could work; this would imply splicing out part of the current root
pom.xml.

3) Documentation

Splitting the repository realistically also implies splitting the
documentation source files (at the beginning we can get by with
still having them in flink-main).
We could just move the relevant files to the respective repository
(while maintaining the directory structure), and merge them when
building the docs.

We also have to look at how we can handle java-/scaladocs; e.g.
whether it is possible to aggregate them across projects.

4) CI (end-to-end tests)

The very basic question we have to answer is whether we want E2E
tests in the sub repositories. If so, we need to find a way to share
e2e-tooling.

5) Releases

We have to discuss what our release process will look like. This may
also have repercussions on how repositories may depend on each other
(SNAPSHOT vs LATEST). Note that this should be discussed for each
repo separately.

The current options I see are the following:

a) Single release

Release all repositories at once as a single product.

The source release would be a collection of repositories, like
flink/
|--flink-main/
    |--flink-core/
    |--flink-runtime/
    ...
|--flink-connectors/
    ...
|--flink-.../
...

This option requires a SNAPSHOT dependency between Flink
repositories, but it is pretty much how things work at the moment.

b) Synced releases

Similar to a), except that each repository gets its own source
release that may be released independently of other repositories.
For a given release cycle each repo would produce exactly one
release.

This option requires a SNAPSHOT dependency between Flink
repositories. Once any repository has created an RC or
finished its release, release-branches in other repos can
switch to that version.

This approach is a tad more flexible than a), but requires more
coordination between the repos.

c) Separate releases

Just like we handle flink-shaded; entirely separate release
cycles; some repositories may have more releases in a given time
period than others.

This option implies a LATEST dependency between Flink repositories.

Note that hybrid approaches would also make sense, like doing b) for
major versions and c) for bugfix releases.

For something like flink-libraries this question may also have
repercussions on how/whether they are bundled in the distribution;
options a)/b) would maintain the status-quo, c) and hybrid
approaches will likely necessitate the exclusion from the distribution.

Piotr Nowojski-3

Re: [DISCUSS] Repository split

Hi,

Thanks for proposing and writing this down, Chesnay.

Generally speaking +1 from my side for the idea. It will create additional pain for cross-repository development, like some new feature in connectors that needs some change in the main repository. I’ve worked in such a setup before and the teams then regretted having such a split. But I agree that we should try this to try to solve the stability/build time issues.

I have no experience in making this kind of split, so I cannot help here.

I would like to also raise an additional issue: currently quite a few bugs (like release blockers [1]) are being discovered by the ITCases of the connectors. This means that, at least initially, the main repository will lose some test coverage.

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-13593

> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:

Chesnay Schepler-3

Re: [DISCUSS] Repository split

> I would like to also raise an additional issue: currently quite some
> bugs (like release blockers [1]) are being discovered by ITCases of the
> connectors. It means that at least initially, the main repository will
> lose some test coverage.

True, but I think this is more a symptom of us not properly testing the
contracts that are exposed to connectors.
That we lose test coverage is already a big red flag as it implies
that issues were fixed and are now verified by a connector test, and not
by a test in the Flink core.
We could also look into tooling surrounding the CI bot for running the
connectors tests on-demand, although this is very much long-term.

On 08/08/2019 13:14, Piotr Nowojski wrote:

dwysakowicz

Re: [DISCUSS] Repository split

First of all, I don't have much (if any) experience working with a multi-repository project of Flink's size. I would like to mention a few thoughts of mine, though. In general I am slightly against splitting the repository. I fear that what we actually want to do with the repository split is to introduce double standards for different modules.

As I understand there are two issues we want to solve with the split:

1) long build/testing time

2) increasing number of PRs

Ad. 1 I agree this is a problem and that we don't necessarily need to run all the tests with every change or build the whole project all the time. However, I think we could achieve that in a single repository and at the same time keep the option to build all modules at once. If I am not mistaken this is the approach that the Apache Beam community decided to take (see e.g. https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy where they define paths to files that, if changed, trigger the corresponding CI job). Maybe we could make it easier if we restructure the repository? To something like:

flink/
|--flink-main/
    |--flink-core/
    |--flink-runtime/
    ...
|--flink-connectors/
    ...
|--flink-filesystems.../
    ...

|--root.pom

In my opinion the Releases section from Chesnay's message shows well that it might not be the best option to split the repository. Option a) looks to me equivalent to what I suggested above but with a split. Option b) looks super complicated to me and I can see no benefit over option a). Option c) would be the most reasonable one if we decided to split the repository, if you ask me. The problem with this approach is the compatibility matrix (which versions of connectors work with which versions of Flink?). Moreover, to me it is an indicator of what I mentioned: that we would introduce double standards for those modules. I am not saying that I am totally against that, but I think this should be a conscious decision.

Ad. 2 I can't see how a repository split could help with that, other than by moving some of the PRs to a separate list (that probably even fewer people would look into). Also, I think we can already achieve something like that with GitHub filters, no?

To sum up my thoughts:

1. I think it is a good idea to split our CI builds to sub-modules (connectors being the first candidate), that would trigger on a changed path basis, but without splitting the repo.
2. My feeling is that the real question is if we want to change our stability guarantees of certain modules to be "just best effort".
3. If we were to vote on this proposal I would vote -0. I am slightly against this change, but wouldn't oppose.
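
To illustrate the "trigger on a changed path basis" idea, here is a minimal sketch in Python. It is not how Beam or Flink CI is actually configured; the job names, the path patterns, and the rule that a root pom.xml change triggers every job are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from CI job name to the path patterns that trigger it.
# Directory names follow the layout sketched above; the jobs are illustrative.
TRIGGER_PATHS = {
    "core": ["flink-core/*", "flink-runtime/*", "pom.xml"],
    "connectors": ["flink-connectors/*", "pom.xml"],
    "filesystems": ["flink-filesystems/*", "pom.xml"],
}


def jobs_to_run(changed_files):
    """Return the sorted names of CI jobs with a pattern matching any changed file."""
    return sorted(
        job
        for job, patterns in TRIGGER_PATHS.items()
        # fnmatchcase lets "*" span "/" too, so "flink-connectors/*" covers nested paths.
        if any(
            fnmatchcase(path, pattern)
            for path in changed_files
            for pattern in patterns
        )
    )


if __name__ == "__main__":
    print(jobs_to_run(["flink-connectors/flink-connector-kafka/src/Foo.java"]))
```

With a mapping like this, a change that only touches connector files would skip the core and filesystem jobs, while the option to build everything at once stays available in the single repository.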

Best,

Dawid

On 08/08/2019 13:23, Chesnay Schepler wrote:

David Morávek

Re: [DISCUSS] Repository split

+1 for the motivation, -1 for the solution, as all of the problems mentioned
above can be addressed with a mono-repo as well.

Multiple repositories:
1) This creates a big pain when a change targets code in multiple
repositories. The change needs to be split into multiple PRs that need
to be reviewed separately and merged in the proper order; otherwise CI would
fail (you also need to rebuild the "dependent PR" once its dependency gets
merged; this will just result in a lot of false-positive PR build failures).
Also, if the change needs to be cherry-picked into multiple releases, it's
really easy to make a mistake.
2) PR builds are not reproducible if you depend on SNAPSHOTs.
3) It makes release management way harder as all the parts are versioned
separately.
4) Refactoring across multiple repositories becomes harder.
5) For newcomers, it's way harder to contribute, as the local setup gets
complicated. Also, depending on SNAPSHOTs from other projects can be very
frustrating for people who are not too familiar with dependency management,
as it often leads to unpredictable behavior due to the local cache etc.

The increased build / testing time does not imply that the repository is
too big, but that the current build system is not set up correctly (e.g.
checkstyle takes ages on my box, ...) or that users are unaware of how to
leverage the current build system (e.g. there is no need to build everything
from scratch every time you make a change; this can be improved in the docs).

In case of CI, as Dawid already mentioned, you only need to trigger the build /
tests for the code you have changed and its dependents. This should
greatly improve the runtime of CI builds.
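
As a rough sketch of "the code you have changed and its dependents", the Python snippet below walks a module dependency graph to find everything that needs rebuilding. The graph is hypothetical and only reuses module names from this thread; it is not Flink's real dependency structure.

```python
from collections import deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "flink-core": [],
    "flink-runtime": ["flink-core"],
    "flink-connectors": ["flink-runtime"],
    "flink-filesystems": ["flink-core"],
}


def affected_modules(changed):
    """Return the changed modules plus all modules that transitively depend on them."""
    # Invert the graph: module -> modules that depend on it directly.
    dependents = {module: [] for module in DEPENDS_ON}
    for module, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(module)

    # Breadth-first walk over the inverted graph, starting at the changed modules.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return sorted(affected)


if __name__ == "__main__":
    print(affected_modules(["flink-runtime"]))
```

In this made-up graph a connector-only change rebuilds just the connectors, while a change in flink-core rebuilds everything. Maven offers a similar selection on a real reactor build through `-pl <modules> -amd` (also make dependents).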

D.

On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]>
wrote:

> First of all I don't have much(if not at all) experience with working with
> a multi repository project of Flink's size. I would like to mention a few
> thoughts of mine, though. In general I am slightly against splitting the
> repository. I fear that what we actually want to do is to introduce double
> standards for different modules with the repository split.
>
> As I understand there are two issues we want to solve with the split:
>
> 1) long build/testing time
>
> 2) increasing number of PRs
>
> Ad. 1 I agree this is a problem and that we don't necessarily need to run
> all the tests with every change or build the whole project all the time.
> However, I think we could achieve that in a single repository and at the
> same time keep the option to build all modules at once. If I am not
> mistaken this the approach that Apache Beam community decided to take (see
> e.g.
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> where they define paths to file that if changed trigger the corresponding
> CI job). Maybe we could make it easier if we restructure the repository? To
> something like:
>
> flink/
> |--flink-main/
>     |--flink-core/
>     |--flink-runtime/
>     ...
> |--flink-connectors/
>     ...
> |--flink-filesystems.../
>     ...
>
> |--root.pom
>
> In my opinion the Releases section from Chesnay's message shows well that
> it might not be the best option to split the repository. The option a)
> looks for me equivalent to what I suggested above but with a split. The
> option b) looks for me super complicated and I can see no benefit over
> option a). The option c) would be the most reasonable one if we decided to
> split the repository, if you ask me. The problem with this approach is the
> compatibility matrix (which versions of connectors work with which versions
> of Flink?). Moreover, for me it is an indicator of what I mentioned that we
> introduce double standards for those modules. I am not saying that I am
> totally against that, but I think this should be a conscious decision.
>
> Ad. 2 I can't see how repository split could help with that rather than
> moving some of the PRs to a separate list (that probably even less people
> would look into). Also I think we can achieve something like that already
> with github filters, no?
>
> To sum up my thoughts:
>
> 1. I think it is a good idea to split our CI builds to sub-modules
> (connectors being the first candidate), that would trigger on a changed
> path basis, but without splitting the repo.
> 2. My feeling is that the real question is if we want to change our
> stability guarantees of certain modules to be "just best effort".
> 3. If we were to vote on this proposal I would vote -0. I am slightly
> against this change, but wouldn't oppose.
>
> Best,
>
> Dawid

Piotr Nowojski-3

Re: [DISCUSS] Repository split

Hey,

I retract my +1 (at least temporarily, until we have discussed alternative solutions).

>> I would like to also raise an additional issue: currently quite some bugs (like release blockers [1]) are being discovered by ITCases of the connectors. It means that at least initially, the main repository will lose some test coverage.
>>
> True, but I think this is more a symptom of us not properly testing the contracts that are exposed to connectors.

Sure. In an ideal world we would have proper test coverage and self-contained modules. In reality, especially when it comes to weird and quirky race conditions, some execution paths/races are triggered only in specific scenarios, for example when a test is written in a very particular way or there are special timing constraints.

I’m not saying that this should block the split, but it is something that might need to be taken into account. Even if no immediate action is required, contributors to the core/runtime modules must be aware of the reduced coverage, and they should also monitor test failures in the connectors from time to time.

Re David and Dawid.

I agree that this can create big pains from time to time. However, if we do the split correctly, along reasonably stable API boundaries, it should be rare that a development effort requires changes/refactoring in the core modules. Personally, I’m aware of only one case in the past two years in Flink where this would have been needed: when adding the Kafka 0.11 connector, I was also adding `TwoPhaseCommitSinkFunction`. And until the Kafka 0.11 connector had stabilised, at least a couple of changes were later made to `TwoPhaseCommitSinkFunction` in order for the Kafka 0.11 connector to work (like transaction timeouts).

If we have a counter-proposal, let's talk it through.

> In case of CI, as Dawid already mentioned, you only need to trigger build /
> tests for the code you have changed and it's dependents. This should
> greatly improve runtime of CI builds.

However, when we are changing the network stack, in a perfect setup with good test coverage in the `flink-runtime` module, we shouldn’t need to run connector or flink-ml tests (as long as we are not modifying the behaviour or public APIs). So triggering tests based on the dependencies would only half solve the problem.
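
To make the point concrete, here is a minimal sketch of dependency-based test selection. The module graph below is a hypothetical, heavily simplified stand-in for the real Maven reactor, and `modules_to_test` is an illustrative helper, not existing tooling:

```python
from collections import deque

# Hypothetical, simplified module graph: module -> modules it depends on.
DEPENDS_ON = {
    "flink-core": [],
    "flink-runtime": ["flink-core"],
    "flink-streaming-java": ["flink-runtime"],
    "flink-connector-kafka": ["flink-streaming-java"],
    "flink-ml": ["flink-streaming-java"],
}


def modules_to_test(changed):
    """Return the changed modules plus everything that transitively depends on them."""
    # Invert the graph once: module -> modules that depend on it directly.
    dependents = {name: set() for name in DEPENDS_ON}
    for module, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(module)

    # Breadth-first walk over the inverted graph.
    selected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for downstream in dependents[current]:
            if downstream not in selected:
                selected.add(downstream)
                queue.append(downstream)
    return selected


print(sorted(modules_to_test(["flink-runtime"])))
print(sorted(modules_to_test(["flink-connector-kafka"])))
```

With this graph, a change in `flink-runtime` still selects the connector and flink-ml tests, while a change in the Kafka connector selects only the connector itself.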

Besides that, there are two more benefits of a repository split:

1. Test instabilities/intermittent failures in sub-modules (connectors/flink-ml/flink-python/table-api) have been causing us many more problems in recent months, slowing down the development of lower-level modules. The more such modules and the more developers we have, the more the sheer number of intermittent failures will grow, even assuming that we maintain our current standards. If we compartmentalize the repository into smaller ones, we reduce the global probability of a build failure (right now the probability of a single build failing is roughly P(flink-core fails) + P(connector fails) + P(flink-ml fails) + …).
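
A minimal sketch of that arithmetic, assuming the modules fail independently and using made-up failure rates (not measured Flink numbers). Under independence the exact value is 1 minus the product of the per-module success rates; the plain sum is an upper bound that is close when the rates are small:

```python
import math

# Made-up per-build failure probabilities, for illustration only.
flaky = {"flink-core": 0.02, "connectors": 0.05, "flink-ml": 0.03}


def build_failure_probability(rates):
    """P(at least one module fails), assuming failures are independent."""
    return 1.0 - math.prod(1.0 - p for p in rates)


combined = build_failure_probability(flaky.values())
upper_bound = sum(flaky.values())
core_only = build_failure_probability([flaky["flink-core"]])

print(f"whole repo: {combined:.4f} (sum of rates: {upper_bound:.4f})")
print(f"core only:  {core_only:.4f}")
```

Every flaky module added to the same build pushes the combined number up, which is the effect described above.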

But maybe we could also solve this with a cleverer build script? Defining test boundaries, so that connector tests are executed ONLY if the connector code was changed?
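
Such a test boundary could look roughly like the sketch below. The prefix-to-group mapping, the group names and the `groups_for_change` helper are all assumptions made up for illustration; a real setup would have to get the changed paths from the CI system:

```python
# Hypothetical mapping from repository path prefixes to CI test groups.
TEST_GROUPS = {
    "flink-connectors/": "connectors",
    "flink-libraries/": "libraries",
}
# Anything outside the prefixes above is treated as core code.
DEFAULT_GROUP = "core"


def groups_for_change(changed_files):
    """Pick the test groups to run, based only on the paths that changed."""
    groups = set()
    for path in changed_files:
        for prefix, group in TEST_GROUPS.items():
            if path.startswith(prefix):
                groups.add(group)
                break
        else:
            # No prefix matched: fall back to the core test group.
            groups.add(DEFAULT_GROUP)
    return groups


# The file names below are invented examples.
print(groups_for_change(["flink-runtime/src/main/java/Foo.java"]))
print(groups_for_change(["flink-connectors/flink-connector-kafka/pom.xml"]))
```

Note that this deliberately ignores dependencies: a core-only change does not run connector tests, which is exactly the trade-off with test coverage discussed earlier in this mail.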

Piotrek


Till Rohrmann

Re: [DISCUSS] Repository split

I pretty much agree with your points, Dav/wid. Some of the problems we want
to solve with a repository split are clearly caused by the existing build
system (no incremental builds, not enough flexibility to build only a
subset of modules). Given that a repository split would be a major
endeavour with a lot of uncertainties, changing Flink's build system might
actually be simpler.

In the past I tried to build Flink with Gradle because it has better support
for incremental builds. Unfortunately, I never really got it off the ground
because I had too little time. Maybe it would be an option to investigate
other build systems like Gradle or Bazel and whether they could solve the
pain points around build time, allowing us to keep a single repository.

I second Piotr's concern that we would actually lose test coverage by
splitting the repository. Just with the 1.9 release we found a problem in
the CheckpointFailureManager because of failing Kafka tests. It might have
taken us more time to figure this problem out if the tests had been failing
in a separate repository.

Cheers,
Till

On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]> wrote:

> Hey,
>
> I retract my +1 (at least temporarily, until we have discussed alternative
> solutions).
>
> >> I would like to also raise an additional issue: currently quite some
> bugs (like release blockers [1]) are being discovered by ITCases of the
> connectors. It means that at least initially, the main repository will lose
> some test coverage.
> >>
> > True, but I think this is more a symptom of us not properly testing the
> contracts that are exposed to connectors.
>
> Sure. In an ideal world we would have proper test coverage and
> self-contained modules. In reality, especially when it comes to weird and
> quirky race conditions, some execution paths/races are triggered only in
> specific scenarios, for example when a test is written in a very particular
> way or there are special timing constraints.
>
> I’m not saying that this should block the split, but it is something that
> might need to be taken into account. Even if no immediate action is
> required, contributors to the core/runtime modules must be aware of the
> reduced coverage, and they should also monitor test failures in the
> connectors from time to time.
>
> Re David and Dawid.
>
> I agree that this can create big pains from time to time. However, if we
> do the split correctly, along reasonably stable API boundaries, it should
> be rare that a development effort requires changes/refactoring in the core
> modules. Personally, I’m aware of only one case in the past two years in
> Flink where this would have been needed: when adding the Kafka 0.11
> connector, I was also adding `TwoPhaseCommitSinkFunction`. And until the
> Kafka 0.11 connector had stabilised, at least a couple of changes were
> later made to `TwoPhaseCommitSinkFunction` in order for the Kafka 0.11
> connector to work (like transaction timeouts).
>
> If we have a counter-proposal, let's talk it through.
>
> > In case of CI, as Dawid already mentioned, you only need to trigger
> build /
> > tests for the code you have changed and it's dependents. This should
> > greatly improve runtime of CI builds.
>
> However, when we are changing the network stack, in a perfect setup with
> good test coverage in the `flink-runtime` module, we shouldn’t need to run
> connector or flink-ml tests (as long as we are not modifying the behaviour
> or public APIs). So triggering tests based on the dependencies would only
> half solve the problem.
>
> Besides that, there are two more benefits of a repository split:
>
> 1. Test instabilities/intermittent failures in sub-modules
> (connectors/flink-ml/flink-python/table-api) have been causing us many more
> problems in recent months, slowing down the development of lower-level
> modules. The more such modules and the more developers we have, the more
> the sheer number of intermittent failures will grow, even assuming that we
> maintain our current standards. If we compartmentalize the repository into
> smaller ones, we reduce the global probability of a build failure (right
> now the probability of a single build failing is roughly P(flink-core
> fails) + P(connector fails) + P(flink-ml fails) + …).
>
> But maybe we could also solve this with a cleverer build script? Defining
> test boundaries, so that connector tests are executed ONLY if the connector
> code was changed?
>
> Piotrek
>
> > On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
> >
> > +1 for the motivation, -1 for the solution as all of the problems mention
> > above can be addressed with the mono-repo as well.
> >
> > Multiple repositories:
> > 1) This creates a big pain in case of change that targets code base in
> > multiple repositories. Change needs to be split in multiple PRs, that
> need
> > to be reviewed separately, merged in proper order, otherwise CI would
> fail
> > (also you need to rebuild "dependent PR", once its dependency gets
> merged -
> > this will just result in a lot of false positive PR build failures). Also
> > if the change needs to be cherry-picked into multiple releases, it's
> really
> > easy to make a mistake.
> > 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
> > 3) It makes release management way harder as all the parts are versioned
> > separately.
> > 4) Refactoring over multi repositories.
> > 5) For newcomers, it's way harder to contribute, as the local setup gets
> > complicated. Also depending on SNAPSHOTS from other project, can be very
> > frustrating for people that are not too familiar with dep. management, as
> > it often leads to unpredictable behavior due to local cache etc...
> >
> > The increased build / testing time, does not imply that the repository is
> > too big, but that the current build system is not setup correctly (eg.
> > checkstyle takes for ages on my box, ...) / user is unaware of how to
> > leverage the current build system (eg. does not need to build everything
> > from scratch every time he makes a change; can be improved in docs).
> >
> > In case of CI, as Dawid already mentioned, you only need to trigger
> build /
> > tests for the code you have changed and it's dependents. This should
> > greatly improve runtime of CI builds.
> >
> > D.
> >
> > On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]
> <mailto:[hidden email]>>
> > wrote:
> >
> >> First of all I don't have much(if not at all) experience with working
> with
> >> a multi repository project of Flink's size. I would like to mention a
> few
> >> thoughts of mine, though. In general I am slightly against splitting the
> >> repository. I fear that what we actually want to do is to introduce
> double
> >> standards for different modules with the repository split.
> >>
> >> As I understand there are two issues we want to solve with the split:
> >>
> >> 1) long build/testing time
> >>
> >> 2) increasing number of PRs
> >>
> >> Ad. 1 I agree this is a problem and that we don't necessarily need to
> run
> >> all the tests with every change or build the whole project all the time.
> >> However, I think we could achieve that in a single repository and at the
> >> same time keep the option to build all modules at once. If I am not
> >> mistaken this the approach that Apache Beam community decided to take
> (see
> >> e.g.
> >>
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> >> where they define paths to file that if changed trigger the
> corresponding
> >> CI job). Maybe we could make it easier if we restructure the
> repository? To
> >> something like:
> >>
> >> flink/
> >> |--flink-main/
> >> |--flink-core/
> >> |--flink-runtime/
> >> ...
> >> |--flink-connectors/
> >> ...
> >> |--flink-filesystems.../
> >> ...
> >>
> >> |--root.pom
> >>
> >> In my opinion the Releases section from Chesnay's message shows well
> that
> >> it might not be the best option to split the repository. The option a)
> >> looks for me equivalent to what I suggested above but with a split. The
> >> option b) looks for me super complicated and I can see no benefit over
> >> option a). The option c) would be the most reasonable one if we decided
> to
> >> split the repository, if you ask me. The problem with this approach is
> the
> >> compatibility matrix (which versions of connectors work with which
> versions
> >> of Flink?). Moreover, for me it is an indicator of what I mentioned
> that we
> >> introduce double standards for those modules. I am not saying that I am
> >> totally against that, but I think this should be a conscious decision.
> >>
> >> Ad. 2 I can't see how repository split could help with that rather than
> >> moving some of the PRs to a separate list (that probably even less
> people
> >> would look into). Also I think we can achieve something like that
> already
> >> with github filters, no?
> >>
> >> To sum up my thoughts:
> >>
> >> 1. I think it is a good idea to split our CI builds to sub-modules
> >> (connectors being the first candidate), that would trigger on a
> changed
> >> path basis, but without splitting the repo.
> >> 2. My feeling is that the real question is if we want to change our
> >> stability guarantees of certain modules to be "just best effort".
> >> 3. If we were to vote on this proposal I would vote -0. I am slightly
> >> against this change, but wouldn't oppose.
> >>
> >> Best,
> >>
> >> Dawid
> >> On 08/08/2019 13:23, Chesnay Schepler wrote:
> >>
> >>> I would like to also raise an additional issue: currently quite some
> >> bugs (like release blockers [1]) are being discovered by ITCases of the
> >> connectors. It means that at least initially, the main repository will
> lose
> >> some test coverage.
> >>
> >> True, but I think this is more a symptom of us not properly testing the
> >> contracts that are exposed to connectors.
> >> That we lose test coverage is already a big red flag as it implies
> >> that issues were fixed and are now verified by a connector test, and
> not by
> >> a test in the Flink core.
> >> We could also look into tooling surrounding the CI bot for running the
> >> connectors tests on-demand, although this is very much long-term.
> >>
> >> On 08/08/2019 13:14, Piotr Nowojski wrote:
> >>
> >> Hi,
> >>
> >> Thanks for proposing and writing this down Chesnay.
> >>
> >> Generally speaking +1 from my side for the idea. It will create
> additional
> >> pain for cross repository development, like some new feature in
> connectors
> >> that need some change in the main repository. I’ve worked in such setup
> >> before and the teams then regretted having such split. But I agree that
> we
> >> should try this to try solve the stability/build time issues.
> >>
> >> I have no experience in making such kind of splits so I can not help
> here.
> >>
> >> I would like to also raise an additional issue: currently quite some
> bugs
> >> (like release blockers [1]) are being discovered by ITCases of the
> >> connectors. It means that at least initially, the main repository will
> lose
> >> some test coverage.
> >>
> >> Piotrek
> >>
> >> [1] https://issues.apache.org/jira/browse/FLINK-13593
> >>
> >> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
> >>
> >> Hello everyone,
> >>
> >> The Flink project sees an ever-increasing amount of dev activity, both
> in
> >> terms of reworked and new features.
> >>
> >> This is of course an excellent situation to be in, but we are getting to a
> >> point where the associated downsides are becoming increasingly troublesome.
> >>
> >> The ever increasing build times, in addition to unstable tests,
> >> significantly slow down the development process.
> >> Additionally, pull requests for smaller features frequently slip through
> >> the cracks as they are being buried under a mountain of other pull
> >> requests.
> >>
> >> As a result I'd like to start a discussion on splitting the Flink
> >> repository.
> >>
> >> In this mail I will outline the core idea, and what problems I currently
> >> envision.
> >>
> >> I'd specifically like to encourage those who were part of similar
> >> initiatives in other projects to share the experiences and ideas.
> >>
> >>
> >> General Idea
> >>
> >> For starters, the idea is to create a new repository for
> >> "flink-connectors".
> >> For the remainder of this mail, the current Flink repository is referred
> >> to as "flink-main".
> >>
> >> There are also other candidates that we could discuss in the future,
> like
> >> flink-libraries (the next top-priority repo to ease flink-ml
> development),
> >> metric reporters, filesystems and flink-formats.
> >>
> >> Moving out flink-connectors provides the most benefits, as we straight
> >> away save at-least an hour of testing time, and not being included in
> the
> >> binary distribution simplifies a few things.
> >>
> >>
> >> Problems to solve
> >>
> >> To make this a reality there's a number of questions we have to discuss;
> >> some in the short-term, others in the long-term.
> >>
> >> 1) Git history
> >>
> >> We have to decide whether we want to rewrite the history of sub
> >> repositories to only contain diffs/commits related to this part of
> >> Flink, or whether we just fork from some commit in flink-main and
> >> add a commit to the connector repo that "transforms" it from
> >> flink-main to flink-connectors (i.e., remove everything unrelated to
> >> connectors + update module structure etc.).
> >>
> >> The latter option would have the advantage that our commit book
> >> keeping in JIRA would still be correct, but it would create a
> >> significant divide between the current and past state of the
> >> repository.
> >>
> >> 2) Maven
> >>
> >> We should look into whether there's a way to share dependency/plugin
> >> configurations and similar, so we don't have to keep them in sync
> >> manually across multiple repositories.
> >>
> >> A new parent Flink pom that all repositories define as their parent
> >> could work; this would imply splicing out part of the current root
> >> pom.xml.
> >>
> >> 3) Documentation
> >>
> >> Splitting the repository realistically also implies splitting the
> >> documentation source files (At the beginning we can get by with
> >> having it still in flink-main).
> >> We could just move the relevant files to the respective repository
> >> (while maintaining the directory structure), and merge them when
> >> building the docs.
> >>
> >> We also have to look at how we can handle java-/scaladocs; e.g.
> >> whether it is possible to aggregate them across projects.
> >>
> >> 4) CI (end-to-end tests)
> >>
> >> The very basic question we have to answer is whether we want E2E
> >> tests in the sub repositories. If so, we need to find a way to share
> >> e2e-tooling.
> >>
> >> 5) Releases
> >>
> >> We have to discuss what our release process will look like. This may
> >> also have repercussions on how repositories may depend on each other
> >> (SNAPSHOT vs LATEST). Note that this should be discussed for each
> >> repo separately.
> >>
> >> The current options I see are the following:
> >>
> >> a) Single release
> >>
> >> Release all repositories at once as a single product.
> >>
> >> The source release would be a collection of repositories, like
> >> flink/
> >> |--flink-main/
> >> |--flink-core/
> >> |--flink-runtime/
> >> ...
> >> |--flink-connectors/
> >> ...
> >> |--flink-.../
> >> ...
> >>
> >> This option requires a SNAPSHOT dependency between Flink
> >> repositories, but it is pretty much how things work at the moment.
> >>
> >> b) Synced releases
> >>
> >> Similar to a), except that each repository gets its own source
> >> release that may be released independently of other repositories.
> >> For a given release cycle each repo would produce exactly one
> >> release.
> >>
> >> This option requires a SNAPSHOT dependency between Flink
> >> repositories. Once any repository has created an RC or
> >> finished its release, release-branches in other repos can
> >> switch to that version.
> >>
> >> This approach is a tad more flexible than a), but requires more
> >> coordination between the repos.
> >>
> >> c) Separate releases
> >>
> >> Just like we handle flink-shaded; entirely separate release
> >> cycles; some repositories may have more releases in a given time
> >> period than others.
> >>
> >> This option implies a LATEST dependency between Flink
> repositories.
> >>
> >> Note that hybrid approaches would also make sense, like doing b) for
> >> major versions and c) for bugfix releases.
> >>
> >> For something like flink-libraries this question may also have
> >> repercussions on how/whether they are bundled in the distribution;
> >> options a)/b) would maintain the status-quo, c) and hybrid
> >> approaches will likely necessitate the exclusion from the
> distribution.
>
>

Jark Wu-2

Re: [DISCUSS] Repository split

Hi,

First of all, I agree with Dawid's and David's points.

I will share some experience with a repository split. We have been through
one for Alibaba Blink, which I think is the most worthwhile project to learn
from.
We split the Blink project into "blink-connectors" and "blink", but we didn't
get much benefit in terms of a better development process. On the contrary,
it sometimes slowed development down.
As far as I can see, we have suffered from the following issues after the split:

1. Unstable build and test:
Interface or behavior changes in the underlying modules (e.g. core, table)
will lead to build and test failures in the connectors repo. AFAIK, the table
API is still under heavy evolution.
This will make the connectors repo more unstable and keep us busy fixing
build and test problems **after-commit**.
First, it's not easy to locate which commit in the main repo caused the
connectors repo to fail (we have over 70 commits every day in flink master
now, and the number is growing).
Second, when 2 or 3 build/test problems happen at the same time, they are
hard to fix because we can't make the build/test pass in separate hotfix
pull requests.

2. Debug difficulty:
As the modules live in different repositories, if we want to debug a Kafka
IT case, we may need to debug some code in the flink runtime or verify
whether a runtime code change can fix the Kafka case. However, this is more
complex because they are not in one project.

IMO, this actually slows down the development process.

------

In my understanding, the issues we want to solve with the split include:
1) long build/testing time
2) unstable tests
3) increasing number of PRs

Ad. 1 I think we have several ways to reduce the build/testing time. As
Dawid said, we can trigger only the corresponding CI jobs within a single
repository (without running all the tests).
An easy way might be to analyse the pom.xml files to find which modules
depend on the changed module. And one thing we can do right now is skip all
the tests for documentation-only changes.
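
The dependency-based selection described above could be sketched roughly as
follows. This is only an illustration in Python, not how Flink's CI works:
the DEPENDS_ON map is hand-written and hypothetical (a real version would
have to be derived from the <dependencies> sections of the module pom.xml
files), and module_of and modules_to_test are made-up helper names.

```python
from collections import deque

# Hypothetical module graph: module -> modules it depends on.
# A real setup would derive this from each module's pom.xml.
DEPENDS_ON = {
    "flink-core": [],
    "flink-runtime": ["flink-core"],
    "flink-connectors": ["flink-runtime"],
    "flink-filesystems": ["flink-core"],
}


def module_of(path):
    """Map a changed file path to its top-level module, or None (e.g. docs)."""
    top = path.split("/", 1)[0]
    return top if top in DEPENDS_ON else None


def modules_to_test(changed_paths):
    """Return the changed modules plus all modules that transitively depend on them."""
    # Invert the graph once: module -> modules that depend on it.
    dependents = {module: [] for module in DEPENDS_ON}
    for module, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(module)

    changed = {m for m in map(module_of, changed_paths) if m is not None}
    affected = set(changed)
    queue = deque(changed)
    # Breadth-first walk over the "is depended on by" edges.
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

With this toy graph, a change under docs/ selects no modules, a change under
flink-connectors/ selects only the connectors, and a change under flink-core/
selects every module. Root-level build files such as the root pom.xml would
need special handling (probably: run everything), which the sketch leaves out.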

Ad. 2 I can't see how unstable connector tests can be fixed more quickly
after being moved to a separate repository. As far as I can tell, this
problem might become more significant.

Ad. 3 I also doubt that a repository split would help with this. I think it
will give the sub-repositories less exposure; bahir-flink[1] is an example
(only 3 commits in the last 2 months).

In the end, from my point of view,
1) if we want to reduce build/testing time, we can start a new thread to
collect ideas from the community. We can try some approaches to see if they
can solve most of the problems.
2) if we want to split the repository, we need to be cautious about the
potential development slowdown we might face.

Regards,
Jark

[1]: https://github.com/apache/bahir-flink/graphs/commit-activity

On Fri, 9 Aug 2019 at 00:26, Till Rohrmann <[hidden email]> wrote:

> I pretty much agree with your points Dav/wid. Some problems which we want
> to solve with a repository split are clearly caused by the existing build
> system (no incremental builds, not enough flexibility to only build a
> subset of modules). Given that a repository split would be a major
> endeavour with a lot of uncertainties, changing Flink's build system might
> actually be simpler.
>
> In the past I tried to build Flink with Gradle because it better supports
> incremental builds. Unfortunately, I never got it really off the ground
> because of too little time. Maybe it could be an option to investigate
> other build systems like Gradle or Bazel and whether they could solve the
> pain points around build time allowing us to keep a single repository.
>
> I second Piotr's concerns that we would actually lose test coverage with
> splitting the repository. Just with the 1.9 release we found a problem in
> the CheckpointFailureManager because of failing Kafka tests. It might have
> taken us more time to figure this problem out if the test were failing in a
> separate repository.
>
> Cheers,
> Till
>
> On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]> wrote:
>
> > Hey,
> >
> > I retract my +1 (at least temporarily, until we discuss about alternative
> > solutions).
> >
> > >> I would like to also raise an additional issue: currently quite some
> > bugs (like release blockers [1]) are being discovered by ITCases of the
> > connectors. It means that at least initially, the main repository will
> lose
> > some test coverage.
> > >>
> > > True, but I think this is more a symptom of us not properly testing the
> > contracts that are exposed to connectors.
> >
> > Sure. In an ideal world we should have proper test coverage and
> > self-contained modules. In reality, especially when it comes to weird and
> > quirky race conditions, some execution paths/races are triggered only in
> > specific scenarios, for example when a test is written in a very special
> > way, or when there are special timing constraints.
> >
> > I’m not saying that this should block the split, but it is something that
> > might need to be taken into account. Even if no immediate action is
> > required, core/runtime module contributors must be aware of the reduced
> > coverage and that they should also monitor test failures in the connectors
> > from time to time.
> >
> > Re David and Dawid.
> >
> > I agree that this can create big pains from time to time. However if we
> do
> > the split correctly, along reasonably stable APIs boundaries, it should
> be
> > rare that some development effort requires changes/refactoring in the
> core
> > modules. Personally I’m only aware of one case when this would be needed
> in
> > the past two years in Flink: when adding Kafka 0.11 connector, I was also
> > adding `TwoPhaseCommitSinkFunction`. And until Kafka 0.11 connector has
> > stabilised, there were at least couple of changes added later to the
> > `TwoPhaseCommitSinkFunction` in order for Kafka 0.11 connector to work
> > (like transaction time outs).
> >
> > If we have counter proposal, let's talk it through.
> >
> > > In case of CI, as Dawid already mentioned, you only need to trigger
> > > build / tests for the code you have changed and its dependents. This
> > > should greatly improve the runtime of CI builds.
> >
> > However when we are doing change to network stack, in perfect setup, with
> > good test coverage in `Flink-runtime` module, we shouldn’t be running
> > connector or flink-ml tests (as long as we are not modifying the
> behaviour
> > or public apis). So triggering tests based on the dependencies would only
> > half solve the problem.
> >
> > Besides that, there are two more benefits of repository split:
> >
> > 1. Test instabilities/intermittent failures of sub modules
> > (connectors/flink-ml/flink-python/table-api) were causing us much more
> > problems in the recent months, slowing down the development of lower
> level
> > modules. The more such modules we have, the more developers we have, it
> > means that even assuming that we maintain our current standards, the
> sheer
> > number of intermittent failures will grow. If we compartmentalize the
> > repository into smaller ones, we reduce the global probability of build
> > failure (now the probability of a single build failure is P(Flink-core
> > fails) + P(connector fails) + P(flink-ml fails) + … )
> >
> > But maybe we could also solve this with a more clever/better build
> script?
> > Defining test boundary - that connector tests are executed ONLY if the
> > connector code was changed?
> >
> > Piotrek
> >
> > > On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
> > >
> > > +1 for the motivation, -1 for the solution, as all of the problems
> > > mentioned above can be addressed with the mono-repo as well.
> > >
> > > Multiple repositories:
> > > 1) This creates a big pain in case of change that targets code base in
> > > multiple repositories. Change needs to be split in multiple PRs, that
> > need
> > > to be reviewed separately, merged in proper order, otherwise CI would
> > fail
> > > (also you need to rebuild "dependent PR", once its dependency gets
> > merged -
> > > this will just result in a lot of false positive PR build failures).
> Also
> > > if the change needs to be cherry-picked into multiple releases, it's
> > really
> > > easy to make a mistake.
> > > 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
> > > 3) It makes release management way harder as all the parts are
> versioned
> > > separately.
> > > 4) Refactoring over multi repositories.
> > > 5) For newcomers, it's way harder to contribute, as the local setup
> gets
> > > complicated. Also depending on SNAPSHOTS from other project, can be
> very
> > > frustrating for people that are not too familiar with dep. management,
> as
> > > it often leads to unpredictable behavior due to local cache etc...
> > >
> > > The increased build / testing time, does not imply that the repository
> is
> > > too big, but that the current build system is not setup correctly (eg.
> > > checkstyle takes for ages on my box, ...) / user is unaware of how to
> > > leverage the current build system (eg. does not need to build
> everything
> > > from scratch every time he makes a change; can be improved in docs).
> > >
> > > In case of CI, as Dawid already mentioned, you only need to trigger
> > > build / tests for the code you have changed and its dependents. This
> > > should greatly improve the runtime of CI builds.
> > >
> > > D.
> > >
> > > On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]> wrote:
> > >
> > >> First of all I don't have much(if not at all) experience with working
> > with
> > >> a multi repository project of Flink's size. I would like to mention a
> > few
> > >> thoughts of mine, though. In general I am slightly against splitting
> the
> > >> repository. I fear that what we actually want to do is to introduce
> > double
> > >> standards for different modules with the repository split.
> > >>
> > >> As I understand there are two issues we want to solve with the split:
> > >>
> > >> 1) long build/testing time
> > >>
> > >> 2) increasing number of PRs
> > >>
> > >> Ad. 1 I agree this is a problem and that we don't necessarily need to
> > run
> > >> all the tests with every change or build the whole project all the
> time.
> > >> However, I think we could achieve that in a single repository and at
> the
> > >> same time keep the option to build all modules at once. If I am not
> > >> mistaken this the approach that Apache Beam community decided to take
> > (see
> > >> e.g.
> > >>
> >
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> > >> where they define paths to file that if changed trigger the
> > corresponding
> > >> CI job). Maybe we could make it easier if we restructure the
> > repository? To
> > >> something like:
> > >>
> > >> flink/
> > >> |--flink-main/
> > >> |--flink-core/
> > >> |--flink-runtime/
> > >> ...
> > >> |--flink-connectors/
> > >> ...
> > >> |--flink-filesystems.../
> > >> ...
> > >>
> > >> |--root.pom
> > >>
> > >> In my opinion the Releases section from Chesnay's message shows well
> > that
> > >> it might not be the best option to split the repository. The option a)
> > >> looks for me equivalent to what I suggested above but with a split.
> The
> > >> option b) looks for me super complicated and I can see no benefit over
> > >> option a). The option c) would be the most reasonable one if we
> decided
> > to
> > >> split the repository, if you ask me. The problem with this approach is
> > the
> > >> compatibility matrix (which versions of connectors work with which
> > versions
> > >> of Flink?). Moreover, for me it is an indicator of what I mentioned
> > that we
> > >> introduce double standards for those modules. I am not saying that I
> am
> > >> totally against that, but I think this should be a conscious decision.
> > >>
> > >> Ad. 2 I can't see how repository split could help with that rather
> than
> > >> moving some of the PRs to a separate list (that probably even less
> > people
> > >> would look into). Also I think we can achieve something like that
> > already
> > >> with github filters, no?
> > >>
> > >> To sum up my thoughts:
> > >>
> > >> 1. I think it is a good idea to split our CI builds to sub-modules
> > >> (connectors being the first candidate), that would trigger on a
> > changed
> > >> path basis, but without splitting the repo.
> > >> 2. My feeling is that the real question is if we want to change our
> > >> stability guarantees of certain modules to be "just best effort".
> > >> 3. If we were to vote on this proposal I would vote -0. I am
> slightly
> > >> against this change, but wouldn't oppose.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >> On 08/08/2019 13:23, Chesnay Schepler wrote:
> > >>
> > >>> I would like to also raise an additional issue: currently quite some
> > >> bugs (like release blockers [1]) are being discovered by ITCases of
> the
> > >> connectors. It means that at least initially, the main repository will
> > lose
> > >> some test coverage.
> > >>
> > >> True, but I think this is more a symptom of us not properly testing
> the
> > >> contracts that are exposed to connectors.
> > >> That we lose test coverage is already a big red flag as it
> implies
> > >> that issues were fixed and are now verified by a connector test, and
> > not by
> > >> a test in the Flink core.
> > >> We could also look into tooling surrounding the CI bot for running the
> > >> connectors tests on-demand, although this is very much long-term.
> > >>
> > >> On 08/08/2019 13:14, Piotr Nowojski wrote:
> > >>
> > >> Hi,
> > >>
> > >> Thanks for proposing and writing this down Chesnay.
> > >>
> > >> Generally speaking +1 from my side for the idea. It will create
> > additional
> > >> pain for cross repository development, like some new feature in
> > connectors
> > >> that need some change in the main repository. I’ve worked in such
> setup
> > >> before and the teams then regretted having such split. But I agree
> that
> > we
> > >> should try this to try solve the stability/build time issues.
> > >>
> > >> I have no experience in making such kind of splits so I can not help
> > here.
> > >>
> > >> I would like to also raise an additional issue: currently quite some
> > bugs
> > >> (like release blockers [1]) are being discovered by ITCases of the
> > >> connectors. It means that at least initially, the main repository will
> > lose
> > >> some test coverage.
> > >>
> > >> Piotrek
> > >>
> > >> [1] https://issues.apache.org/jira/browse/FLINK-13593
> > >>
> > >> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
> > >>
> > >> Hello everyone,
> > >>
> > >> The Flink project sees an ever-increasing amount of dev activity, both
> > in
> > >> terms of reworked and new features.
> > >>
> > >> This is of course an excellent situation to be in, but we are getting to
> > >> a point where the associated downsides are becoming increasingly
> > >> troublesome.
> > >>
> > >> The ever increasing build times, in addition to unstable tests,
> > >> significantly slow down the development process.
> > >> Additionally, pull requests for smaller features frequently slip through
> > >> the cracks as they are being buried under a mountain of other pull
> > >> requests.
> > >>
> > >> As a result I'd like to start a discussion on splitting the Flink
> > >> repository.
> > >>
> > >> In this mail I will outline the core idea, and what problems I
> currently
> > >> envision.
> > >>
> > >> I'd specifically like to encourage those who were part of similar
> > >> initiatives in other projects to share the experiences and ideas.
> > >>
> > >>
> > >> General Idea
> > >>
> > >> For starters, the idea is to create a new repository for
> > >> "flink-connectors".
> > >> For the remainder of this mail, the current Flink repository is
> referred
> > >> to as "flink-main".
> > >>
> > >> There are also other candidates that we could discuss in the future,
> > like
> > >> flink-libraries (the next top-priority repo to ease flink-ml
> > development),
> > >> metric reporters, filesystems and flink-formats.
> > >>
> > >> Moving out flink-connectors provides the most benefits, as we straight
> > >> away save at-least an hour of testing time, and not being included in
> > the
> > >> binary distribution simplifies a few things.
> > >>
> > >>
> > >> Problems to solve
> > >>
> > >> To make this a reality there's a number of questions we have to
> discuss;
> > >> some in the short-term, others in the long-term.
> > >>
> > >> 1) Git history
> > >>
> > >> We have to decide whether we want to rewrite the history of sub
> > >> repositories to only contain diffs/commits related to this part of
> > >> Flink, or whether we just fork from some commit in flink-main and
> > >> add a commit to the connector repo that "transforms" it from
> > >> flink-main to flink-connectors (i.e., remove everything unrelated to
> > >> connectors + update module structure etc.).
> > >>
> > >> The latter option would have the advantage that our commit book
> > >> keeping in JIRA would still be correct, but it would create a
> > >> significant divide between the current and past state of the
> > >> repository.
> > >>
> > >> 2) Maven
> > >>
> > >> We should look into whether there's a way to share dependency/plugin
> > >> configurations and similar, so we don't have to keep them in sync
> > >> manually across multiple repositories.
> > >>
> > >> A new parent Flink pom that all repositories define as their parent
> > >> could work; this would imply splicing out part of the current root
> > >> pom.xml.
> > >>
> > >> 3) Documentation
> > >>
> > >> Splitting the repository realistically also implies splitting the
> > >> documentation source files (At the beginning we can get by with
> > >> having it still in flink-main).
> > >> We could just move the relevant files to the respective repository
> > >> (while maintaining the directory structure), and merge them when
> > >> building the docs.
> > >>
> > >> We also have to look at how we can handle java-/scaladocs; e.g.
> > >> whether it is possible to aggregate them across projects.
> > >>
> > >> 4) CI (end-to-end tests)
> > >>
> > >> The very basic question we have to answer is whether we want E2E
> > >> tests in the sub repositories. If so, we need to find a way to share
> > >> e2e-tooling.
> > >>
> > >> 5) Releases
> > >>
> > >> We have to discuss what our release process will look like. This may
> > >> also have repercussions on how repositories may depend on each other
> > >> (SNAPSHOT vs LATEST). Note that this should be discussed for each
> > >> repo separately.
> > >>
> > >> The current options I see are the following:
> > >>
> > >> a) Single release
> > >>
> > >> Release all repositories at once as a single product.
> > >>
> > >> The source release would be a collection of repositories, like
> > >> flink/
> > >> |--flink-main/
> > >> |--flink-core/
> > >> |--flink-runtime/
> > >> ...
> > >> |--flink-connectors/
> > >> ...
> > >> |--flink-.../
> > >> ...
> > >>
> > >> This option requires a SNAPSHOT dependency between Flink
> > >> repositories, but it is pretty much how things work at the
> moment.
> > >>
> > >> b) Synced releases
> > >>
> > >> Similar to a), except that each repository gets its own source
> > >> release that may be released independently of other repositories.
> > >> For a given release cycle each repo would produce exactly one
> > >> release.
> > >>
> > >> This option requires a SNAPSHOT dependency between Flink
> > >> repositories. Once any repository has created an RC or
> > >> finished its release, release-branches in other repos can
> > >> switch to that version.
> > >>
> > >> This approach is a tad more flexible than a), but requires more
> > >> coordination between the repos.
> > >>
> > >> c) Separate releases
> > >>
> > >> Just like we handle flink-shaded; entirely separate release
> > >> cycles; some repositories may have more releases in a given time
> > >> period than others.
> > >>
> > >> This option implies a LATEST dependency between Flink
> > repositories.
> > >>
> > >> Note that hybrid approaches would also make sense, like doing b) for
> > >> major versions and c) for bugfix releases.
> > >>
> > >> For something like flink-libraries this question may also have
> > >> repercussions on how/whether they are bundled in the distribution;
> > >> options a)/b) would maintain the status-quo, c) and hybrid
> > >> approaches will likely necessitate the exclusion from the
> > distribution.
> >
> >
>

Piotr Nowojski-3

Re: [DISCUSS] Repository split

Hi,

Re Jark’s:

> Ad. 2 I can't see how unstable connectors tests can be fixed more quickly
> after moved to a separate repositories.

It’s more about the probability of intermittent failures across all of the modules adding up, causing the whole build to fail almost all the time. With separate repositories, those probabilities stop adding up. But as I wrote before, this could also be simulated by some clever build script: run the connector tests only if the connectors' code was touched.
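
To make the "adding up" argument concrete: if we assume that module failures are independent (a simplifying assumption), the chance that a build running all modules fails is 1 minus the product of the per-module success probabilities, which for small rates is close to the plain sum. The sketch below uses made-up flakiness rates; none of the numbers are measurements from Flink's CI, and build_failure_probability is a hypothetical helper name.

```python
import math


def build_failure_probability(module_failure_probs):
    """P(at least one module fails), assuming failures are independent."""
    return 1.0 - math.prod(1.0 - p for p in module_failure_probs)


# Hypothetical per-build intermittent failure rates (illustrative only).
flaky = {"flink-core": 0.02, "connectors": 0.05, "flink-ml": 0.03}

# Every module's tests run in one build (today's single repository).
mono = build_failure_probability(flaky.values())

# Only the flink-core tests run (split repository or path-based triggering).
core_only = build_failure_probability([flaky["flink-core"]])

print(f"all modules: {mono:.4f}, core only: {core_only:.4f}")
```

With these illustrative numbers the full build fails roughly 9.7% of the time, against 2% for a build that only runs the flink-core tests. The same reduction applies whether the tests are skipped because of a repository split or because a build script only triggers them on relevant changes.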

Also, I can easily see how a split can lead to more unstable builds in dependent repositories (your point 1.).

Piotrek

> On 8 Aug 2019, at 18:54, Jark Wu <[hidden email]> wrote:
>
> Hi,
>
> First of all, I agree with Dawid and David's point.
>
> I will share some experience on the repository split. We have been through
> it for Alibaba Blink, which is the most worthwhile project to learn from I
> think.
> We split Blink project into "blink-connectors" and "blink", but we didn't
> get much benefit for better development process. In the contrary, it slow
> down the development sometimes.
> We have suffered from the following issues after split as far as I can see:
>
> 1. Unstable build and test:
> The interface or behavior changes in the underlying (e.g. core, table) will
> lead to build fail and tests fail in the connectors repo. AFAIK, table api
> are still under heavy evolution.
> This will make connectors repo more unstable and makes us busy to fix the
> build problems and tests problems **after-commit**.
> First, it's not easy to locate which commit of main repo lead to the
> connectors repo fail (we have over 70+ commits every day in flink master
> now and it is growing).
> Second, when 2 or 3 build/test problems happened at one time, it's hard to
> fix the problem because we can't make the build/test pass in separate
> hotfix pull requests.
>
> 2. Debug difficulty:
> As modules are separate in different repositories, if we want to debug a
> Kafka IT case,
> we may need to debug some code in flink runtime or verify whether the
> runtime code change
> can fix the Kafka case. However, it will be more complex because they are
> not in one project.
>
> IMO, this actually slows down the development process.
>
> ------
>
> In my understanding, the issues we want to solve with the split include:
> 1) long build/testing time
> 2) unstable tests
> 3) increasing number of PRs
>
> Ad. 1 I think we have several ways to reduce the build/testing time. As
> Dawid said, we can trigger corresponding CI in a single repository (without
> to run all the tests).
> An easy way might be to analyse the pom.xml that which modules depends on
> the changed module. And one thing we can do right now is skipping all the
> tests for documentation changes.
>
> Ad. 2 I can't see how unstable connectors tests can be fixed more quickly
> after moved to a separate repositories. As far as I can tell, this problem
> might be more significant.
>
> Ad. 3 I also doubt how repository split could help with this. I think this
> will give the sub-repositories less exposure and bahir-flink[1] is an
> example (only 3 commits in the last 2 months).
>
> In the end, from my point of view,
> 1) if we want to reduce build/testing time, we can start a new thread to
> collect ideas from the community. We can try some approaches to see if they
> solve most of the problems.
> 2) if we want to split the repository, we need to be cautious about the
> potential development slowdown we might face.
>
> Regards,
> Jark
>
> [1]: https://github.com/apache/bahir-flink/graphs/commit-activity
>
>
>
>
> On Fri, 9 Aug 2019 at 00:26, Till Rohrmann <[hidden email]> wrote:
>
>> I pretty much agree with your points Dav/wid. Some problems which we want
>> to solve with a repository split are clearly caused by the existing build
>> system (no incremental builds, not enough flexibility to only build a
>> subset of modules). Given that a repository split would be a major
>> endeavour with a lot of uncertainties, changing Flink's build system might
>> actually be simpler.
>>
>> In the past I tried to build Flink with Gradle because it better supports
>> incremental builds. Unfortunately, I never really got it off the ground
>> because of too little time. Maybe it could be an option to investigate
>> other build systems like Gradle or Bazel and whether they could solve the
>> pain points around build time allowing us to keep a single repository.
>>
>> I second Piotr's concerns that we would actually lose test coverage with
>> splitting the repository. Just with the 1.9 release we found a problem in
>> the CheckpointFailureManager because of failing Kafka tests. It might have
>> taken us more time to figure this problem out if the tests were failing in a
>> separate repository.
>>
>> Cheers,
>> Till
>>
>> On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]> wrote:
>>
>>> Hey,
>>>
>>> I retract my +1 (at least temporarily, until we discuss alternative
>>> solutions).
>>>
>>>>> I would like to also raise an additional issue: currently quite some
>>> bugs (like release blockers [1]) are being discovered by ITCases of the
>>> connectors. It means that at least initially, the main repository will
>> lose
>>> some test coverage.
>>>>>
>>>> True, but I think this is more a symptom of us not properly testing the
>>> contracts that are exposed to connectors.
>>>
>>> Sure. In an ideal world we would have proper test coverage and
>>> self-contained modules. In reality, especially when it comes to weird and
>>> quirky race conditions, some execution paths/races are triggered only in
>>> specific scenarios. For example when a test is written in a very special
>> way,
>>> or there are special timing constraints.
>>>
>>> I’m not saying that this should block the split, but it is something that
>>> might need to be taken into account. Even if no immediate action is
>> required,
>>> contributors to core/runtime modules must be aware of the reduced coverage and
>> that
>>> they should also monitor test failures in the connectors from time to
>> time.
>>>
>>> Re David and Dawid.
>>>
>>> I agree that this can create big pains from time to time. However if we
>> do
>>> the split correctly, along reasonably stable APIs boundaries, it should
>> be
>>> rare that some development effort requires changes/refactoring in the
>> core
>>> modules. Personally I’m only aware of one case when this would be needed
>> in
>>> the past two years in Flink: when adding the Kafka 0.11 connector, I was also
>>> adding `TwoPhaseCommitSinkFunction`. And until the Kafka 0.11 connector had
>>> stabilised, there were at least a couple of changes added later to the
>>> `TwoPhaseCommitSinkFunction` in order for the Kafka 0.11 connector to work
>>> (like transaction timeouts).
>>>
>>> If we have a counter-proposal, let's talk it through.
>>>
>>>> In case of CI, as Dawid already mentioned, you only need to trigger
>>> build /
>>>> tests for the code you have changed and its dependents. This should
>>>> greatly improve runtime of CI builds.
>>>
>>> However, when we are making a change to the network stack, in a perfect setup with
>>> good test coverage in the `flink-runtime` module, we shouldn’t be running
>>> connector or flink-ml tests (as long as we are not modifying the
>> behaviour
>>> or public APIs). So triggering tests based on the dependencies would only
>>> half solve the problem.
>>>
>>> Besides that, there are two more benefits of repository split:
>>>
>>> 1. Test instabilities/intermittent failures of sub modules
>>> (connectors/flink-ml/flink-python/table-api) were causing us many more
>>> problems in recent months, slowing down the development of lower
>> level
>>> modules. The more such modules and the more developers we have, the more
>>> the sheer number of intermittent failures will grow, even assuming that
>>> we maintain our current standards. If we compartmentalize the
>>> repository into smaller ones, we reduce the global probability of build
>>> failure (now the probability of a single build failure is roughly P(Flink-core
>>> fails) + P(connector fails) + P(flink-ml fails) + … )
>>>
>>> But maybe we could also solve this with a more clever/better build
>> script?
>>> Defining test boundary - that connector tests are executed ONLY if the
>>> connector code was changed?
>>>
>>> Piotrek
>>>
>>>> On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
>>>>
>>>> +1 for the motivation, -1 for the solution as all of the problems
>> mentioned
>>>> above can be addressed with the mono-repo as well.
>>>>
>>>> Multiple repositories:
>>>> 1) This creates a big pain in case of change that targets code base in
>>>> multiple repositories. Change needs to be split in multiple PRs, that
>>> need
>>>> to be reviewed separately, merged in proper order, otherwise CI would
>>> fail
>>>> (also you need to rebuild "dependent PR", once its dependency gets
>>> merged -
>>>> this will just result in a lot of false positive PR build failures).
>> Also
>>>> if the change needs to be cherry-picked into multiple releases, it's
>>> really
>>>> easy to make a mistake.
>>>> 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
>>>> 3) It makes release management way harder as all the parts are
>> versioned
>>>> separately.
>>>> 4) Refactoring over multi repositories.
>>>> 5) For newcomers, it's way harder to contribute, as the local setup
>> gets
>>>> complicated. Also depending on SNAPSHOTS from other project, can be
>> very
>>>> frustrating for people that are not too familiar with dep. management,
>> as
>>>> it often leads to unpredictable behavior due to local cache etc...
>>>>
>>>> The increased build / testing time, does not imply that the repository
>> is
>>>> too big, but that the current build system is not set up correctly (e.g.
>>>> checkstyle takes for ages on my box, ...) / user is unaware of how to
>>>> leverage the current build system (eg. does not need to build
>> everything
>>>> from scratch every time he makes a change; can be improved in docs).
>>>>
>>>> In case of CI, as Dawid already mentioned, you only need to trigger
>>> build /
>>>> tests for the code you have changed and its dependents. This should
>>>> greatly improve runtime of CI builds.
>>>>
>>>> D.
>>>>
>>>> On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]> wrote:
>>>>
>>>>> First of all I don't have much (if any) experience with working
>>> with
>>>>> a multi repository project of Flink's size. I would like to mention a
>>> few
>>>>> thoughts of mine, though. In general I am slightly against splitting
>> the
>>>>> repository. I fear that what we actually want to do is to introduce
>>> double
>>>>> standards for different modules with the repository split.
>>>>>
>>>>> As I understand there are two issues we want to solve with the split:
>>>>>
>>>>> 1) long build/testing time
>>>>>
>>>>> 2) increasing number of PRs
>>>>>
>>>>> Ad. 1 I agree this is a problem and that we don't necessarily need to
>>> run
>>>>> all the tests with every change or build the whole project all the
>> time.
>>>>> However, I think we could achieve that in a single repository and at
>> the
>>>>> same time keep the option to build all modules at once. If I am not
>>>>> mistaken this is the approach that the Apache Beam community decided to take
>>> (see
>>>>> e.g.
>>>>>
>>>
>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
>>>>> where they define paths to files that, if changed, trigger the
>>> corresponding
>>>>> CI job). Maybe we could make it easier if we restructure the
>>> repository? To
>>>>> something like:
>>>>>
>>>>> flink/
>>>>> |--flink-main/
>>>>> |  |--flink-core/
>>>>> |  |--flink-runtime/
>>>>> |  ...
>>>>> |--flink-connectors/
>>>>> |  ...
>>>>> |--flink-filesystems.../
>>>>> |  ...
>>>>>
>>>>> |--root.pom
>>>>>
>>>>> In my opinion the Releases section from Chesnay's message shows well
>>> that
>>>>> it might not be the best option to split the repository. The option a)
>>>>> looks to me equivalent to what I suggested above but with a split.
>> The
>>>>> option b) looks to me super complicated and I can see no benefit over
>>>>> option a). The option c) would be the most reasonable one if we
>> decided
>>> to
>>>>> split the repository, if you ask me. The problem with this approach is
>>> the
>>>>> compatibility matrix (which versions of connectors work with which
>>> versions
>>>>> of Flink?). Moreover, for me it is an indicator of what I mentioned
>>> that we
>>>>> introduce double standards for those modules. I am not saying that I
>> am
>>>>> totally against that, but I think this should be a conscious decision.
>>>>>
>>>>> Ad. 2 I can't see how repository split could help with that rather
>> than
>>>>> moving some of the PRs to a separate list (that probably even fewer
>>> people
>>>>> would look into). Also I think we can achieve something like that
>>> already
>>>>> with github filters, no?
>>>>>
>>>>> To sum up my thoughts:
>>>>>
>>>>> 1. I think it is a good idea to split our CI builds to sub-modules
>>>>> (connectors being the first candidate), that would trigger on a
>>> changed
>>>>> path basis, but without splitting the repo.
>>>>> 2. My feeling is that the real question is if we want to change our
>>>>> stability guarantees of certain modules to be "just best effort".
>>>>> 3. If we were to vote on this proposal I would vote -0. I am
>> slightly
>>>>> against this change, but wouldn't oppose.
>>>>>
>>>>> Best,
>>>>>
>>>>> Dawid
>>>>> On 08/08/2019 13:23, Chesnay Schepler wrote:
>>>>>
>>>>>> I would like to also raise an additional issue: currently quite some
>>>>> bugs (like release blockers [1]) are being discovered by ITCases of
>> the
>>>>> connectors. It means that at least initially, the main repository will
>>> lose
>>>>> some test coverage.
>>>>>
>>>>> True, but I think this is more a symptom of us not properly testing
>> the
>>>>> contracts that are exposed to connectors.
>>>>> That we lose test coverage is already a big red flag as it
>> implies
>>>>> that issues were fixed and are now verified by a connector test, and
>>> not by
>>>>> a test in the Flink core.
>>>>> We could also look into tooling surrounding the CI bot for running the
>>>>> connectors tests on-demand, although this is very much long-term.
>>>>>
>>>>> On 08/08/2019 13:14, Piotr Nowojski wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Thanks for proposing and writing this down, Chesnay.
>>>>>
>>>>> Generally speaking +1 from my side for the idea. It will create
>>> additional
>>>>> pain for cross repository development, like some new feature in
>>> connectors
>>>>> that needs some change in the main repository. I’ve worked in such a
>> setup
>>>>> before and the teams then regretted having such a split. But I agree
>> that
>>> we
>>>>> should try this to try to solve the stability/build time issues.
>>>>>
>>>>> I have no experience in making such kind of splits so I can not help
>>> here.
>>>>>
>>>>> I would like to also raise an additional issue: currently quite some
>>> bugs
>>>>> (like release blockers [1]) are being discovered by ITCases of the
>>>>> connectors. It means that at least initially, the main repository will
>>> lose
>>>>> some test coverage.
>>>>>
>>>>> Piotrek
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/FLINK-13593
>>>>>
>>>>> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
>>>>>
>>>>> Hello everyone,
>>>>>
>>>>> The Flink project sees an ever-increasing amount of dev activity, both
>>> in
>>>>> terms of reworked and new features.
>>>>>
>>>>> This is of course an excellent situation to be in, but we are getting
>>> to a
>>>>> point where the associated downsides are becoming increasingly
>>> troublesome.
>>>>>
>>>>> The ever increasing build times, in addition to unstable tests,
>>>>> significantly slow down the development process.
>>>>> Additionally, pull requests for smaller features frequently slip
>> through
>>>>> the cracks as they are being buried under a mountain of other pull
>>>>> requests.
>>>>>
>>>>> As a result I'd like to start a discussion on splitting the Flink
>>>>> repository.
>>>>>
>>>>> In this mail I will outline the core idea, and what problems I
>> currently
>>>>> envision.
>>>>>
>>>>> I'd specifically like to encourage those who were part of similar
>>>>> initiatives in other projects to share the experiences and ideas.
>>>>>
>>>>>
>>>>> General Idea
>>>>>
>>>>> For starters, the idea is to create a new repository for
>>>>> "flink-connectors".
>>>>> For the remainder of this mail, the current Flink repository is
>> referred
>>>>> to as "flink-main".
>>>>>
>>>>> There are also other candidates that we could discuss in the future,
>>> like
>>>>> flink-libraries (the next top-priority repo to ease flink-ml
>>> development),
>>>>> metric reporters, filesystems and flink-formats.
>>>>>
>>>>> Moving out flink-connectors provides the most benefits, as we straight
>>>>> away save at-least an hour of testing time, and not being included in
>>> the
>>>>> binary distribution simplifies a few things.
>>>>>
>>>>>
>>>>> Problems to solve
>>>>>
>>>>> To make this a reality there's a number of questions we have to
>> discuss;
>>>>> some in the short-term, others in the long-term.
>>>>>
>>>>> 1) Git history
>>>>>
>>>>> We have to decide whether we want to rewrite the history of sub
>>>>> repositories to only contain diffs/commits related to this part of
>>>>> Flink, or whether we just fork from some commit in flink-main and
>>>>> add a commit to the connector repo that "transforms" it from
>>>>> flink-main to flink-connectors (i.e., remove everything unrelated to
>>>>> connectors + update module structure etc.).
>>>>>
>>>>> The latter option would have the advantage that our commit book
>>>>> keeping in JIRA would still be correct, but it would create a
>>>>> significant divide between the current and past state of the
>>>>> repository.
>>>>>
>>>>> 2) Maven
>>>>>
>>>>> We should look into whether there's a way to share dependency/plugin
>>>>> configurations and similar, so we don't have to keep them in sync
>>>>> manually across multiple repositories.
>>>>>
>>>>> A new parent Flink pom that all repositories define as their parent
>>>>> could work; this would imply splicing out part of the current root
>>>>> pom.xml.
>>>>>
>>>>> 3) Documentation
>>>>>
>>>>> Splitting the repository realistically also implies splitting the
>>>>> documentation source files (At the beginning we can get by with
>>>>> having it still in flink-main).
>>>>> We could just move the relevant files to the respective repository
>>>>> (while maintaining the directory structure), and merge them when
>>>>> building the docs.
>>>>>
>>>>> We also have to look at how we can handle java-/scaladocs; e.g.
>>>>> whether it is possible to aggregate them across projects.
>>>>>
>>>>> 4) CI (end-to-end tests)
>>>>>
>>>>> The very basic question we have to answer is whether we want E2E
>>>>> tests in the sub repositories. If so, we need to find a way to share
>>>>> e2e-tooling.
>>>>>
>>>>> 5) Releases
>>>>>
>>>>> We have to discuss what our release process will look like. This may
>>>>> also have repercussions on how repositories may depend on each other
>>>>> (SNAPSHOT vs LATEST). Note that this should be discussed for each
>>>>> repo separately.
>>>>>
>>>>> The current options I see are the following:
>>>>>
>>>>> a) Single release
>>>>>
>>>>> Release all repositories at once as a single product.
>>>>>
>>>>> The source release would be a collection of repositories, like
>>>>> flink/
>>>>> |--flink-main/
>>>>> |  |--flink-core/
>>>>> |  |--flink-runtime/
>>>>> |  ...
>>>>> |--flink-connectors/
>>>>> |  ...
>>>>> |--flink-.../
>>>>> |  ...
>>>>>
>>>>> This option requires a SNAPSHOT dependency between Flink
>>>>> repositories, but it is pretty much how things work at the
>> moment.
>>>>>
>>>>> b) Synced releases
>>>>>
>>>>> Similar to a), except that each repository gets its own source
>>>>> release that it may release independently of other
>> repositories.
>>>>> For a given release cycle each repo would produce exactly one
>>>>> release.
>>>>>
>>>>> This option requires a SNAPSHOT dependency between Flink
>>>>> repositories. Once any repository has created an RC or
>>>>> finished its release, release-branches in other repos can
>>>>> switch to that version.
>>>>>
>>>>> This approach is a tad more flexible than a), but requires more
>>>>> coordination between the repos.
>>>>>
>>>>> c) Separate releases
>>>>>
>>>>> Just like we handle flink-shaded; entirely separate release
>>>>> cycles; some repositories may have more releases in a given time
>>>>> period than others.
>>>>>
>>>>> This option implies a LATEST dependency between Flink
>>> repositories.
>>>>>
>>>>> Note that hybrid approaches would also make sense, like doing b) for
>>>>> major versions and c) for bugfix releases.
>>>>>
>>>>> For something like flink-libraries this question may also have
>>>>> repercussions on how/whether they are bundled in the distribution;
>>>>> options a)/b) would maintain the status-quo, c) and hybrid
>>>>> approaches will likely necessitate the exclusion from the
>>> distribution.
>>>
>>>
>>

Biao Liu

Re: [DISCUSS] Repository split

In reply to this post by Jark Wu-2

Hi folks,

Thanks for bringing up this discussion, Chesnay.

+1 for the motivation. Waiting a long time for Travis builds is a really
bad experience.

WRT the solution, personally I agree with Dawid/David.

IMO the biggest benefit of splitting the repository is reduced build time. I
think that could be achieved without splitting the repository, which would be
the best solution for me.

There are also several pain points I really care about if we split the
repository.

1. Most of our users are developers. Non-developer users probably do not
care about the code structure at all; they might use the released binaries
directly. For developers, multiple repositories make it less friendly to
read, build or test the code. I think it's a big regression.

2. It's definitely a nightmare to work across repositories. As Piotr said,
it should be a rare case. However, Jark raised a good example: debugging a
sub-repository IT case. Imagine the scenario: I'm debugging an unstable Kafka
IT case and need to add some logging to runtime components to find clues.
What should I do? I have to locally install the flink-main project every
time I add logging, and it's easy to make mistakes when switching
between repositories over and over.

To sum up, at least for now I agree with Dawid that we should go toward
splitting the CI builds, not the repository.
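
As a rough illustration of what "splitting the CI builds" could mean, here is
a minimal Python sketch that maps changed paths to modules and selects the
changed modules plus everything that transitively depends on them. The module
names and dependency edges are hypothetical placeholders, not Flink's actual
build graph, and a real setup would derive them from the pom.xml files.

```python
from collections import defaultdict, deque

# Hypothetical module graph: module -> modules it depends on.
# Names and edges are illustrative only, not Flink's real build graph.
DEPENDS_ON = {
    "flink-core": [],
    "flink-runtime": ["flink-core"],
    "flink-connectors": ["flink-runtime"],
    "flink-ml": ["flink-runtime"],
    "docs": [],
}


def changed_modules(changed_paths):
    """Map changed file paths to their top-level module directory.

    Paths outside the known modules are ignored in this sketch.
    """
    return {path.split("/", 1)[0] for path in changed_paths} & set(DEPENDS_ON)


def modules_to_test(changed_paths):
    """Return changed modules plus every module that transitively depends on them."""
    # Invert the graph: module -> modules that depend on it.
    dependents = defaultdict(set)
    for module, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].add(module)

    start = changed_modules(changed_paths)
    # Documentation-only changes trigger no tests at all.
    if start <= {"docs"}:
        return set()

    selected = set(start) - {"docs"}
    queue = deque(selected)
    # Breadth-first walk over the inverted graph.
    while queue:
        for dependent in dependents[queue.popleft()]:
            if dependent not in selected:
                selected.add(dependent)
                queue.append(dependent)
    return selected


if __name__ == "__main__":
    print(sorted(modules_to_test(["flink-connectors/kafka/Foo.java"])))
    print(sorted(modules_to_test(["flink-runtime/src/Bar.java"])))
```

Under these assumptions a connector-only change runs only the connector
tests, while a runtime change still pulls in its dependents, which is the
"only half solves the problem" case Piotr described.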

Thanks,
Biao /'bɪ.aʊ/

On Fri, Aug 9, 2019 at 12:55 AM Jark Wu <[hidden email]> wrote:

> Hi,
>
> First of all, I agree with Dawid and David's point.
>
> I will share some experience with a repository split. We have been through
> one for Alibaba Blink, which I think is the most worthwhile project to learn
> from.
> We split the Blink project into "blink-connectors" and "blink", but we didn't
> get much benefit in terms of a better development process. On the contrary, it
> sometimes slowed development down.
> We have suffered from the following issues after the split as far as I can see:
>
> 1. Unstable build and test:
> Interface or behavior changes in the underlying modules (e.g. core, table) will
> lead to build and test failures in the connectors repo. AFAIK, the table api
> is still under heavy evolution.
> This makes the connectors repo more unstable and keeps us busy fixing
> build and test problems **after-commit**.
> First, it's not easy to locate which commit in the main repo caused the
> connectors repo to fail (we have 70+ commits every day in flink master
> now and the number is growing).
> Second, when 2 or 3 build/test problems happen at the same time, they are
> hard to fix because we can't make the build/tests pass in separate
> hotfix pull requests.
>
> 2. Debug difficulty:
> As modules live in different repositories, if we want to debug a
> Kafka IT case,
> we may need to debug some code in the flink runtime or verify whether a
> runtime code change
> can fix the Kafka case. However, this is more complex because they are
> not in one project.
>
> IMO, this actually slows down the development process.
>
> ------
>
> In my understanding, the issues we want to solve with the split include:
> 1) long build/testing time
> 2) unstable tests
> 3) increasing number of PRs
>
> Ad. 1 I think we have several ways to reduce the build/testing time. As
> Dawid said, we can trigger the corresponding CI in a single repository (without
> running all the tests).
> An easy way might be to analyse the pom.xml files to find which modules depend on
> the changed module. And one thing we can do right now is skip all the
> tests for documentation changes.
>
> Ad. 2 I can't see how unstable connector tests can be fixed more quickly
> after being moved to a separate repository. As far as I can tell, this problem
> might become more significant.
>
> Ad. 3 I also doubt that a repository split would help with this. I think it
> will give the sub-repositories less exposure; bahir-flink[1] is an
> example (only 3 commits in the last 2 months).
>
> In the end, from my point of view,
> 1) if we want to reduce build/testing time, we can start a new thread to
> collect ideas from the community. We can try some approaches to see if they
> solve most of the problems.
> 2) if we want to split the repository, we need to be cautious about the
> potential development slowdown we might face.
>
> Regards,
> Jark
>
> [1]: https://github.com/apache/bahir-flink/graphs/commit-activity
>
>
>
>
> On Fri, 9 Aug 2019 at 00:26, Till Rohrmann <[hidden email]> wrote:
>
> > I pretty much agree with your points Dav/wid. Some problems which we want
> > to solve with a repository split are clearly caused by the existing
> build
> > system (no incremental builds, not enough flexibility to only build a
> > subset of modules). Given that a repository split would be a major
> > endeavour with a lot of uncertainties, changing Flink's build system
> might
> > actually be simpler.
> >
> > In the past I tried to build Flink with Gradle because it better supports
> > incremental builds. Unfortunately, I never really got it off the ground
> > because of too little time. Maybe it could be an option to investigate
> > other build systems like Gradle or Bazel and whether they could solve the
> > pain points around build time allowing us to keep a single repository.
> >
> > I second Piotr's concerns that we would actually lose test coverage with
> > splitting the repository. Just with the 1.9 release we found a problem in
> > the CheckpointFailureManager because of failing Kafka tests. It might
> have
> > taken us more time to figure this problem out if the tests were failing
> in a
> > separate repository.
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]>
> wrote:
> >
> > > Hey,
> > >
> > > I retract my +1 (at least temporarily, until we discuss
> alternative
> > > solutions).
> > >
> > > >> I would like to also raise an additional issue: currently quite
> some
> > > bugs (like release blockers [1]) are being discovered by ITCases of the
> > > connectors. It means that at least initially, the main repository will
> > lose
> > > some test coverage.
> > > >>
> > > > True, but I think this is more a symptom of us not properly testing
> the
> > > contracts that are exposed to connectors.
> > >
> > > Sure. In an ideal world we would have proper test coverage and
> > > self-contained modules. In reality, especially when it comes to weird
> and
> > > quirky race conditions, some execution paths/races are triggered only
> in
> > > specific scenarios. For example when a test is written in a very special
> > way,
> > > or there are special timing constraints.
> > >
> > > I’m not saying that this should block the split, but it is something
> that
> > > might need to be taken into account. Even if no immediate action is
> > required,
> > > contributors to core/runtime modules must be aware of the reduced coverage and
> > that
> > > they should also monitor test failures in the connectors from time to
> > time.
> > >
> > > Re David and Dawid.
> > >
> > > I agree that this can create big pains from time to time. However if we
> > do
> > > the split correctly, along reasonably stable APIs boundaries, it should
> > be
> > > rare that some development effort requires changes/refactoring in the
> > core
> > > modules. Personally I’m only aware of one case when this would be
> needed
> > in
> > > the past two years in Flink: when adding the Kafka 0.11 connector, I was
> also
> > > adding `TwoPhaseCommitSinkFunction`. And until the Kafka 0.11 connector had
> > > stabilised, there were at least a couple of changes added later to the
> > > `TwoPhaseCommitSinkFunction` in order for the Kafka 0.11 connector to work
> > > (like transaction timeouts).
> > >
> > > If we have a counter-proposal, let's talk it through.
> > >
> > > > In case of CI, as Dawid already mentioned, you only need to trigger
> > > build /
> > > > tests for the code you have changed and its dependents. This should
> > > > greatly improve runtime of CI builds.
> > >
> > > However, when we are making a change to the network stack, in a perfect setup
> with
> > > good test coverage in the `flink-runtime` module, we shouldn’t be running
> > > connector or flink-ml tests (as long as we are not modifying the
> > behaviour
> > > or public APIs). So triggering tests based on the dependencies would
> only
> > > half solve the problem.
> > >
> > > Besides that, there are two more benefits of repository split:
> > >
> > > 1. Test instabilities/intermittent failures of sub modules
> > > (connectors/flink-ml/flink-python/table-api) were causing us many more
> > > problems in recent months, slowing down the development of lower
> > level
> > > modules. The more such modules and the more developers we have, the more
> > > the sheer number of intermittent failures will grow, even assuming that
> > > we maintain our current standards. If we compartmentalize the
> > > repository into smaller ones, we reduce the global probability of build
> > > failure (now the probability of a single build failure is roughly P(Flink-core
> > > fails) + P(connector fails) + P(flink-ml fails) + … )
> > >
> > > But maybe we could also solve this with a more clever/better build
> > script?
> > > Defining test boundary - that connector tests are executed ONLY if the
> > > connector code was changed?
> > >
> > > Piotrek
> > >
> > > > On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
> > > >
> > > > +1 for the motivation, -1 for the solution as all of the problems
> > mentioned
> > > > above can be addressed with the mono-repo as well.
> > > >
> > > > Multiple repositories:
> > > > 1) This creates a big pain in case of change that targets code base
> in
> > > > multiple repositories. Change needs to be split in multiple PRs, that
> > > need
> > > > to be reviewed separately, merged in proper order, otherwise CI would
> > > fail
> > > > (also you need to rebuild "dependent PR", once its dependency gets
> > > merged -
> > > > this will just result in a lot of false positive PR build failures).
> > Also
> > > > if the change needs to be cherry-picked into multiple releases, it's
> > > really
> > > > easy to make a mistake.
> > > > 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
> > > > 3) It makes release management way harder as all the parts are
> > versioned
> > > > separately.
> > > > 4) Refactoring over multi repositories.
> > > > 5) For newcomers, it's way harder to contribute, as the local setup
> > gets
> > > > complicated. Also depending on SNAPSHOTS from other project, can be
> > very
> > > > frustrating for people that are not too familiar with dep.
> management,
> > as
> > > > it often leads to unpredictable behavior due to local cache etc...
> > > >
> > > > The increased build / testing time, does not imply that the
> repository
> > is
> > > > too big, but that the current build system is not set up correctly
> (eg.
> > > > checkstyle takes for ages on my box, ...) / user is unaware of how to
> > > > leverage the current build system (eg. does not need to build
> > everything
> > > > from scratch every time he makes a change; can be improved in docs).
> > > >
> > > > In case of CI, as Dawid already mentioned, you only need to trigger
> > > build /
> > > > tests for the code you have changed and its dependents. This should
> > > > greatly improve runtime of CI builds.
> > > >
> > > > D.
> > > >
> > > > On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]> wrote:
> > > >
> > > >> First of all I don't have much (if any) experience with
> working
> > > with
> > > >> a multi repository project of Flink's size. I would like to mention
> a
> > > few
> > > >> thoughts of mine, though. In general I am slightly against splitting
> > the
> > > >> repository. I fear that what we actually want to do is to introduce
> > > double
> > > >> standards for different modules with the repository split.
> > > >>
> > > >> As I understand there are two issues we want to solve with the
> split:
> > > >>
> > > >> 1) long build/testing time
> > > >>
> > > >> 2) increasing number of PRs
> > > >>
> > > >> Ad. 1 I agree this is a problem and that we don't necessarily need
> to
> > > run
> > > >> all the tests with every change or build the whole project all the
> > time.
> > > >> However, I think we could achieve that in a single repository and at
> > the
> > > >> same time keep the option to build all modules at once. If I am not
> > > >> mistaken this is the approach that the Apache Beam community decided to
> take
> > > (see
> > > >> e.g.
> > > >>
> > >
> >
> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> > > >> where they define paths to files that, if changed, trigger the
> > > corresponding
> > > >> CI job). Maybe we could make it easier if we restructure the
> > > repository? To
> > > >> something like:
> > > >>
> > > >> flink/
> > > >> |--flink-main/
> > > >> |  |--flink-core/
> > > >> |  |--flink-runtime/
> > > >> |  ...
> > > >> |--flink-connectors/
> > > >> |  ...
> > > >> |--flink-filesystems.../
> > > >> |  ...
> > > >>
> > > >> |--root.pom
> > > >>
> > > >> In my opinion the Releases section from Chesnay's message shows well
> > > that
> > > >> it might not be the best option to split the repository. The option
> a)
> > > >> looks to me equivalent to what I suggested above but with a split.
> > The
> > > >> option b) looks to me super complicated and I can see no benefit
> over
> > > >> option a). The option c) would be the most reasonable one if we
> > decided
> > > to
> > > >> split the repository, if you ask me. The problem with this approach
> is
> > > the
> > > >> compatibility matrix (which versions of connectors work with which
> > > versions
> > > >> of Flink?). Moreover, for me it is an indicator of what I mentioned
> > > that we
> > > >> introduce double standards for those modules. I am not saying that I
> > am
> > > >> totally against that, but I think this should be a conscious
> decision.
> > > >>
> > > >> Ad. 2 I can't see how repository split could help with that rather
> > than
> > > >> moving some of the PRs to a separate list (that probably even less
> > > people
> > > >> would look into). Also I think we can achieve something like that
> > > already
> > > >> with github filters, no?
> > > >>
> > > >> To sum up my thoughts:
> > > >>
> > > >> 1. I think it is a good idea to split our CI builds to sub-modules
> > > >> (connectors being the first candidate), that would trigger on a
> > > changed
> > > >> path basis, but without splitting the repo.
> > > >> 2. My feeling is that the real question is if we want to change
> our
> > > >> stability guarantees of certain modules to be "just best effort".
> > > >> 3. If we were to vote on this proposal I would vote -0. I am
> > slightly
> > > >> against this change, but wouldn't oppose.
> > > >>
> > > >> Best,
> > > >>
> > > >> Dawid
> > > >> On 08/08/2019 13:23, Chesnay Schepler wrote:
> > > >>
> > > >>> I would like to also raise an additional issue: currently quite
> some
> > > >> bugs (like release blockers [1]) are being discovered by ITCases of
> > the
> > > >> connectors. It means that at least initially, the main repository
> will
> > > lose
> > > >> some test coverage.
> > > >>
> > > >> True, but I think this is more a symptom of us not properly testing
> > the
> > > >> contracts that are exposed to connectors.
> > > >> That we lose test coverage is already a big red flag as it
> > implies
> > > >> that issues were fixed and are now verified by a connector test, and
> > > not by
> > > >> a test in the Flink core.
> > > >> We could also look into tooling surrounding the CI bot for running
> the
> > > >> connectors tests on-demand, although this is very much long-term.
> > > >>
> > > >> On 08/08/2019 13:14, Piotr Nowojski wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> Thanks for proposing and writing this down, Chesnay.
> > > >>
> > > >> Generally speaking +1 from my side for the idea. It will create
> > > additional
> > > >> pain for cross repository development, like some new feature in
> > > connectors
> > > >> that need some change in the main repository. I’ve worked in such
> > setup
> > > >> before and the teams then regretted having such split. But I agree
> > that
> > > we
> > > >> should try this to try to solve the stability/build time issues.
> > > >>
> > > >> I have no experience in making such kind of splits so I can not help
> > > here.
> > > >>
> > > >> I would like to also raise an additional issue: currently quite some
> > > bugs
> > > >> (like release blockers [1]) are being discovered by ITCases of the
> > > >> connectors. It means that at least initially, the main repository
> will
> > > lose
> > > >> some test coverage.
> > > >>
> > > >> Piotrek
> > > >>
> > > >> [1] https://issues.apache.org/jira/browse/FLINK-13593
> > > >>
> > > >> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
> > > >>
> > > >> Hello everyone,
> > > >>
> > > >> The Flink project sees an ever-increasing amount of dev activity,
> both
> > > in
> > > >> terms of reworked and new features.
> > > >>
> > > >> This is of course an excellent situation to be in, but we are
> getting
> > > to a
> > > >> point where the associated downsides are becoming increasingly
> > > troublesome.
> > > >>
> > > >> The ever increasing build times, in addition to unstable tests,
> > > >> significantly slow down the development process.
> > > >> Additionally, pull requests for smaller features frequently slip
> > through
> > > >> the cracks as they are being buried under a mountain of other pull
> > > >> requests.
> > > >>
> > > >> As a result I'd like to start a discussion on splitting the Flink
> > > >> repository.
> > > >>
> > > >> In this mail I will outline the core idea, and what problems I
> > currently
> > > >> envision.
> > > >>
> > > >> I'd specifically like to encourage those who were part of similar
> > > >> initiatives in other projects to share the experiences and ideas.
> > > >>
> > > >>
> > > >> General Idea
> > > >>
> > > >> For starters, the idea is to create a new repository for
> > > >> "flink-connectors".
> > > >> For the remainder of this mail, the current Flink repository is
> > referred
> > > >> to as "flink-main".
> > > >>
> > > >> There are also other candidates that we could discuss in the future,
> > > like
> > > >> flink-libraries (the next top-priority repo to ease flink-ml
> > > development),
> > > >> metric reporters, filesystems and flink-formats.
> > > >>
> > > >> Moving out flink-connectors provides the most benefits, as we
> straight
> > > >> away save at-least an hour of testing time, and not being included
> in
> > > the
> > > >> binary distribution simplifies a few things.
> > > >>
> > > >>
> > > >> Problems to solve
> > > >>
> > > >> To make this a reality there's a number of questions we have to
> > discuss;
> > > >> some in the short-term, others in the long-term.
> > > >>
> > > >> 1) Git history
> > > >>
> > > >> We have to decide whether we want to rewrite the history of sub
> > > >> repositories to only contain diffs/commits related to this part of
> > > >> Flink, or whether we just fork from some commit in flink-main and
> > > >> add a commit to the connector repo that "transforms" it from
> > > >> flink-main to flink-connectors (i.e., remove everything unrelated
> to
> > > >> connectors + update module structure etc.).
> > > >>
> > > >> The latter option would have the advantage that our commit book
> > > >> keeping in JIRA would still be correct, but it would create a
> > > >> significant divide between the current and past state of the
> > > >> repository.
> > > >>
> > > >> 2) Maven
> > > >>
> > > >> We should look into whether there's a way to share
> dependency/plugin
> > > >> configurations and similar, so we don't have to keep them in sync
> > > >> manually across multiple repositories.
> > > >>
> > > >> A new parent Flink pom that all repositories define as their
> parent
> > > >> could work; this would imply splicing out part of the current root
> > > >> pom.xml.
> > > >>
> > > >> 3) Documentation
> > > >>
> > > >> Splitting the repository realistically also implies splitting the
> > > >> documentation source files (At the beginning we can get by with
> > > >> having it still in flink-main).
> > > >> We could just move the relevant files to the respective repository
> > > >> (while maintaining the directory structure), and merge them when
> > > >> building the docs.
> > > >>
> > > >> We also have to look at how we can handle java-/scaladocs; e.g.
> > > >> whether it is possible to aggregate them across projects.
> > > >>
> > > >> 4) CI (end-to-end tests)
> > > >>
> > > >> The very basic question we have to answer is whether we want E2E
> > > >> tests in the sub repositories. If so, we need to find a way to
> share
> > > >> e2e-tooling.
> > > >>
> > > >> 5) Releases
> > > >>
> > > >> We have to discuss what our release process will look like. This
> may
> > > >> also have repercussions on how repositories may depend on each
> other
> > > >> (SNAPSHOT vs LATEST). Note that this should be discussed for each
> > > >> repo separately.
> > > >>
> > > >> The current options I see are the following:
> > > >>
> > > >> a) Single release
> > > >>
> > > >> Release all repositories at once as a single product.
> > > >>
> > > >> The source release would be a collection of repositories, like
> > > >> flink/
> > > >> |--flink-main/
> > > >> |--flink-core/
> > > >> |--flink-runtime/
> > > >> ...
> > > >> |--flink-connectors/
> > > >> ...
> > > >> |--flink-.../
> > > >> ...
> > > >>
> > > >> This option requires a SNAPSHOT dependency between Flink
> > > >> repositories, but it is pretty much how things work at the
> > moment.
> > > >>
> > > >> b) Synced releases
> > > >>
> > > >> Similar to a), except that each repository gets their own
> source
> > > >> release that may be released independently of other
> > repositories.
> > > >> For a given release cycle each repo would produce exactly one
> > > >> release.
> > > >>
> > > >> This option requires a SNAPSHOT dependency between Flink
> > > >> repositories. Once any repository has created an RC or
> > > >> finished its release, release-branches in other repos can
> > > >> switch to that version.
> > > >>
> > > >> This approach is a tad more flexible than a), but requires
> more
> > > >> coordination between the repos.
> > > >>
> > > >> c) Separate releases
> > > >>
> > > >> Just like we handle flink-shaded; entirely separate release
> > > >> cycles; some repositories may have more releases in a given
> time
> > > >> period than others.
> > > >>
> > > >> This option implies a LATEST dependency between Flink
> > > repositories.
> > > >>
> > > >> Note that hybrid approaches would also make sense, like doing b)
> for
> > > >> major versions and c) for bugfix releases.
> > > >>
> > > >> For something like flink-libraries this question may also have
> > > >> repercussions on how/whether they are bundled in the distribution;
> > > >> options a)/b) would maintain the status-quo, c) and hybrid
> > > >> approaches will likely necessitate the exclusion from the
> > > distribution.
> > >
> > >
> >
>

mxm

Re: [DISCUSS] Repository split

Apart from a technical explanation, the initial suggestion does not propose how the repository should be split up. The only meaningful split I see is for the connectors.

This discussion dates back a few years: https://lists.apache.org/thread.html/4ee502667a5801d23d76a01406e747e1a934417dc67ef7d26fb7f79c@1449757911@%3Cdev.flink.apache.org%3E

I would be in favor of keeping the mono repository. As already mentioned here, there are other ways to resolve the build time issue. For instance, in Beam we have granular build triggers that allow testing only specific components and their dependencies: https://github.com/apache/beam/blob/a2b57e3b55a09d641cee8c3b796cc6971a008db0/.test-infra/jenkins/job_PreCommit_Java.groovy#L26
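
To make the trigger idea concrete, here is a rough sketch in Python. It is not how Beam's Groovy job definitions work; it only illustrates the principle of mapping changed paths to the modules that need testing. The module names come from this thread, but the dependency edges, the "docs/" skip rule and the function names are assumptions made up for illustration. A real setup would derive the graph from the pom.xml files.

```python
from collections import deque

# Hypothetical, simplified module graph (module -> modules it depends on).
# A real setup would read these edges from the Maven pom.xml files.
DEPENDS_ON = {
    "flink-core": [],
    "flink-runtime": ["flink-core"],
    "flink-connectors": ["flink-runtime"],
    "flink-libraries": ["flink-runtime"],
}

# Paths whose changes never need a test run (an assumption of this sketch).
SKIP_PREFIXES = ("docs/",)


def reverse_edges(depends_on):
    """Invert the graph: module -> modules that depend on it directly."""
    dependents = {module: [] for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(module)
    return dependents


def modules_to_test(changed_paths, depends_on=DEPENDS_ON):
    """Return the changed modules plus all modules that transitively depend on them."""
    dependents = reverse_edges(depends_on)
    queue = deque()
    for path in changed_paths:
        if path.startswith(SKIP_PREFIXES):
            continue  # documentation-only change, nothing to test
        module = path.split("/", 1)[0]
        if module not in depends_on:
            # Unknown location (e.g. the root pom.xml): play safe, test everything.
            return set(depends_on)
        queue.append(module)

    selected = set()
    while queue:  # breadth-first walk over the reversed edges
        module = queue.popleft()
        if module in selected:
            continue
        selected.add(module)
        queue.extend(dependents[module])
    return selected


if __name__ == "__main__":
    print(sorted(modules_to_test(["flink-connectors/flink-connector-kafka/Foo.java"])))
    print(sorted(modules_to_test(["flink-runtime/src/Bar.java", "docs/index.md"])))
```

With such a mapping, a connector-only change runs just the connector tests, while a change in a lower-level module still runs everything that builds on top of it, so the test coverage concern raised in this thread is not traded away.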

Thanks,
Max

On 09.08.19 09:14, Biao Liu wrote:

> Hi folks,
>
> Thanks for bringing up this discussion, Chesnay.
>
> +1 for the motivation. Waiting a long time for Travis builds is really a
> bad experience.
>
> WRT the solution, personally I agree with Dawid/David.
>
> IMO the biggest benefit of splitting repository is reducing build time. I
> think it could be achieved without splitting the repository. That's the
> best solution for me.
>
> And there would be several pains I do really care about if we split the
> repository.
>
> 1. Most of our users are developers. Non-developer users probably do not
> care about the code structure at all; they might use the released binaries
> directly. For developers, multiple repositories are less friendly for
> reading, building or testing the code. I think it's a big regression.
>
> 2. It's definitely a nightmare to work across repositories. As Piotr said,
> it should be a rare case. However, Jark raised a good example: debugging a
> sub-repository IT case. Imagine the scenario: I'm debugging an unstable Kafka
> IT case and need to add some logging in runtime components to find clues.
> What should I do? I have to install the flink-main project locally every
> time I add logging. And it's easy to make mistakes when switching
> between repositories over and over.
>
> To sum up, at least for now I agree with Dawid that we should go toward
> splitting the CI builds not the repository.
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, Aug 9, 2019 at 12:55 AM Jark Wu <[hidden email]> wrote:
>
> > Hi,
> >
> > First of all, I agree with Dawid and David's point.
> >
> > I will share some experience on the repository split. We have been through
> > it for Alibaba Blink, which is the most worthwhile project to learn from I
> > think.
> > We split the Blink project into "blink-connectors" and "blink", but we didn't
> > get much benefit in terms of a better development process. On the contrary,
> > it sometimes slowed development down.
> > As far as I can see, we suffered from the following issues after the split:
> >
> > 1. Unstable build and test:
> > Interface or behavior changes in the underlying modules (e.g. core, table)
> > will lead to build and test failures in the connectors repo. AFAIK, the
> > table API is still under heavy evolution.
> > This will make the connectors repo more unstable and keep us busy fixing
> > build and test problems **after-commit**.
> > First, it's not easy to locate which commit in the main repo caused the
> > connectors repo to fail (we have 70+ commits every day in flink master
> > now and it is growing).
> > Second, when 2 or 3 build/test problems happen at the same time, it's hard to
> > fix them because we can't make the build/test pass in separate
> > hotfix pull requests.
> >
> > 2. Debug difficulty:
> > As modules are separate in different repositories, if we want to debug a
> > Kafka IT case,
> > we may need to debug some code in flink runtime or verify whether the
> > runtime code change
> > can fix the Kafka case. However, it will be more complex because they are
> > not in one project.
> >
> > IMO, this actually slows down the development process.
> >
> > ------
> >
> > In my understanding, the issues we want to solve with the split include:
> > 1) long build/testing time
> > 2) unstable tests
> > 3) increasing number of PRs
> >
> > Ad. 1 I think we have several ways to reduce the build/testing time. As
> > Dawid said, we can trigger only the corresponding CI in a single repository
> > (without running all the tests).
> > An easy way might be to analyse the pom.xml files to find which modules
> > depend on the changed module. And one thing we can do right now is skip all
> > the tests for documentation changes.
> >
> > Ad. 2 I can't see how unstable connector tests can be fixed more quickly
> > after being moved to a separate repository. As far as I can tell, this
> > problem might become more significant.
> >
> > Ad. 3 I also doubt that a repository split would help with this. I think it
> > will give the sub-repositories less exposure, and bahir-flink[1] is an
> > example (only 3 commits in the last 2 months).
> >
> > In the end, from my point of view,
> >   1) if we want to reduce build/testing time, we can start a new thread to
> > collect ideas from the community. We can try some approaches to see if they
> > solve most of the problems.
> >   2) if we want to split the repository, we need to be cautious about the
> > potential development slowdown we might face.
> >
> > Regards,
> > Jark
> >
> > [1]: https://github.com/apache/bahir-flink/graphs/commit-activity
> >
> >
> >
> >
> > On Fri, 9 Aug 2019 at 00:26, Till Rohrmann <[hidden email]> wrote:
> >
> >> I pretty much agree with your points Dav/wid. Some problems which we want
> >> to solve with a repository split are clearly caused by the existing
> > build
> >> system (no incremental builds, not enough flexibility to only build a
> >> subset of modules). Given that a repository split would be a major
> >> endeavour with a lot of uncertainties, changing Flink's build system
> > might
> >> actually be simpler.
> >>
> >> In the past I tried to build Flink with Gradle because it better supports
> >> incremental builds. Unfortunately, I never got it really off the ground
> >> because of too little time. Maybe it could be an option to investigate
> >> other build systems like Gradle or Bazel and whether they could solve the
> >> pain points around build time allowing us to keep a single repository.
> >>
> >> I second Piotr's concerns that we would actually lose test coverage with
> >> splitting the repository. Just with the 1.9 release we found a problem in
> >> the CheckpointFailureManager because of failing Kafka tests. It might
> > have
> >> taken us more time to figure this problem out if the test were failing
> > in a
> >> separate repository.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]>
> > wrote:
> >>
> >>> Hey,
> >>>
> >>> I retract my +1 (at least temporarily, until we discuss
> > alternative
> >>> solutions).
> >>>
> >>>>> I would like to also raise an additional issue: currently quite
> > some
> >>> bugs (like release blockers [1]) are being discovered by ITCases of the
> >>> connectors. It means that at least initially, the main repository will
> >> lose
> >>> some test coverage.
> >>>>>
> >>>> True, but I think this is more a symptom of us not properly testing
> > the
> >>> contracts that are exposed to connectors.
> >>>
> >>> Sure. In an ideal world we should have proper test coverage and
> >>> self-contained modules. In reality, especially when it comes to weird
> > and
> >>> quirky race conditions, some executions paths/races are triggered only
> > in
> >>> specific scenarios. For example when a test is written in a very special
> >> way,
> >>> or there are special timing constraints.
> >>>
> >>> I’m not saying that this should block the split, but it is something
> > that
> >>> might need to be taken into account. Even if no immediate action
> >> required,
> >>> core/runtime modules contributors must be aware of small coverage and
> >> that
> >>> they should also monitor from time to time test failures in the
> >> connectors.
> >>>
> >>> Re David and Dawid.
> >>>
> >>> I agree that this can create big pains from time to time. However if we
> >> do
> >>> the split correctly, along reasonably stable APIs boundaries, it should
> >> be
> >>> rare that some development effort requires changes/refactoring in the
> >> core
> >>> modules. Personally I’m only aware of one case when this would be
> > needed
> >> in
> >>> the past two years in Flink: when adding Kafka 0.11 connector, I was
> > also
> >>> adding `TwoPhaseCommitSinkFunction`. And until Kafka 0.11 connector has
> >>> stabilised, there were at least couple of changes added later to the
> >>> `TwoPhaseCommitSinkFunction` in order for Kafka 0.11 connector to work
> >>> (like transaction time outs).
> >>>
> >>> If we have counter proposal, let's talk it through.
> >>>
> >>>> In case of CI, as Dawid already mentioned, you only need to trigger
> >>> build /
> >>>> tests for the code you have changed and its dependents. This should
> >>>> greatly improve runtime of CI builds.
> >>>
> >>> However, when we are making a change to the network stack, in a perfect setup,
> > with
> >>> good test coverage in `Flink-runtime` module, we shouldn’t be running
> >>> connector or flink-ml tests (as long as we are not modifying the
> >> behaviour
> >>> or public apis). So triggering tests based on the dependencies would
> > only
> >>> half solve the problem.
> >>>
> >>> Besides that, there are two more benefits of repository split:
> >>>
> >>> 1. Test instabilities/intermittent failures of sub modules
> >>> (connectors/flink-ml/flink-python/table-api) were causing us much more
> >>> problems in the recent months, slowing down the development of lower
> >> level
> >>> modules. The more such modules we have, the more developers we have, it
> >>> means that even assuming that we maintain our current standards, the
> >> sheer
> >>> number of intermittent failures will grow. If we compartmentalize the
> >>> repository into smaller ones, we reduce the global probability of build
> >>> failure (now the probability of a single build failure is P(Flink-core
> >>> fails) + P(connector fails) + P(flink-ml fails) + … )
> >>>
> >>> But maybe we could also solve this with a more clever/better build
> >> script?
> >>> Defining test boundary - that connector tests are executed ONLY if the
> >>> connector code was changed?
> >>>
> >>> Piotrek
> >>>
> >>>> On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
> >>>>
> >>>> +1 for the motivation, -1 for the solution as all of the problems
> >> mentioned
> >>>> above can be addressed with the mono-repo as well.
> >>>>
> >>>> Multiple repositories:
> >>>> 1) This creates a big pain in case of change that targets code base
> > in
> >>>> multiple repositories. Change needs to be split in multiple PRs, that
> >>> need
> >>>> to be reviewed separately, merged in proper order, otherwise CI would
> >>> fail
> >>>> (also you need to rebuild "dependent PR", once its dependency gets
> >>> merged -
> >>>> this will just result in a lot of false positive PR build failures).
> >> Also
> >>>> if the change needs to be cherry-picked into multiple releases, it's
> >>> really
> >>>> easy to make a mistake.
> >>>> 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
> >>>> 3) It makes release management way harder as all the parts are
> >> versioned
> >>>> separately.
> >>>> 4) Refactoring over multi repositories.
> >>>> 5) For newcomers, it's way harder to contribute, as the local setup
> >> gets
> >>>> complicated. Also depending on SNAPSHOTS from other project, can be
> >> very
> >>>> frustrating for people that are not too familiar with dep.
> > management,
> >> as
> >>>> it often leads to unpredictable behavior due to local cache etc...
> >>>>
> >>>> The increased build / testing time, does not imply that the
> > repository
> >> is
> >>>> too big, but that the current build system is not setup correctly
> > (eg.
> >>>> checkstyle takes for ages on my box, ...) / user is unaware of how to
> >>>> leverage the current build system (eg. does not need to build
> >> everything
> >>>> from scratch every time he makes a change; can be improved in docs).
> >>>>
> >>>> In case of CI, as Dawid already mentioned, you only need to trigger
> >>> build /
> >>>> tests for the code you have changed and its dependents. This should
> >>>> greatly improve runtime of CI builds.
> >>>>
> >>>> D.
> >>>>
> >>>> On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]> wrote:
> >>>>
> >>>>> First of all I don't have much (if any) experience with
> > working
> >>> with
> >>>>> a multi repository project of Flink's size. I would like to mention
> > a
> >>> few
> >>>>> thoughts of mine, though. In general I am slightly against splitting
> >> the
> >>>>> repository. I fear that what we actually want to do is to introduce
> >>> double
> >>>>> standards for different modules with the repository split.
> >>>>>
> >>>>> As I understand there are two issues we want to solve with the
> > split:
> >>>>>
> >>>>> 1) long build/testing time
> >>>>>
> >>>>> 2) increasing number of PRs
> >>>>>
> >>>>> Ad. 1 I agree this is a problem and that we don't necessarily need
> > to
> >>> run
> >>>>> all the tests with every change or build the whole project all the
> >> time.
> >>>>> However, I think we could achieve that in a single repository and at
> >> the
> >>>>> same time keep the option to build all modules at once. If I am not
> >>>>> mistaken this is the approach that the Apache Beam community decided to
> > take
> >>> (see
> >>>>> e.g.
> >>>>>
> >>>
> >>
> > https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> >>>>> where they define paths to file that if changed trigger the
> >>> corresponding
> >>>>> CI job). Maybe we could make it easier if we restructure the
> >>> repository? To
> >>>>> something like:
> >>>>>
> >>>>>       flink/
> >>>>>       |--flink-main/
> >>>>>           |--flink-core/
> >>>>>           |--flink-runtime/
> >>>>>           ...
> >>>>>       |--flink-connectors/
> >>>>>           ...
> >>>>>       |--flink-filesystems.../
> >>>>>       ...
> >>>>>
> >>>>>       |--root.pom
> >>>>>
> >>>>> In my opinion the Releases section from Chesnay's message shows well
> >>> that
> >>>>> it might not be the best option to split the repository. The option
> > a)
> >>>>> looks for me equivalent to what I suggested above but with a split.
> >> The
> >>>>> option b) looks for me super complicated and I can see no benefit
> > over
> >>>>> option a). The option c) would be the most reasonable one if we
> >> decided
> >>> to
> >>>>> split the repository, if you ask me. The problem with this approach
> > is
> >>> the
> >>>>> compatibility matrix (which versions of connectors work with which
> >>> versions
> >>>>> of Flink?). Moreover, for me it is an indicator of what I mentioned
> >>> that we
> >>>>> introduce double standards for those modules. I am not saying that I
> >> am
> >>>>> totally against that, but I think this should be a conscious
> > decision.
> >>>>>
> >>>>> Ad. 2 I can't see how repository split could help with that rather
> >> than
> >>>>> moving some of the PRs to a separate list (that probably even less
> >>> people
> >>>>> would look into). Also I think we can achieve something like that
> >>> already
> >>>>> with github filters, no?
> >>>>>
> >>>>> To sum up my thoughts:
> >>>>>
> >>>>>   1. I think it is a good idea to split our CI builds to sub-modules
> >>>>>   (connectors being the first candidate), that would trigger on a
> >>> changed
> >>>>>   path basis, but without splitting the repo.
> >>>>>   2. My feeling is that the real question is if we want to change
> > our
> >>>>>   stability guarantees of certain modules to be "just best effort".
> >>>>>   3. If we were to vote on this proposal I would vote -0. I am
> >> slightly
> >>>>>   against this change, but wouldn't oppose.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Dawid
> >>>>> On 08/08/2019 13:23, Chesnay Schepler wrote:
> >>>>>
> >>>>>> I would like to also raise an additional issue: currently quite
> > some
> >>>>> bugs (like release blockers [1]) are being discovered by ITCases of
> >> the
> >>>>> connectors. It means that at least initially, the main repository
> > will
> >>> lose
> >>>>> some test coverage.
> >>>>>
> >>>>> True, but I think this is more a symptom of us not properly testing
> >> the
> >>>>> contracts that are exposed to connectors.
> >>>>> That we lose test coverage is already a big red flag as it
> >> implies
> >>>>> that issues were fixed and are now verified by a connector test, and
> >>> not by
> >>>>> a test in the Flink core.
> >>>>> We could also look into tooling surrounding the CI bot for running
> > the
> >>>>> connectors tests on-demand, although this is very much long-term.
> >>>>>
> >>>>> On 08/08/2019 13:14, Piotr Nowojski wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Thanks for proposing and writing this down, Chesnay.
> >>>>>
> >>>>> Generally speaking +1 from my side for the idea. It will create
> >>> additional
> >>>>> pain for cross repository development, like some new feature in
> >>> connectors
> >>>>> that need some change in the main repository. I’ve worked in such
> >> setup
> >>>>> before and the teams then regretted having such split. But I agree
> >> that
> >>> we
> >>>>> should try this to try to solve the stability/build time issues.
> >>>>>
> >>>>> I have no experience in making such kind of splits so I can not help
> >>> here.
> >>>>>
> >>>>> I would like to also raise an additional issue: currently quite some
> >>> bugs
> >>>>> (like release blockers [1]) are being discovered by ITCases of the
> >>>>> connectors. It means that at least initially, the main repository
> > will
> >>> lose
> >>>>> some test coverage.
> >>>>>
> >>>>> Piotrek
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/FLINK-13593
> >>>>>
> >>>>> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
> >>>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> The Flink project sees an ever-increasing amount of dev activity,
> > both
> >>> in
> >>>>> terms of reworked and new features.
> >>>>>
> >>>>> This is of course an excellent situation to be in, but we are
> > getting
> >>> to a
> >>>>> point where the associated downsides are becoming increasingly
> >>> troublesome.
> >>>>>
> >>>>> The ever increasing build times, in addition to unstable tests,
> >>>>> significantly slow down the development process.
> >>>>> Additionally, pull requests for smaller features frequently slip
> >> through
> >>>>> the cracks as they are being buried under a mountain of other pull
> >>>>> requests.
> >>>>>
> >>>>> As a result I'd like to start a discussion on splitting the Flink
> >>>>> repository.
> >>>>>
> >>>>> In this mail I will outline the core idea, and what problems I
> >> currently
> >>>>> envision.
> >>>>>
> >>>>> I'd specifically like to encourage those who were part of similar
> >>>>> initiatives in other projects to share the experiences and ideas.
> >>>>>
> >>>>>
> >>>>>       General Idea
> >>>>>
> >>>>> For starters, the idea is to create a new repository for
> >>>>> "flink-connectors".
> >>>>> For the remainder of this mail, the current Flink repository is
> >> referred
> >>>>> to as "flink-main".
> >>>>>
> >>>>> There are also other candidates that we could discuss in the future,
> >>> like
> >>>>> flink-libraries (the next top-priority repo to ease flink-ml
> >>> development),
> >>>>> metric reporters, filesystems and flink-formats.
> >>>>>
> >>>>> Moving out flink-connectors provides the most benefits, as we
> > straight
> >>>>> away save at-least an hour of testing time, and not being included
> > in
> >>> the
> >>>>> binary distribution simplifies a few things.
> >>>>>
> >>>>>
> >>>>>       Problems to solve
> >>>>>
> >>>>> To make this a reality there's a number of questions we have to
> >> discuss;
> >>>>> some in the short-term, others in the long-term.
> >>>>>
> >>>>> 1) Git history
> >>>>>
> >>>>>   We have to decide whether we want to rewrite the history of sub
> >>>>>   repositories to only contain diffs/commits related to this part of
> >>>>>   Flink, or whether we just fork from some commit in flink-main and
> >>>>>   add a commit to the connector repo that "transforms" it from
> >>>>>   flink-main to flink-connectors (i.e., remove everything unrelated
> > to
> >>>>>   connectors + update module structure etc.).
> >>>>>
> >>>>>   The latter option would have the advantage that our commit book
> >>>>>   keeping in JIRA would still be correct, but it would create a
> >>>>>   significant divide between the current and past state of the
> >>>>> repository.
> >>>>>
> >>>>> 2) Maven
> >>>>>
> >>>>>   We should look into whether there's a way to share
> > dependency/plugin
> >>>>>   configurations and similar, so we don't have to keep them in sync
> >>>>>   manually across multiple repositories.
> >>>>>
> >>>>>   A new parent Flink pom that all repositories define as their
> > parent
> >>>>>   could work; this would imply splicing out part of the current root
> >>>>>   pom.xml.
> >>>>>
> >>>>> 3) Documentation
> >>>>>
> >>>>>   Splitting the repository realistically also implies splitting the
> >>>>>   documentation source files (At the beginning we can get by with
> >>>>>   having it still in flink-main).
> >>>>>   We could just move the relevant files to the respective repository
> >>>>>   (while maintaining the directory structure), and merge them when
> >>>>>   building the docs.
> >>>>>
> >>>>>   We also have to look at how we can handle java-/scaladocs; e.g.
> >>>>>   whether it is possible to aggregate them across projects.
> >>>>>
> >>>>> 4) CI (end-to-end tests)
> >>>>>
> >>>>>   The very basic question we have to answer is whether we want E2E
> >>>>>   tests in the sub repositories. If so, we need to find a way to
> > share
> >>>>>   e2e-tooling.
> >>>>>
> >>>>> 5) Releases
> >>>>>
> >>>>>   We have to discuss what our release process will look like. This may
> >>>>>   also have repercussions on how repositories may depend on each other
> >>>>>   (SNAPSHOT vs LATEST). Note that this should be discussed for each
> >>>>>   repo separately.
> >>>>>
> >>>>>   The current options I see are the following:
> >>>>>
> >>>>>   a) Single release
> >>>>>
> >>>>>       Release all repositories at once as a single product.
> >>>>>
> >>>>>       The source release would be a collection of repositories, like
> >>>>>       flink/
> >>>>>       |--flink-main/
> >>>>>           |--flink-core/
> >>>>>           |--flink-runtime/
> >>>>>           ...
> >>>>>       |--flink-connectors/
> >>>>>           ...
> >>>>>       |--flink-.../
> >>>>>       ...
> >>>>>
> >>>>>       This option requires a SNAPSHOT dependency between Flink
> >>>>>       repositories, but it is pretty much how things work at the
> >>>>>       moment.
> >>>>>
> >>>>>   b) Synced releases
> >>>>>
> >>>>>       Similar to a), except that each repository gets its own source
> >>>>>       release that it may release independently of other repositories.
> >>>>>       For a given release cycle each repo would produce exactly one
> >>>>>       release.
> >>>>>
> >>>>>       This option requires a SNAPSHOT dependency between Flink
> >>>>>       repositories. Once any repository has created an RC or
> >>>>>       finished its release, release-branches in other repos can
> >>>>>       switch to that version.
> >>>>>
> >>>>>       This approach is a tad more flexible than a), but requires more
> >>>>>       coordination between the repos.
> >>>>>
> >>>>>   c) Separate releases
> >>>>>
> >>>>>       Just like we handle flink-shaded; entirely separate release
> >>>>>       cycles; some repositories may have more releases in a given time
> >>>>>       period than others.
> >>>>>
> >>>>>       This option implies a LATEST dependency between Flink
> >>>>>       repositories.
> >>>>>
> >>>>>   Note that hybrid approaches would also make sense, like doing b) for
> >>>>>   major versions and c) for bugfix releases.
> >>>>>
> >>>>>   For something like flink-libraries this question may also have
> >>>>>   repercussions on how/whether they are bundled in the distribution;
> >>>>>   options a)/b) would maintain the status-quo, c) and hybrid
> >>>>>   approaches will likely necessitate the exclusion from the
> >>>>>   distribution.

Stephan Ewen

Re: [DISCUSS] Repository split

Thank you all for the good discussion.

I was one of the folks thinking about such a repository split together
with Chesnay, but due to my lack of prior experience, I am happy to hear all
the points that were raised.

Let's investigate a bit what would be alternatives to this that solve the
two problems.

(1) Build time: It seems that a sophisticated CI setup can go a long way
there. Would that also be possible on Travis (via artifact caching)?

(2) Huge number of PRs against the same repo, PRs getting lost, and a
fast-moving master that makes it hard to catch up and have a clean
rebase-test-merge run.

Just as a thought experiment: What if we "mimic" the repo split in the mono
repo, by re-arranging the modules in the mono-repo as they would be in
separate repos?
A developer could simply "cd flink-main" and "mvn clean verify" in there.
Would be a simple experience, no need to understand build profiles, etc.

Would that help us with anything? Maybe even with point (2), if we start
treating the top level directories as organizational units?

flink
 +-- flink-main
 |    +-- flink-core
 |    +-- flink-datastream-java
 |    +-- flink-runtime
 |    +-- etc
 +-- flink-connectors
 |    +-- flink-kafka
 |    +-- flink-kinesis
 |    +-- etc
 +-- flink-libs
      +-- flink-ml
      +-- flink-
      +-- flink-core
      +-- etc
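
To make the "top level directories as organizational units" idea a bit more
concrete, here is a minimal Python sketch of how a CI script might decide
which units to build for a given change. The function name, the DEPENDS_ON
map and the dependency edges in it are assumptions made for illustration
only; they are not part of Flink's build or of any existing CI tooling.

```python
from pathlib import PurePosixPath

# Hypothetical dependency edges between the top-level directories from the
# layout above: each unit maps to the units it builds on. Assumed, not
# taken from Flink's actual poms.
DEPENDS_ON = {
    "flink-main": set(),
    "flink-connectors": {"flink-main"},
    "flink-libs": {"flink-main"},
}


def units_to_build(changed_files):
    """Return the units touched by a change plus all units depending on them."""
    selected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Only the first path component decides which unit a file belongs to.
        if parts and parts[0] in DEPENDS_ON:
            selected.add(parts[0])

    # Repeatedly pull in units that depend on something already selected,
    # until nothing new is added (transitive closure over dependents).
    grew = True
    while grew:
        grew = False
        for unit, deps in DEPENDS_ON.items():
            if unit not in selected and deps & selected:
                selected.add(unit)
                grew = True
    return sorted(selected)


connector_change = units_to_build(["flink-connectors/flink-kafka/pom.xml"])
core_change = units_to_build(["flink-main/flink-core/pom.xml"])
docs_change = units_to_build(["README.md"])
```

With these assumed edges, a change under flink-connectors selects only that
unit, while a change under flink-main also selects everything that builds on
it. That is the limitation Piotr pointed out: dependency-based triggering
alone does not stop core changes from running the connector tests.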

Best,
Stephan

On Sun, Aug 11, 2019 at 4:36 PM Maximilian Michels <[hidden email]> wrote:

> Apart from a technical explanation, the initial suggestion does not
> propose how the repository should be split up. The only meaningful split I
> see is for the connectors.
>
> This discussion dates back a few years:
> https://lists.apache.org/thread.html/4ee502667a5801d23d76a01406e747e1a934417dc67ef7d26fb7f79c@1449757911@%3Cdev.flink.apache.org%3E
>
> I would be in favor of keeping the mono repository. As already mentioned
> here, there are other ways to resolve the build time issue. For instance, in
> Beam we have granular build triggers that allow testing only specific
> components and their dependencies:
> https://github.com/apache/beam/blob/a2b57e3b55a09d641cee8c3b796cc6971a008db0/.test-infra/jenkins/job_PreCommit_Java.groovy#L26
>
> Thanks,
> Max
>
> On 09.08.19 09:14, Biao Liu wrote:
> > Hi folks,
> >
> > Thanks for bringing up this discussion, Chesnay.
> >
> > +1 for the motivation. Waiting a long time for Travis builds is really a
> > bad experience.
> >
> > WRT the solution, personally I agree with Dawid/David.
> >
> > IMO the biggest benefit of splitting repository is reducing build time. I
> > think it could be achieved without splitting the repository. That's the
> > best solution for me.
> >
> > And there would be several pains I do really care about if we split the
> > repository.
> >
> > 1. Most of our users are developers. The non-developer users probably do
> > not care about the code structure at all. They might use the released
> > binary directly. For developers, multiple repositories are not so friendly
> > when it comes to reading, building or testing the code. I think it's a big
> > regression.
> >
> > 2. It's definitely a nightmare to work across repositories. As Piotr said,
> > it should be a rare case. However, Jark raised a good example: debugging a
> > sub-repository IT case. Imagine the scenario: I'm debugging an unstable
> > Kafka IT case. I need to add some logs in runtime components to find some
> > clues. What should I do? I have to locally install the flink-main project
> > each time after adding logs. And it's easy to make mistakes when switching
> > between repositories time after time.
> >
> > To sum up, at least for now I agree with Dawid that we should go toward
> > splitting the CI builds not the repository.
> >
> > Thanks,
> > Biao /'bɪ.aʊ/
> >
> >
> >
> > On Fri, Aug 9, 2019 at 12:55 AM Jark Wu <[hidden email]> wrote:
> >
> > > Hi,
> > >
> > > First of all, I agree with Dawid and David's point.
> > >
> > > I will share some experience on the repository split. We have been through
> > > it for Alibaba Blink, which I think is the most worthwhile project to
> > > learn from.
> > > We split the Blink project into "blink-connectors" and "blink", but we
> > > didn't get much benefit in terms of a better development process. On the
> > > contrary, it sometimes slowed development down.
> > > We have suffered from the following issues after the split, as far as I
> > > can see:
> > >
> > > 1. Unstable build and test:
> > > Interface or behavior changes in the underlying modules (e.g. core, table)
> > > will lead to build and test failures in the connectors repo. AFAIK, the
> > > table API is still under heavy evolution.
> > > This will make the connectors repo more unstable and keep us busy fixing
> > > build and test problems **after-commit**.
> > > First, it's not easy to locate which commit in the main repo caused the
> > > connectors repo to fail (we have 70+ commits every day in flink master
> > > now and it is growing).
> > > Second, when 2 or 3 build/test problems happen at the same time, it's hard
> > > to fix them because we can't make the build/test pass in separate
> > > hotfix pull requests.
> > >
> > > 2. Debug difficulty:
> > > As modules are separated into different repositories, if we want to debug
> > > a Kafka IT case, we may need to debug some code in the flink runtime or
> > > verify whether a runtime code change can fix the Kafka case. However, this
> > > will be more complex because they are not in one project.
> > >
> > > IMO, this actually slows down the development process.
> > >
> > > ------
> > >
> > > In my understanding, the issues we want to solve with the split include:
> > > 1) long build/testing time
> > > 2) unstable tests
> > > 3) increasing number of PRs
> > >
> > > Ad. 1 I think we have several ways to reduce the build/testing time. As
> > > Dawid said, we can trigger only the corresponding CI in a single
> > > repository (without running all the tests).
> > > An easy way might be to analyse the pom.xml files to find which modules
> > > depend on the changed module. And one thing we can do right now is skip
> > > all the tests for documentation changes.
> > >
> > > Ad. 2 I can't see how unstable connector tests can be fixed more quickly
> > > after being moved to a separate repository. As far as I can tell, this
> > > problem might become more significant.
> > >
> > > Ad. 3 I also doubt that a repository split could help with this. I think
> > > this will give the sub-repositories less exposure, and bahir-flink[1] is
> > > an example (only 3 commits in the last 2 months).
> > >
> > > In the end, from my point of view,
> > > 1) if we want to reduce build/testing time, we can start a new thread to
> > > collect ideas from the community. We can try some approaches to see if
> > > they can solve most of the problems.
> > > 2) if we want to split the repository, we need to be cautious about the
> > > potential development slowdown we might meet.
> > >
> > > Regards,
> > > Jark
> > >
> > > [1]: https://github.com/apache/bahir-flink/graphs/commit-activity
> > >
> > >
> > >
> > >
> > > On Fri, 9 Aug 2019 at 00:26, Till Rohrmann <[hidden email]> wrote:
> > >
> > >> I pretty much agree with your points Dav/wid. Some problems which we want
> > >> to solve with a repository split are clearly caused by the existing build
> > >> system (no incremental builds, not enough flexibility to only build a
> > >> subset of modules). Given that a repository split would be a major
> > >> endeavour with a lot of uncertainties, changing Flink's build system might
> > >> actually be simpler.
> > >>
> > >> In the past I tried to build Flink with Gradle because it better supports
> > >> incremental builds. Unfortunately, I never got it really off the ground
> > >> because of too little time. Maybe it could be an option to investigate
> > >> other build systems like Gradle or Bazel and whether they could solve the
> > >> pain points around build time allowing us to keep a single repository.
> > >>
> > >> I second Piotr's concerns that we would actually lose test coverage with
> > >> splitting the repository. Just with the 1.9 release we found a problem in
> > >> the CheckpointFailureManager because of failing Kafka tests. It might have
> > >> taken us more time to figure this problem out if the tests were failing in
> > >> a separate repository.
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski <[hidden email]> wrote:
> > >>
> > >>> Hey,
> > >>>
> > >>> I retract my +1 (at least temporarily, until we discuss alternative
> > >>> solutions).
> > >>>
> > >>>>> I would like to also raise an additional issue: currently quite some
> > >>>>> bugs (like release blockers [1]) are being discovered by ITCases of the
> > >>>>> connectors. It means that at least initially, the main repository will
> > >>>>> lose some test coverage.
> > >>>>>
> > >>>> True, but I think this is more a symptom of us not properly testing the
> > >>>> contracts that are exposed to connectors.
> > >>>
> > >>> Sure. In an ideal world we should have proper test coverage and
> > >>> self-contained modules. In reality, especially when it comes to weird and
> > >>> quirky race conditions, some execution paths/races are triggered only in
> > >>> specific scenarios. For example when a test is written in a very special
> > >>> way, or there are special timing constraints.
> > >>>
> > >>> I’m not saying that this should block the split, but it is something that
> > >>> might need to be taken into account. Even if no immediate action is
> > >>> required, core/runtime module contributors must be aware of the smaller
> > >>> coverage and that they should also monitor test failures in the
> > >>> connectors from time to time.
> > >>>
> > >>> Re David and Dawid.
> > >>>
> > >>> I agree that this can create big pains from time to time. However, if we
> > >>> do the split correctly, along reasonably stable API boundaries, it should
> > >>> be rare that some development effort requires changes/refactoring in the
> > >>> core modules. Personally I’m only aware of one case where this would have
> > >>> been needed in the past two years in Flink: when adding the Kafka 0.11
> > >>> connector, I was also adding `TwoPhaseCommitSinkFunction`. And until the
> > >>> Kafka 0.11 connector had stabilised, there were at least a couple of
> > >>> changes added later to the `TwoPhaseCommitSinkFunction` in order for the
> > >>> Kafka 0.11 connector to work (like transaction timeouts).
> > >>>
> > >>> If we have a counter-proposal, let's talk it through.
> > >>>
> > >>>> In case of CI, as Dawid already mentioned, you only need to trigger
> > >>>> build / tests for the code you have changed and its dependents. This
> > >>>> should greatly improve runtime of CI builds.
> > >>>
> > >>> However, when we are making a change to the network stack, in a perfect
> > >>> setup with good test coverage in the `flink-runtime` module, we shouldn’t
> > >>> be running connector or flink-ml tests (as long as we are not modifying
> > >>> the behaviour or public APIs). So triggering tests based on the
> > >>> dependencies would only half solve the problem.
> > >>>
> > >>> Besides that, there are two more benefits of repository split:
> > >>>
> > >>> 1. Test instabilities/intermittent failures of sub modules
> > >>> (connectors/flink-ml/flink-python/table-api) were causing us many more
> > >>> problems in recent months, slowing down the development of lower level
> > >>> modules. The more such modules and the more developers we have, the more
> > >>> the sheer number of intermittent failures will grow, even assuming that
> > >>> we maintain our current standards. If we compartmentalize the repository
> > >>> into smaller ones, we reduce the global probability of build failure (now
> > >>> the probability of a single build failure is roughly P(Flink-core fails)
> > >>> + P(connector fails) + P(flink-ml fails) + … )
> > >>>
> > >>> But maybe we could also solve this with a more clever/better build
> > >>> script? Defining a test boundary, so that connector tests are executed
> > >>> ONLY if the connector code was changed?
> > >>>
> > >>> Piotrek
> > >>>
> > >>>> On 8 Aug 2019, at 17:16, David Morávek <[hidden email]> wrote:
> > >>>>
> > >>>> +1 for the motivation, -1 for the solution as all of the problems
> > >>>> mentioned above can be addressed with the mono-repo as well.
> > >>>>
> > >>>> Multiple repositories:
> > >>>> 1) This creates a big pain in case of a change that targets the code
> > >>>> base in multiple repositories. The change needs to be split into
> > >>>> multiple PRs that need to be reviewed separately and merged in the
> > >>>> proper order, otherwise CI would fail (also you need to rebuild the
> > >>>> "dependent PR" once its dependency gets merged - this will just result
> > >>>> in a lot of false positive PR build failures). Also if the change needs
> > >>>> to be cherry-picked into multiple releases, it's really easy to make a
> > >>>> mistake.
> > >>>> 2) PR builds are not reproducible in case you depend on SNAPSHOTS.
> > >>>> 3) It makes release management way harder as all the parts are
> > >>>> versioned separately.
> > >>>> 4) Refactoring over multiple repositories.
> > >>>> 5) For newcomers, it's way harder to contribute, as the local setup gets
> > >>>> complicated. Also depending on SNAPSHOTS from other projects can be very
> > >>>> frustrating for people that are not too familiar with dep. management,
> > >>>> as it often leads to unpredictable behavior due to local cache etc...
> > >>>>
> > >>>> The increased build / testing time does not imply that the repository
> > >>>> is too big, but that the current build system is not set up correctly
> > >>>> (e.g. checkstyle takes ages on my box, ...) / the user is unaware of how
> > >>>> to leverage the current build system (e.g. one does not need to build
> > >>>> everything from scratch every time one makes a change; this can be
> > >>>> improved in docs).
> > >>>>
> > >>>> In case of CI, as Dawid already mentioned, you only need to trigger
> > >>>> build / tests for the code you have changed and its dependents. This
> > >>>> should greatly improve runtime of CI builds.
> > >>>>
> > >>>> D.
> > >>>>
> > >>>> On Thu, Aug 8, 2019 at 4:19 PM Dawid Wysakowicz <[hidden email]> wrote:
> > >>>>
> > >>>>> First of all I don't have much (if any) experience working with a
> > >>>>> multi repository project of Flink's size. I would like to mention a few
> > >>>>> thoughts of mine, though. In general I am slightly against splitting
> > >>>>> the repository. I fear that what we actually want to do is to introduce
> > >>>>> double standards for different modules with the repository split.
> > >>>>>
> > >>>>> As I understand there are two issues we want to solve with the split:
> > >>>>>
> > >>>>> 1) long build/testing time
> > >>>>>
> > >>>>> 2) increasing number of PRs
> > >>>>>
> > >>>>> Ad. 1 I agree this is a problem and that we don't necessarily need to
> > >>>>> run all the tests with every change or build the whole project all the
> > >>>>> time. However, I think we could achieve that in a single repository and
> > >>>>> at the same time keep the option to build all modules at once. If I am
> > >>>>> not mistaken this is the approach that the Apache Beam community
> > >>>>> decided to take (see e.g.
> > >>>>> https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
> > >>>>> where they define paths to files that, if changed, trigger the
> > >>>>> corresponding CI job). Maybe we could make it easier if we restructure
> > >>>>> the repository? To something like:
> > >>>>>
> > >>>>> flink/
> > >>>>> |--flink-main/
> > >>>>>     |--flink-core/
> > >>>>>     |--flink-runtime/
> > >>>>>     ...
> > >>>>> |--flink-connectors/
> > >>>>>     ...
> > >>>>> |--flink-filesystems.../
> > >>>>>     ...
> > >>>>>
> > >>>>> |--root.pom
> > >>>>>
> > >>>>> In my opinion the Releases section from Chesnay's message shows well
> > >>>>> that it might not be the best option to split the repository. Option a)
> > >>>>> looks to me equivalent to what I suggested above but with a split.
> > >>>>> Option b) looks super complicated to me and I can see no benefit over
> > >>>>> option a). Option c) would be the most reasonable one if we decided to
> > >>>>> split the repository, if you ask me. The problem with this approach is
> > >>>>> the compatibility matrix (which versions of connectors work with which
> > >>>>> versions of Flink?). Moreover, for me it is an indicator of what I
> > >>>>> mentioned: that we introduce double standards for those modules. I am
> > >>>>> not saying that I am totally against that, but I think this should be a
> > >>>>> conscious decision.
> > >>>>>
> > >>>>> Ad. 2 I can't see how a repository split could help with that, other
> > >>>>> than by moving some of the PRs to a separate list (that probably even
> > >>>>> fewer people would look into). Also I think we can achieve something
> > >>>>> like that already with github filters, no?
> > >>>>>
> > >>>>> To sum up my thoughts:
> > >>>>>
> > >>>>> 1. I think it is a good idea to split our CI builds into sub-modules
> > >>>>> (connectors being the first candidate) that would trigger on a changed
> > >>>>> path basis, but without splitting the repo.
> > >>>>> 2. My feeling is that the real question is whether we want to change
> > >>>>> our stability guarantees of certain modules to be "just best effort".
> > >>>>> 3. If we were to vote on this proposal I would vote -0. I am slightly
> > >>>>> against this change, but wouldn't oppose.
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Dawid
> > >>>>> On 08/08/2019 13:23, Chesnay Schepler wrote:
> > >>>>>
> > >>>>>> I would like to also raise an additional issue: currently quite some
> > >>>>>> bugs (like release blockers [1]) are being discovered by ITCases of
> > >>>>>> the connectors. It means that at least initially, the main repository
> > >>>>>> will lose some test coverage.
> > >>>>>
> > >>>>> True, but I think this is more a symptom of us not properly testing the
> > >>>>> contracts that are exposed to connectors.
> > >>>>> That we lose test coverage is already a big red flag as it implies
> > >>>>> that issues were fixed and are now verified by a connector test, and
> > >>>>> not by a test in the Flink core.
> > >>>>> We could also look into tooling surrounding the CI bot for running the
> > >>>>> connectors tests on-demand, although this is very much long-term.
> > >>>>>
> > >>>>> On 08/08/2019 13:14, Piotr Nowojski wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> Thanks for proposing and writing this down, Chesnay.
> > >>>>>
> > >>>>> Generally speaking +1 from my side for the idea. It will create
> > >>>>> additional pain for cross repository development, like some new feature
> > >>>>> in connectors that needs some change in the main repository. I’ve
> > >>>>> worked in such a setup before and the teams then regretted having such
> > >>>>> a split. But I agree that we should try this to try to solve the
> > >>>>> stability/build time issues.
> > >>>>>
> > >>>>> I have no experience in making such kinds of splits so I cannot help
> > >>>>> here.
> > >>>>>
> > >>>>> I would like to also raise an additional issue: currently quite some
> > >>>>> bugs (like release blockers [1]) are being discovered by ITCases of the
> > >>>>> connectors. It means that at least initially, the main repository will
> > >>>>> lose some test coverage.
> > >>>>>
> > >>>>> Piotrek
> > >>>>>
> > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-13593
> > >>>>>
> > >>>>> On 7 Aug 2019, at 13:14, Chesnay Schepler <[hidden email]> wrote:
> > >>>>>
> > >>>>> Hello everyone,
> > >>>>>
> > >>>>> The Flink project sees an ever-increasing amount of dev activity, both
> > >>>>> in terms of reworked and new features.
> > >>>>>
> > >>>>> This is of course an excellent situation to be in, but we are getting
> > >>>>> to a point where the associated downsides are becoming increasingly
> > >>>>> troublesome.
> > >>>>>
> > >>>>> The ever increasing build times, in addition to unstable tests,
> > >>>>> significantly slow down the development process.
> > >>>>> Additionally, pull requests for smaller features frequently slip
> > >>>>> through the cracks as they are being buried under a mountain of other
> > >>>>> pull requests.
> > >>>>>
> > >>>>> As a result I'd like to start a discussion on splitting the Flink
> > >>>>> repository.
> > >>>>>
> > >>>>> In this mail I will outline the core idea, and what problems I
> > >>>>> currently envision.
> > >>>>>
> > >>>>> I'd specifically like to encourage those who were part of similar
> > >>>>> initiatives in other projects to share the experiences and ideas.
> > >>>>>
> > >>>>>
> > >>>>> General Idea
> > >>>>>
> > >>>>> For starters, the idea is to create a new repository for
> > >>>>> "flink-connectors".
> > >>>>> For the remainder of this mail, the current Flink repository is
> > >>>>> referred to as "flink-main".
> > >>>>>
> > >>>>> There are also other candidates that we could discuss in the future,
> > >>>>> like flink-libraries (the next top-priority repo to ease flink-ml
> > >>>>> development), metric reporters, filesystems and flink-formats.
> > >>>>>
> > >>>>> Moving out flink-connectors provides the most benefits, as we straight
> > >>>>> away save at least an hour of testing time, and not being included in
> > >>>>> the binary distribution simplifies a few things.
> > >>>>>
> > >>>>>
> > >>>>> Problems to solve
> > >>>>>
> > >>>>> To make this a reality there's a number of questions we have to
> > >>>>> discuss; some in the short-term, others in the long-term.
> > >>>>>
> > >>>>> 1) Git history
> > >>>>>
> > >>>>> We have to decide whether we want to rewrite the history of sub
> > >>>>> repositories to only contain diffs/commits related to this part of
> > >>>>> Flink, or whether we just fork from some commit in flink-main and
> > >>>>> add a commit to the connector repo that "transforms" it from
> > >>>>> flink-main to flink-connectors (i.e., remove everything unrelated to
> > >>>>> connectors + update module structure etc.).
> > >>>>>
> > >>>>> The latter option would have the advantage that our commit book
> > >>>>> keeping in JIRA would still be correct, but it would create a
> > >>>>> significant divide between the current and past state of the
> > >>>>> repository.
> > >>>>>
> > >>>>> 2) Maven
> > >>>>>
> > >>>>> We should look into whether there's a way to share dependency/plugin
> > >>>>> configurations and similar, so we don't have to keep them in sync
> > >>>>> manually across multiple repositories.
> > >>>>>
> > >>>>> A new parent Flink pom that all repositories define as their parent
> > >>>>> could work; this would imply splicing out part of the current root
> > >>>>> pom.xml.
> > >>>>>
> > >>>>> 3) Documentation
> > >>>>>
> > >>>>> Splitting the repository realistically also implies splitting the
> > >>>>> documentation source files (At the beginning we can get by with
> > >>>>> having it still in flink-main).
> > >>>>> We could just move the relevant files to the respective repository
> > >>>>> (while maintaining the directory structure), and merge them when
> > >>>>> building the docs.
> > >>>>>
> > >>>>> We also have to look at how we can handle java-/scaladocs; e.g.
> > >>>>> whether it is possible to aggregate them across projects.
> > >>>>>
> > >>>>> 4) CI (end-to-end tests)
> > >>>>>
> > >>>>> The very basic question we have to answer is whether we want E2E
> > >>>>> tests in the sub repositories. If so, we need to find a way to share
> > >>>>> e2e-tooling.
> > >>>>>
> > >>>>> 5) Releases
> > >>>>>
> > >>>>> We have to discuss what our release process will look like. This may
> > >>>>> also have repercussions on how repositories may depend on each other
> > >>>>> (SNAPSHOT vs LATEST). Note that this should be discussed for each
> > >>>>> repo separately.
> > >>>>>
> > >>>>> The current options I see are the following:
> > >>>>>
> > >>>>> a) Single release
> > >>>>>
> > >>>>> Release all repositories at once as a single product.
> > >>>>>
> > >>>>> The source release would be a collection of repositories, like
> > >>>>> flink/
> > >>>>> |--flink-main/
> > >>>>>     |--flink-core/
> > >>>>>     |--flink-runtime/
> > >>>>>     ...
> > >>>>> |--flink-connectors/
> > >>>>>     ...
> > >>>>> |--flink-.../
> > >>>>> ...
> > >>>>>
> > >>>>> This option requires a SNAPSHOT dependency between Flink
> > >>>>> repositories, but it is pretty much how things work at the moment.
> > >>>>>
> > >>>>> b) Synced releases
> > >>>>>
> > >>>>> Similar to a), except that each repository gets its own source
> > >>>>> release that it may release independently of other repositories.
> > >>>>> For a given release cycle each repo would produce exactly one
> > >>>>> release.
> > >>>>>
> > >>>>> This option requires a SNAPSHOT dependency between Flink
> > >>>>> repositories. Once any repository has created an RC or
> > >>>>> finished its release, release-branches in other repos can
> > >>>>> switch to that version.
> > >>>>>
> > >>>>> This approach is a tad more flexible than a), but requires more
> > >>>>> coordination between the repos.
> > >>>>>
> > >>>>> c) Separate releases
> > >>>>>
> > >>>>> Just like we handle flink-shaded; entirely separate release
> > >>>>> cycles; some repositories may have more releases in a given time
> > >>>>> period than others.
> > >>>>>
> > >>>>> This option implies a LATEST dependency between Flink
> > >>>>> repositories.
> > >>>>>
> > >>>>> Note that hybrid approaches would also make sense, like doing b) for
> > >>>>> major versions and c) for bugfix releases.
> > >>>>>
> > >>>>> For something like flink-libraries this question may also have
> > >>>>> repercussions on how/whether they are bundled in the distribution;
> > >>>>> options a)/b) would maintain the status-quo, c) and hybrid
> > >>>>> approaches will likely necessitate the exclusion from the
> > >>>>> distribution.

Stephan Ewen

Re: [DISCUSS] Repository split

In reply to this post by Chesnay Schepler-3

Just in case we decide to pursue the repo split in the end, some thoughts
on Chesnay's questions:

(1) Git History

We can also use "git filter-branch" to rewrite the history to only contain
the connectors.
It changes commit hashes, but I am not sure that this is a problem. The commit
hashes are still valid in the main repo, so one can look up the commits
that fixed an earlier issue.
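
As a small illustration of why a history rewrite changes every commit hash,
here is a Python sketch of how Git derives object ids, assuming the default
SHA-1 object format. A commit id covers the tree id and the parent id, so
once the tree of the first commit shrinks to the connector files, that
commit and all of its descendants get new ids. The tree ids, author line and
messages below are made-up placeholders, not real Flink data.

```python
import hashlib


def git_object_id(obj_type, body):
    """Git names an object by SHA-1 over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parent, message):
    """Build a minimal commit object and return its id."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    # Placeholder identity and timestamp, kept identical across both histories.
    lines.append("author A U Thor <author@example.org> 0 +0000")
    lines.append("committer A U Thor <author@example.org> 0 +0000")
    lines += ["", message, ""]
    return git_object_id("commit", "\n".join(lines).encode())


FULL_TREE = "a" * 40        # placeholder id for the full repository tree
CONNECTORS_TREE = "b" * 40  # placeholder id for the connectors-only tree

old_first = commit_id(FULL_TREE, None, "first commit")
new_first = commit_id(CONNECTORS_TREE, None, "first commit")

# The second commit is identical in both histories except for its parent id.
old_second = commit_id(FULL_TREE, old_first, "second commit")
new_second = commit_id(FULL_TREE, new_first, "second commit")

print(old_first != new_first, old_second != new_second)
```

The old ids remain resolvable in the main repository, which is why existing
references to them in JIRA can still be looked up there.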

(2) Maven

+1 to a shared flink-parent pom.xml file

(3) Docs

One option would be to not integrate the docs.
That would mean a top level navigation between Flink, Connectors, Libraries
(for example as a horizontal bar at the top) and then per repository
navigation as we currently have it.
Of course, sharing docs build setup would be desirable.

(4) End-2-End tests

I think we absolutely need those on the other repos.
As Piotr pointed out, some of the end to end test coverage depends on
connectors and libraries.

While ideally that would not be necessary, I believe that realistically,
targeted test coverage in the core will never be absolutely perfect. So a
certain number of additional bugs (especially those due to
distributed race conditions) will be caught by the extended test coverage
we get from connector and library end-to-end tests.

Let's find a way to keep that, maybe not as per-commit tests, but as
nightly ones.

On Wed, Aug 7, 2019 at 1:14 PM Chesnay Schepler <[hidden email]> wrote:

> Hello everyone,
>
> The Flink project sees an ever-increasing amount of dev activity, both
> in terms of reworked and new features.
>
> This is of course an excellent situation to be in, but we are getting to
> a point where the associated downsides are becoming increasingly
> troublesome.
>
> The ever increasing build times, in addition to unstable tests,
> significantly slow down the development process.
> Additionally, pull requests for smaller features frequently slip through
> the cracks as they are being buried under a mountain of other pull
> requests.
>
> As a result I'd like to start a discussion on splitting the Flink
> repository.
>
> In this mail I will outline the core idea, and what problems I currently
> envision.
>
> I'd specifically like to encourage those who were part of similar
> initiatives in other projects to share the experiences and ideas.
>
>
> General Idea
>
> For starters, the idea is to create a new repository for
> "flink-connectors".
> For the remainder of this mail, the current Flink repository is referred
> to as "flink-main".
>
> There are also other candidates that we could discuss in the future,
> like flink-libraries (the next top-priority repo to ease flink-ml
> development), metric reporters, filesystems and flink-formats.
>
> Moving out flink-connectors provides the most benefits, as we straight
> away save at-least an hour of testing time, and not being included in
> the binary distribution simplifies a few things.
>
>
> Problems to solve
>
> To make this a reality there's a number of questions we have to discuss;
> some in the short-term, others in the long-term.
>
> 1) Git history
>
> We have to decide whether we want to rewrite the history of sub
> repositories to only contain diffs/commits related to this part of
> Flink, or whether we just fork from some commit in flink-main and
> add a commit to the connector repo that "transforms" it from
> flink-main to flink-connectors (i.e., remove everything unrelated to
> connectors + update module structure etc.).
>
> The latter option would have the advantage that our commit book
> keeping in JIRA would still be correct, but it would create a
> significant divide between the current and past state of the
> repository.
>
> 2) Maven
>
> We should look into whether there's a way to share dependency/plugin
> configurations and similar, so we don't have to keep them in sync
> manually across multiple repositories.
>
> A new parent Flink pom that all repositories define as their parent
> could work; this would imply splicing out part of the current room
> pom.xml.
>
> 3) Documentation
>
> Splitting the repository realistically also implies splitting the
> documentation source files (At the beginning we can get by with
> having it still in flink-main).
> We could just move the relevant files to the respective repository
> (while maintaining the directory structure), and merge them when
> building the docs.
>
> We also have to look at how we can handle java-/scaladocs; e.g.
> whether it is possible to aggregate them across projects.
>
> 4) CI (end-to-end tests)
>
> The very basic question we have to answer is whether we want E2E
> tests in the sub repositories. If so, we need to find a way to share
> e2e-tooling.
>
> 5) Releases
>
> We have to discuss how our release process will look like. This may
> also have repercussions on how repositories may depend on each other
> (SNAPSHOT vs LATEST). Note that this should be discussed for each
> repo separately.
>
> The current options I see are the following:
>
> a) Single release
>
> Release all repositories at once as a single product.
>
> The source release would be a collection of repositories, like
> flink/
> |--flink-main/
> |--flink-core/
> |--flink-runtime/
> ...
> |--flink-connectors/
> ...
> |--flink-.../
> ...
>
> This option requires a SNAPSHOT dependency between Flink
> repositories, but it is pretty much how things work at the moment.
>
> b) Synced releases
>
> Similar to a), except that each repository gets their own source
> release that they may released independent of other repositories.
> For a given release cycle each repo would produce exactly one
> release.
>
> This option requires a SNAPSHOT dependency between Flink
> repositories. Once any repositories has created an RC or
> finished it's release, release-branches in other repos can
> switch to that version.
>
> This approach is a tad more flexible than a), but requires more
> coordination between the repos.
>
> c) Separate releases
>
> Just like we handle flink-shaded; entirely separate release
> cycles; some repositories may have more releases in a given time
> period than others.
>
> This option implies a LATEST dependency between Flink repositories.
>
> Note that hybrid approaches would also make sense, like doing b) for
> major versions and c) for bugfix releases.
>
> For something like flink-libraries this question may also have
> repercussions on how/whether they are bundled in the distribution;
> options a)/b) would maintain the status-quo, c) and hybrid
> approaches will likely necessitate the exclusion from the distribution.
>
>

Robert Metzger

Re: [DISCUSS] Repository split

Thanks a lot for starting the discussion Chesnay!

I would like to throw another aspect into the discussion: What if we
consider this repo split as a first step towards making connectors, machine
learning, gelly, table/SQL? independent projects within the ASF, with their
own mailing lists, committers and JIRA?

Of course, we would not establish the new repos as new projects
immediately, but after we have found good boundaries between the projects
(interfaces, tests, documentation, communities) (6-24 months)

Each project (or repo initially) would create separate releases, and depend
on stable versions.

This allows each project to come up with their own release cadence.

Also, the projects could establish their own processes. A connectors
project would probably have more turnover in terms of new connector
contributions, so something like a “connector incubator” would make sense?
A “young” machine learning project might benefit from a monthly release
model initially.

I see this as a way of establishing different standards based on the
requirements of each project (the concern of double standards has been
voiced)

With a clearer “separation of concerns”, the connector project would report
bugs to upstream Flink, which would fix and test them. In the current setup,
the bug might just be validated through the connector test. A split would
force upstream Flink to have a proper test in place.

To some extent, Flink is already a project that contains different
sub-communities, working on the core, table api or machine learning.

Maybe Flink’s growth (from a development perspective) is limited by the
noise and complexity of having multiple sub-communities within one
community?

Throughout this discussion so far, various issues have been mentioned that
would resolve themselves naturally if we adopted that mindset:

a) Depending on SNAPSHOT versions / releases:

The new repos would depend on stable Flink releases. Interface changes and
bug fixes would have to wait for the next upstream Flink release.

PRs would be reproducible. Local setups would be easy, as downstream
projects depend on a stable upstream Flink release.

b) Number of pull requests:

The concern is that the number of open pull requests would not decrease
with a repo split.

If we consider the split repositories independent projects, they can attract
their own contributors / committers. In particular for machine learning and
SQL, I can actually see a lot of potential for attracting new PR reviewers.

I'm putting this thought out just to see what you are thinking about this
in general. This is not a final proposal for solving all issues mentioned
here :) But if we do a split now, let's do something future-proof, even if
it is painful in the short run.

Best,
Robert

On Mon, Aug 12, 2019 at 10:09 AM Stephan Ewen <[hidden email]> wrote:


Arvid Heise

Re: [DISCUSS] Repository split

I have split small and medium-sized repositories in several projects for
various reasons. In general, the more mature a project, the less pain after
the split. If interfaces are somewhat stable, it's naturally easier to work
in a distributed manner.

However, projects should be split for the right reasons. Robert pointed out
the most important one: growth of somewhat individual communities. Another
reason would be that we actually want to force better coverage inside the
modules (for example, adding tests to the core modules when e2e tests fail).
Another reason is to actually slow down development: make sure that a new
API endpoint is well-crafted before adding the implementation in some
module. API changes will occur less often when devs have to adopt them
across several modules and feel the pain of users. Sometimes API changes
will actually become more visible through separate projects.
One issue I currently have that would be addressed is the complexity of
onboarding.

In contrast, other issues can be solved without splitting the repository
and sacrificing development speed: build times can be lowered with
company-wide build caches (https://gradle.com/ , also for Maven, although I
know only the Gradle version).

I don't think I have enough experience with the project yet to cast a
vote. I have had good experiences in the past with splitting (although it
takes time to pay off), but I see many valid points raised.

I do have a strong opinion on reducing build times though and would be
available to explore that, but that sounds like a separate discussion to me.

Best,

Arvid

On Mon, Aug 12, 2019 at 4:26 PM Robert Metzger <[hidden email]> wrote:


--

Arvid Heise | Senior Software Engineer


bowen.li

Re: [DISCUSS] Repository split

-1 for rushing to the conclusion that we need to split the repo before
exhausting our efforts to improve the current build/CI mechanism. Besides all
the build system issues mentioned above (no incremental builds, no
flexibility to build only docs or subsets of components), it's hard to keep
configurations (like code style, permissions, etc) consistent between repos.

IMHO, one area where we can further improve build performance is the CI bot.
From my experience, a few simple but effective changes we can make are
1) cancel the previous build when a new commit is submitted (this seems to
have been fixed 10 days ago [1]), 2) cancel the previous build when the PR is
closed, either merged or abandoned. And there are more to come.
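
As a rough sketch of those two rules (this is not the actual ci-bot code; the
Build record, the event names and the builds_to_cancel helper are all
hypothetical and only model the decision logic in Python):

```python
from dataclasses import dataclass


@dataclass
class Build:
    pr: int            # pull request number the build belongs to
    commit: str        # commit the build was started for
    running: bool = True


def builds_to_cancel(builds, event, pr, new_commit=None):
    """Return the running builds that an event on `pr` makes obsolete.

    event is "push" (a new commit was added to the PR) or "closed"
    (the PR was merged or abandoned). Any other event cancels nothing.
    """
    if event not in ("push", "closed"):
        return []
    # On "closed" there is no new commit, so every running build of the PR matches.
    return [
        b for b in builds
        if b.running and b.pr == pr and b.commit != new_commit
    ]


builds = [
    Build(pr=1, commit="aaa"),
    Build(pr=1, commit="bbb"),
    Build(pr=2, commit="ccc"),
    Build(pr=1, commit="old", running=False),
]
after_push = builds_to_cancel(builds, "push", pr=1, new_commit="bbb")
after_close = builds_to_cancel(builds, "closed", pr=1)
```

Here a push of commit "bbb" to PR 1 only makes the build for "aaa" obsolete,
while closing PR 1 makes both of its running builds obsolete; builds of other
PRs and already finished builds are never touched.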

Though I like the soft split approach Stephan raised slightly better than
the hard split, I hope that's not the ultimate approach either, **unless
really no better way presents itself**, because it still seems to me that
we are trying to identify dependency graphs **manually** just to make up
for the limitations of the build tool. Gradle is surely capable of doing
that, as people mentioned, and I have used that capability before. I
researched Maven previously but didn't get far due to the lack of good
documentation, and thus I'm not sure if Maven is "modern" enough for that
task. Hopefully we won't need to reinvent the wheel the hard way just for
the sake of complementing Maven.

[1]
https://github.com/flink-ci/ci-bot/commit/82bb83fd997fac97405fd956d758af100b0f289c

On Mon, Aug 12, 2019 at 7:44 AM Arvid Heise <[hidden email]> wrote:

> I split small and medium-sized repositories in several projects for various
> reasons. In general, the more mature a project, the fewer pain after the
> split. If interfaces are somewhat stable, it's naturally easier to work in
> a distributed manner.
>
> However, projects should be split for the right reasons. Robert pointed the
> most important out: growth of somewhat individual communities. Another
> reason would be that we actually want to force better coverage inside the
> modules (for example, adding tests to the core modules when e2e fail).
> Another reason is to actually slow down development: Make sure that a new
> API endpoint is well-crafted before adding the implementation in some
> module. API changes will occur less, when devs have to adopt it throughout
> several modules and feel the pain of users. Sometimes API changes will
> actually become more visible through separate projects.
> One issue that would be addressed that I currently have is reduced
> complexity while onboarding.
>
> In contrast, other issues can be solved without splitting the repository
> and sacrificing development speed: build times can be lowered with
> company-wide build caches (https://gradle.com/ , also for maven, although
> I
> know only the gradle version).
>
> I think that I have not enough experience with the project yet to cast a
> vote. I made good experiences in the past with splitting (although it takes
> time to pay off), but I see many valid points raised.
>
> I do have a strong opinion on reducing build times though and would be
> avail to explore that, but that sounds like a separate discussion to me.
>
> Best,
>
> Arvid
>
> On Mon, Aug 12, 2019 at 4:26 PM Robert Metzger <[hidden email]>
> wrote:
>
> > Thanks a lot for starting the discussion Chesnay!
> >
> >
> > I would like to throw in another aspect into the discussion: What if we
> > consider this repo split as a first step towards making connectors,
> machine
> > learning, gelly, table/SQL? independent projects within the ASF, with
> their
> > own mailing lists, committers and JIRA?
> >
> >
> > Of course, we would not establish the new repos as new projects
> > immediately, but after we have found good boundaries between the projects
> > (interfaces, tests, documentation, communities) (6-24 months)
> >
> >
> > Each project (or repo initially) would create separate releases, and
> depend
> > on stable versions.
> >
> > This allows each project to come up with their own release cadence.
> >
> >
> > Also, the projects could establish their own processes. A connectors
> > project would probably have more turnover in terms of new connector
> > contributions, so something like a “connector incubator” would make
> sense?
> > A “young” machine learning project might benefit from a monthly release
> > model initially.
> >
> > I see this as a way of establishing different standards based on the
> > requirements of each project (the concern of double standards has been
> > voiced)
> >
> >
> > With a clearer “separation of concerns”, the connector project would
> report
> > bugs to upstream Flink, they would fix & test it. In the current setup,
> the
> > bug might just be validated through the connector test. A split would
> force
> > upstream Flink to have a proper test in place.
> >
> >
> > To some extend, Flink is already a project that contains different
> > sub-communities, working on the core, table api or machine learning.
> >
> > Maybe Flink’s growth (from a development perspective) is limited by the
> > noise and complexity of having multiple sub-communities within one
> > community?
> >
> >
> >
> > Throughout this discussion so far, various issues have been mentioned,
> that
> > would solve naturally if we have that mindset:
> >
> >
> > a) Depending on SNAPSHOT versions / releases:
> >
> > The new repos would depend on stable flink releases. Interface changes,
> bug
> > fixes would have to wait for the next upstream flink release.
> >
> > PRs would be reproducible. Local setups would be easy, as downstream
> > projects depend on a stable upstream Flink release.
> >
> >
> > b) Number of pull requests:
> >
> > The concern is that the number of open pull requests would not decrease
> > with a repo split.
> >
> > If we consider the split repositories independent projects, they can
> > attract their own contributors / committers. In particular for machine
> > learning and SQL, I can actually see a lot of potential for attracting
> > new PR reviewers.
> >
> > I'm putting this thought out just to see what you are thinking about this
> > in general. This is not a final proposal for solving all issues mentioned
> > here :) But if we do a split now, let's do something future-proof, even
> > if it is painful in the short run.
> >
> > Best,
> > Robert
> >
> >
> > On Mon, Aug 12, 2019 at 10:09 AM Stephan Ewen <[hidden email]> wrote:
> >
> > > Just in case we decide to pursue the repo split in the end, some
> > > thoughts on Chesnay's questions:
> > >
> > > (1) Git History
> > >
> > > We can also use "git filter-branch" to rewrite the history to only
> > > contain the connectors.
> > > It changes commit hashes, but I am not sure that this is a problem.
> > > The commit hashes are still valid in the main repo, so one can look up
> > > the commits that fixed an earlier issue.
> > >
> > > (2) Maven
> > >
> > > +1 to a shared flink-parent pom.xml file
> > >
> > > (3) Docs
> > >
> > > One option would be to not integrate the docs.
> > > That would mean a top level navigation between Flink, Connectors,
> > > Libraries (for example as a horizontal bar at the top) and then per
> > > repository navigation as we currently have it.
> > > Of course, sharing docs build setup would be desirable.
> > >
> > > (4) End-2-End tests
> > >
> > > I think we absolutely need those on the other repos.
> > > As Piotr pointed out, some of the end to end test coverage depends on
> > > connectors and libraries.
> > >
> > > While ideally that would not be necessary, I believe that
> > > realistically, targeted test coverage in the core will never be
> > > absolutely perfect. So a certain number of additional bugs (especially
> > > those due to distributed race conditions) will be caught by the
> > > extended test coverage we get from connector and library end-to-end
> > > tests.
> > >
> > > Let's find a way to keep that, maybe not as per-commit tests, but as
> > > nightly ones.
> > >

Chesnay Schepler-3

Re: [DISCUSS] Repository split

In reply to this post by Chesnay Schepler-3

Let's recap a bit:

Several people have raised the argument that build times can be kept in
check via other means (mostly differential builds, be it via custom
scripts or by switching to Gradle).
I agree with this, and believe it is feasible to update the CI process
to behave as if the repository were split. I will start a separate
discussion thread on this topic, since it is a useful discussion in any
case.

The suggestion of a "project split" within a single repository was
brought up.
This approach is a mixed bag; it avoids the downsides to the development
process that multiple repositories would incur, but it also has only a
few upsides. It seems primarily relevant for local development, where
one might want to skip certain modules when running tests.

There's no benefit from the CI side: since we're still limited to 1
.travis.yml, whatever rules we want to set up (e.g., "do not test core
if only connectors are modified") have to be handled by the CI scripts
regardless of whether the project is split or not.
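
As a rough illustration of such a rule, the sketch below maps the file
paths touched by a change to the test groups that would need to run. It
is only a sketch under assumptions: the group names, the function
`groups_to_test` and the rule that a core change retests everything are
hypothetical and not taken from Flink's actual CI scripts; only the
directory names flink-connectors and flink-libraries come from this
thread.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a top-level directory to a CI test group.
# The directory names are the modules mentioned in this thread; the
# group names are made up for illustration.
GROUPS = {
    "flink-connectors": "connectors",
    "flink-libraries": "libraries",
}

# Anything outside the directories above is treated as "core".
DEFAULT_GROUP = "core"
ALL_GROUPS = {DEFAULT_GROUP, *GROUPS.values()}


def groups_to_test(changed_files):
    """Return the set of test groups required by a list of changed paths."""
    touched = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        top_level = parts[0] if parts else ""
        touched.add(GROUPS.get(top_level, DEFAULT_GROUP))
    # Assumption: connectors and libraries build on core, so any core
    # change triggers every group.
    if DEFAULT_GROUP in touched:
        return set(ALL_GROUPS)
    return touched
```

With this rule, a change that only touches files below flink-connectors
selects just the "connectors" group, while any change outside the mapped
directories falls back to testing everything.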

Overall, I'd like to put this item on ice for the time being; the
subsequent item is related, vastly more impactful and may also render
this item obsolete.

A major topic of discussion is that of the development process. It was
pointed out that having a split repository makes the dev process more
complicated, since certain changes turn into a two-step process (merge
to core, then merge to connectors). Others have pointed out that this
may actually be an advantage, as it (to some extent) enforces that
changes to core are also tested in core.

I find myself more in the latter camp; it is all too easy for people to
make a change to the core while making whatever adjustments to
connectors are needed to make things fit. A recent change to the
ClosureCleaner in 1.8.0
<https://issues.apache.org/jira/browse/FLINK-13586> comes to mind,
which, with a split repo, may have resulted in build failures in the
connectors project (provided that the time-frame between the 2 merges
is sufficiently large). As Arvid pointed out, having to feel the pain
that users have to go through may not be such a bad thing.

This is a fundamental discussion as to whether we want to continue with
a centralized development of all components.

Robert also pointed out that such a split could result in us
establishing entirely separate projects. We've had times in the past
(like the first flink-ml library) where such a setup may have simplified
things (back then we had lots of contributors but no committer to
shepherd the effort; a separate project could be more lenient when it
comes to appointing new committers).

@Robert We should have a SNAPSHOT dependency /somewhere/ in the
connector repo, to detect issues (like the ClosureCleaner one) in a
timely manner and to prepare for new features so that we can have a
timely release after core, but not necessarily on the master branch.

@Bowen I have implemented and deployed your suggestion to cancel Travis
builds if the associated PR has been closed.

Robert Metzger

Re: [DISCUSS] Repository split

Thanks a lot for your summary, Chesnay.
I agree with you that we have no consensus in the community for splitting
up the repository immediately, and I agree with you that we should have a
separate discussion about reducing the build time (which is already making
good progress).

Also, I will keep the thoughts about decentralising the Flink development
in the back of my head and bring them up again whenever I feel it's the
right time.
