[DISCUSS] A strategy for merging the Blink enhancements


[DISCUSS] A strategy for merging the Blink enhancements

Stephan Ewen
Dear Flink community!

As a follow-up to the thread announcing Alibaba's offer to contribute the
Blink code [1], here are some thoughts on how this contribution could be
merged.

As described in the announcement thread, it is a big contribution, and we
need to plan carefully how to handle it. We would like to bring the
improvements into Flink while making the process as non-disruptive as
possible for the community. I hope that this plan gives the community a
better understanding of what the proposed contribution would mean.

Here is an initial rough proposal, with thoughts from
Timo, Piotr, Dawid, Kurt, Shaoxuan, Jincheng, Jark, Aljoscha, Fabian,
Xiaowei:

  - It is obviously very hard to merge all changes in one quick move, because
    we are talking about multiple 100k lines of code.

  - As much as possible, we want to maintain compatibility with the current
    Table API, so that this becomes a transparent change for most users.

  - The two areas with the most changes that we identified are
     (1) The SQL/Table query processor
     (2) The batch scheduling/failover/shuffle

  - For the query processor part, this is what we found and propose:

    -> The Blink and Flink code have the same semantics (ANSI SQL) except
       for minor aspects (under discussion). Blink also covers more SQL
       operations.

    -> The Blink code is quite different from the current Flink SQL runtime.
       Merging it as a series of changes seems hardly feasible. From the
       current evaluation, the Blink query processor uses the more advanced
       architecture, so it would make sense to converge to that design.

    -> We propose to gradually build up the Blink-based query processor as a
       second query processor under the SQL/Table API. Think of it as two
       different runners for the Table API.
       Once the new query processor is fully merged and stable, we can
       deprecate and eventually remove the existing query processor. That
       should cause the least disruption to Flink users and allow for a
       gradual merge and development.

    -> Some refactoring of the Table API is necessary to support the above
       strategy. Most of the prerequisite refactoring is around splitting
       the project into different modules, following a similar idea as
       FLIP-28 [2].

    -> A more detailed proposal is being worked on.

    -> As with FLIP-28, this approach would probably require suspending
       Table API contributions for a short while. We hope that this can be
       a very short period, so as not to impact the very active development
       on Flink's Table API/SQL too much.

  - For the batch scheduling and failover enhancements, we should be able
    to build on the currently ongoing refactoring of the scheduling logic
    [3]. That should make it easy to plug in a new scheduler and failover
    logic. We can port the Blink enhancements as a new scheduler / failover
    handler, and later make it the default for bounded stream programs once
    the merge is completed and tested.

  - For the catalog and source/sink design and interfaces, we would like to
    continue with the already started design discussion threads. Once these
    have converged, we might use some of the Blink code for the
    implementation, if it is close to the outcome of the design discussions.
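To make the "two runners" idea above a bit more concrete, here is a minimal,
purely hypothetical sketch of two query processors sitting behind one common
interface. None of these names (Planner, LegacyPlanner, BlinkStylePlanner,
PlannerKind) are actual Flink APIs; they only illustrate the selection
mechanism the proposal describes.

```java
// Hypothetical sketch (not actual Flink API): the Table API delegates
// planning to whichever query processor the user selects.
interface Planner {
    // Translate a Table API / SQL query into an execution plan
    // (stringified here for illustration).
    String translate(String query);
}

// Stand-in for the existing Flink SQL runtime.
final class LegacyPlanner implements Planner {
    @Override
    public String translate(String query) {
        return "legacy-plan(" + query + ")";
    }
}

// Stand-in for the merged, Blink-based query processor.
final class BlinkStylePlanner implements Planner {
    @Override
    public String translate(String query) {
        return "blink-plan(" + query + ")";
    }
}

public class PlannerSelection {
    enum PlannerKind { LEGACY, BLINK_STYLE }

    // Both processors sit behind the same interface, so a user program
    // can switch runners without rewriting any Table API code.
    static Planner forKind(PlannerKind kind) {
        return kind == PlannerKind.BLINK_STYLE ? new BlinkStylePlanner()
                                               : new LegacyPlanner();
    }

    public static void main(String[] args) {
        Planner planner = forKind(PlannerKind.BLINK_STYLE);
        System.out.println(planner.translate("SELECT * FROM Orders"));
        // prints: blink-plan(SELECT * FROM Orders)
    }
}
```

User programs keep the same Table API surface and only flip the runner
setting, which is what would make the side-by-side merge non-disruptive and
allow deprecating the old processor later.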
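In the same spirit, the batch scheduling point can be sketched as a small
strategy interface: once scheduling is pluggable, the Blink batch scheduler
can live next to the existing streaming one. Again, all names below
(SchedulingStrategy, EagerStreamingScheduler, StagedBatchScheduler) are
illustrative assumptions, not the actual interfaces of the FLINK-10429
refactoring.

```java
// Hypothetical sketch: scheduling behind a strategy interface, so a new
// batch scheduler / failover handler can be plugged in.
interface SchedulingStrategy {
    // Decide what to deploy next, given how many stages have finished.
    String scheduleNext(int finishedStages);
}

// Streaming jobs deploy the whole pipeline eagerly.
final class EagerStreamingScheduler implements SchedulingStrategy {
    @Override
    public String scheduleNext(int finishedStages) {
        return "deploy-all";
    }
}

// Bounded (batch) jobs can be scheduled stage by stage, which also
// allows finer-grained failover.
final class StagedBatchScheduler implements SchedulingStrategy {
    @Override
    public String scheduleNext(int finishedStages) {
        return "deploy-stage-" + (finishedStages + 1);
    }
}

public class SchedulerPlugging {
    // The runtime picks a strategy per job; making the staged scheduler
    // the default for bounded programs is then a one-line policy change.
    static SchedulingStrategy forJob(boolean isBounded) {
        return isBounded ? new StagedBatchScheduler()
                         : new EagerStreamingScheduler();
    }

    public static void main(String[] args) {
        System.out.println(forJob(true).scheduleNext(0));
        // prints: deploy-stage-1
    }
}
```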

Best,
Stephan

[1]
https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E

[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free

[3] https://issues.apache.org/jira/browse/FLINK-10429

Re: [DISCUSS] A strategy for merging the Blink enhancements

Zhang, Xuefu
Hi Stephan,

Thanks for bringing up the discussion. I'm +1 on the merging plan. One question though: since the merge will not be completed for some time and there might be users trying the blink branch, what's the plan for development in that branch? Personally, I think we should discourage big contributions to the branch, since they would further complicate the merge, though we shouldn't stop critical fixes either.

What's your take on this?

Thanks,
Xuefu



Re: [DISCUSS] A strategy for merging the Blink enhancements

Stephan Ewen
I think that is a reasonable proposal. Bugs that are identified could be
fixed in the blink branch, so that we merge the working code.

New feature contributions to that branch would complicate the merge. I
would rather focus on merging and let new contributions go to the master
branch.


Re: [DISCUSS] A strategy for merging the Blink enhancements

Till Rohrmann
+1 for Stephan's merge proposal. I think it makes sense to pause the
development of the Table API for a short time in order to be able to
quickly converge on a common API.

From my experience with the FLIP-6 refactoring, it can be challenging to
catch up with a branch that is under active development. The biggest dangers
are missing changes that are only ported to one branch and developing
features that are incompatible with the other branch. Limiting changes to
critical fixes and taking care to apply them to both branches should help
with this problem.

Cheers,
Till
