[DISCUSS] Supporting multiple Flink versions vs. tech debt

David Morávek
Hello,

we currently have an open PR for Flink 1.9
<https://github.com/apache/beam/pull/9296>, which greatly improves the
runner for the batch use case. If the PR gets merged, we will be
supporting the 5 latest major versions of Flink, which comes with a high
maintenance cost and makes future development harder (there are already
some sub-optimal parts due to compatibility with previous versions). Thomas
and Max have expressed the need to address this issue in the current release.

Let's break down the possible solutions to the problem.

*1) Current solution*

Currently we maintain a separate build for each version. The project
structure looks as follows:

flink/
+ 1.5/
   + src/          # implementation of classes that differ between versions
   - build.gradle
+ 1.6/
   - build.gradle  # the version is backward compatible, so it can reuse "overrides" from 1.5
+ 1.7/
   - build.gradle  # the version is backward compatible, so it can reuse "overrides" from 1.5
+ 1.8/
   + src/          # implementation of classes that differ between versions
   - build.gradle
+ 1.9/
   + src/          # implementation of classes that differ between versions
   - build.gradle
+ src/             # common source, shared among runner versions
- flink_runner.gradle  # included by each <version>/build.gradle

The problem with this structure is that we always need to copy all of the
version-specific classes between backward-incompatible versions, which
results in *duplicate files* (we cannot simply override a single file,
because it wouldn't compile due to duplicate classes).

*2) Symlink duplicates*

Maybe we can simply symlink duplicates between versions and only override
the files that need to be changed?

*3) Adjusting the gradle build*

Currently a version build looks something like this (this one is for the
1.7.x version):

project.ext {
  // Set the version of all Flink-related dependencies here.
  flink_version = '1.7.2'
  // Main source directory and Flink version specific code.
  main_source_dirs = ["$basePath/src/main/java", "../1.5/src/main/java"]
  test_source_dirs = ["$basePath/src/test/java", "../1.5/src/test/java"]
  main_resources_dirs = ["$basePath/src/main/resources"]
  test_resources_dirs = ["$basePath/src/test/resources"]
  archives_base_name = 'beam-runners-flink-1.7'
}

// Load the main build script which contains all build logic.
apply from: "$basePath/flink_runner.gradle"

It basically says: take the common source and append the version-specific
implementations from the 1.5 version. Let's say we want to override a single
file for 1.8. We need to copy everything from 1.5/src, and the build file
would look as follows:

/* All properties required for loading the Flink build script */
project.ext {
  // Set the version of all Flink-related dependencies here.
  flink_version = '1.8.0'
  // Main source directory and Flink version specific code.
  main_source_dirs = ["$basePath/src/main/java", "./src/main/java"]
  test_source_dirs = ["$basePath/src/test/java", "./src/test/java"]
  main_resources_dirs = ["$basePath/src/main/resources"]
  test_resources_dirs = ["$basePath/src/test/resources"]
  archives_base_name = 'beam-runners-flink-1.8'
}

// Load the main build script which contains all build logic.
apply from: "$basePath/flink_runner.gradle"

For simplicity, let's focus only on *main_source_dirs*. What we really want
to do is tell the build to use everything from 1.5 and override a
single class (e.g. CoderTypeSerializer).

// Overlay the 1.5 sources with the ones from this version.
def copyOverrides = tasks.register('copyOverrides', Copy) {
  it.from '../1.5/src/', './src'
  it.into "${project.buildDir}/flink-overrides/src"
  // The last duplicate file 'wins'.
  it.duplicatesStrategy = DuplicatesStrategy.INCLUDE
}

// Make sure the overlay exists before the sources are compiled.
compileJava.dependsOn copyOverrides

project.ext {
  main_source_dirs = [
    "$basePath/src/main/java",
    "${project.buildDir}/flink-overrides/src/main/java"
  ]
}

This would copy all overrides into the build directory, and in case of a
duplicate it picks the latest one. Then the build would simply compile
classes from the newly created Java files in the build directory.
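
Putting the pieces together, a complete 1.8/build.gradle might look roughly
like the sketch below. It is untested against the real flink_runner.gradle
and rests on a few assumptions: that basePath is defined as '..' (the
snippets above use it without showing its definition), that the Copy task
also brings over src/test/java because it copies the whole src/ tree, and
that the compileJava / compileTestJava tasks exist only once
flink_runner.gradle has been applied. The overridesDir variable is a
hypothetical name introduced here for readability.

```groovy
// Hypothetical 1.8/build.gradle (sketch only).

// Assumption: the shared runner sources live one directory up.
def basePath = '..'
// Hypothetical helper variable: where the merged sources are written.
def overridesDir = "${project.buildDir}/flink-overrides/src"

// Overlay the 1.5 sources with the 1.8 ones. Both trees are copied into the
// same destination, so a file under ./src is expected to replace the file
// with the same relative path coming from ../1.5/src.
def copyOverrides = tasks.register('copyOverrides', Copy) {
  it.from '../1.5/src/', './src'
  it.into overridesDir
  it.duplicatesStrategy = DuplicatesStrategy.INCLUDE
}

/* All properties required for loading the Flink build script */
project.ext {
  flink_version = '1.8.0'
  // Common sources plus the merged, version-specific overlay.
  main_source_dirs = ["$basePath/src/main/java", "$overridesDir/main/java"]
  test_source_dirs = ["$basePath/src/test/java", "$overridesDir/test/java"]
  main_resources_dirs = ["$basePath/src/main/resources"]
  test_resources_dirs = ["$basePath/src/test/resources"]
  archives_base_name = 'beam-runners-flink-1.8'
}

// Load the main build script which contains all build logic.
apply from: "$basePath/flink_runner.gradle"

// Assumption: the Java compile tasks are available after the line above.
compileJava.dependsOn copyOverrides
compileTestJava.dependsOn copyOverrides
```

With something like this, 1.8/src would only need to contain the files that
actually differ from 1.5 (e.g. CoderTypeSerializer).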

*4) Maintaining last 3 major versions only*

I recall that the Flink community only supports the 3 latest major versions
<https://flink.apache.org/downloads.html> (please correct me if I'm
mistaken). I suggest that *Beam do the same*. There is already an
open BEAM-7962 <https://jira.apache.org/jira/browse/BEAM-7962> that
suggests dropping the 1.5 & 1.6 versions. Maybe this would allow us to keep
the current structure with a bearable amount of technical debt?
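
One consequence worth spelling out: 1.6 and 1.7 currently reuse the
overrides stored under 1.5/, so dropping 1.5 means those files need a new
home. A rough sketch of what 1.7/build.gradle could then look like, assuming
the version-specific sources are simply moved from 1.5/src to 1.7/src and
that basePath is defined as '..' (both are assumptions on my side, not
something BEAM-7962 prescribes):

```groovy
// Hypothetical 1.7/build.gradle after 1.5 and 1.6 are removed (sketch only).

// Assumption: the shared runner sources live one directory up.
def basePath = '..'

/* All properties required for loading the Flink build script */
project.ext {
  flink_version = '1.7.2'
  // The overrides now live in this module instead of ../1.5.
  main_source_dirs = ["$basePath/src/main/java", "./src/main/java"]
  test_source_dirs = ["$basePath/src/test/java", "./src/test/java"]
  main_resources_dirs = ["$basePath/src/main/resources"]
  test_resources_dirs = ["$basePath/src/test/resources"]
  archives_base_name = 'beam-runners-flink-1.7'
}

// Load the main build script which contains all build logic.
apply from: "$basePath/flink_runner.gradle"
```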

Personally I'm in favor of *4)* combined with *3)*.

What do you think? Do you have any other suggestions for how to solve this?

Thanks,
D.

Re: [DISCUSS] Supporting multiple Flink versions vs. tech debt

David Morávek
Sorry, wrong mailing list; I wanted to send this to the Beam one.

Sorry for the confusion.
D.

On Sat, Sep 7, 2019 at 12:32 PM David Morávek <[hidden email]> wrote:

> [...]