Hello,
We currently have an open PR for Flink 1.9 <https://github.com/apache/beam/pull/9296>, which greatly improves the runner for the batch use case. If the PR gets merged, we would be supporting the 5 latest major versions of Flink. That obviously comes with a high maintenance cost and makes future development harder (there are already some sub-optimal parts due to compatibility with previous versions). Thomas and Max expressed the need to address this issue with the current release.

Let's break down the possible solutions to the problem.

*1) Current solution*

Currently we maintain a separate build for each version. The project structure looks as follows:

    flink/
      + 1.5/
          + src/             # implementation of classes that differ between versions
          - build.gradle
      + 1.6/
          - build.gradle     # the version is backward compatible, so it can reuse "overrides" from 1.5
      + 1.7/
          - build.gradle     # the version is backward compatible, so it can reuse "overrides" from 1.5
      + 1.8/
          + src/             # implementation of classes that differ between versions
          - build.gradle
      + 1.9/
          + src/             # implementation of classes that differ between versions
          - build.gradle
      + src/                 # common source, shared among runner versions
      - flink_runner.gradle  # included by each <version>/build.gradle

The problem with this structure is that we always need to copy all of the version-specific classes between backward-incompatible versions, which results in *duplicate files*. We cannot simply override a single file, because the build wouldn't compile due to duplicate classes.

*2) Symlink duplicates*

Maybe we can simply symlink duplicates between versions and only override the files that need to be changed?

*3) Adjusting the gradle build*

Currently a version build looks something like this (this one is for the 1.7.x version):

    project.ext {
      // Set the version of all Flink-related dependencies here.
      flink_version = '1.7.2'
      // Main source directory and Flink version specific code.
      main_source_dirs = ["$basePath/src/main/java", "../1.5/src/main/java"]
      test_source_dirs = ["$basePath/src/test/java", "../1.5/src/test/java"]
      main_resources_dirs = ["$basePath/src/main/resources"]
      test_resources_dirs = ["$basePath/src/test/resources"]
      archives_base_name = 'beam-runners-flink-1.7'
    }

    // Load the main build script which contains all build logic.
    apply from: "$basePath/flink_runner.gradle"

It basically says: take the common source and append the version-specific implementations from the 1.5 version. Let's say we want to override a single file for 1.8. We need to copy everything from 1.5/src, and the build file would look as follows:

    /* All properties required for loading the Flink build script */
    project.ext {
      // Set the version of all Flink-related dependencies here.
      flink_version = '1.8.0'
      // Main source directory and Flink version specific code.
      main_source_dirs = ["$basePath/src/main/java", "./src/main/java"]
      test_source_dirs = ["$basePath/src/test/java", "./src/test/java"]
      main_resources_dirs = ["$basePath/src/main/resources"]
      test_resources_dirs = ["$basePath/src/test/resources"]
      archives_base_name = 'beam-runners-flink-1.8'
    }

    // Load the main build script which contains all build logic.
    apply from: "$basePath/flink_runner.gradle"

For simplicity, let's only focus on *main_source_dirs*. What we really want is to tell the build to use everything from 1.5 and override a single class (e.g. CoderTypeSerializer):

    def copyOverrides = tasks.register('copyOverrides', Copy) {
      it.from '../1.5/src/', './src'
      it.into "${project.buildDir}/flink-overrides/src"
      it.duplicatesStrategy = DuplicatesStrategy.INCLUDE // The last duplicate file 'wins'.
    }

    compileJava.dependsOn copyOverrides

    project.ext {
      main_source_dirs = ["$basePath/src/main/java", "${project.buildDir}/flink-overrides/src/main/java"]
    }

This would copy all overrides into the build directory and, in case of a duplicate, keep the one copied last (i.e. the version-specific one from ./src).
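The snippet above only covers *main_source_dirs*. A rough, untested sketch of a complete 1.8 build file with the same overlay applied to test sources as well could look like the one below. It assumes the layout described above (shared overrides in ../1.5/src, only the changed files in ./src), that `basePath` is defined as in the existing per-version build files, and that flink_runner.gradle applies the java plugin, which is why the compile tasks are only referenced after the `apply from` line. The `copyOverrides` task and the flink-overrides directory are just the illustrative names from above, not something that exists in the Beam build today.

```groovy
// Sketch for 1.8/build.gradle (hypothetical). Assumes ../1.5/src holds the
// shared overrides and ./src holds only the files that differ for 1.8.
def overridesDir = "${project.buildDir}/flink-overrides/src"

// Copy the 1.5 overrides first and the local ./src second. With
// DuplicatesStrategy.INCLUDE, a file present in both places is written twice,
// so the local (1.8) copy is the one left on disk.
def copyOverrides = tasks.register('copyOverrides', Copy) {
  it.from '../1.5/src', './src'
  it.into overridesDir
  it.duplicatesStrategy = DuplicatesStrategy.INCLUDE
}

/* All properties required for loading the Flink build script */
project.ext {
  flink_version = '1.8.0'
  // Both main and test sources now come from the merged overrides directory.
  main_source_dirs = ["$basePath/src/main/java", "$overridesDir/main/java"]
  test_source_dirs = ["$basePath/src/test/java", "$overridesDir/test/java"]
  main_resources_dirs = ["$basePath/src/main/resources"]
  test_resources_dirs = ["$basePath/src/test/resources"]
  archives_base_name = 'beam-runners-flink-1.8'
}

// Load the main build script which contains all build logic.
apply from: "$basePath/flink_runner.gradle"

// The compile tasks exist only once the java plugin has been applied
// (assumed to happen in flink_runner.gradle), so wire them up afterwards.
tasks.named('compileJava') { it.dependsOn copyOverrides }
tasks.named('compileTestJava') { it.dependsOn copyOverrides }
```

One caveat of this design: `Copy` never deletes files from its destination, so a 1.8-only file that is later removed from ./src would keep being compiled until a clean build. Gradle's `Sync` task type removes such leftovers and might be worth considering instead.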
Then the build would simply compile the classes from the newly created Java files in the build directory.

*4) Maintaining the last 3 major versions only*

I recall that the Flink community only supports the 3 latest major versions <https://flink.apache.org/downloads.html> (please correct me if I'm mistaken). I suggest that *Beam does the same*. There is already an open ticket, BEAM-7962 <https://jira.apache.org/jira/browse/BEAM-7962>, that suggests dropping the 1.5 & 1.6 versions. Maybe this would allow us to keep the current structure with a bearable amount of technical debt?

Personally I'm in favor of *4)* combined with *3)*.

What do you think? Do you have any other suggestions for how to solve this?

Thanks,
D.
Sorry, wrong mailing list; I wanted to send this to the Beam one.
Sorry for the confusion.
D.

On Sat, Sep 7, 2019 at 12:32 PM David Morávek <[hidden email]> wrote:
> Hello,
>
> we currently have an opened PR for Flink 1.9
> <https://github.com/apache/beam/pull/9296>, which greatly improves the
> runner for batch use-case. [...]