Hi,

I wanted to bring back the topic of backward compatibility with respect to all/most of the user-facing aspects of Flink. Please note that this isn't limited to the programming API, but also includes job submission and management.

As can be seen in [1], changes in these areas cause difficulties downstream. Projects have to choose between Flink versions, and users are ultimately at a disadvantage, either by not being able to use the desired dependency or by facing forced upgrades to their infrastructure.

IMO the preferred solution would be that downstream projects can build against a minimum version of Flink and expect compatibility with future releases of the major version stream. For example, my project depends on 1.6.x and can expect to run without recompilation on 1.7.x and later.

How far away is Flink from stabilizing the surface that affects typical users?

Thanks,
Thomas

[1] https://issues.apache.org/jira/browse/BEAM-5419

I think this discussion needs specific examples as to what should be possible, as it is otherwise too vague / open to interpretation.

For example, "job submission" may refer to CLI invocations continuing to work (i.e. CLI arguments), or to being able to use a 1.6 client against a 1.7 cluster, which are entirely different things.

What does "management" include? Dependencies? The set of jars that are released on Maven? The set of jars bundled with flink-dist?

On 26.11.2018 17:24, Thomas Weise wrote:
> Hi,
>
> I wanted to bring back the topic of backward compatibility with respect to
> all/most of the user facing aspects of Flink. Please note that isn't
> limited to the programming API, but also includes job submission and
> management.
>
> As can be seen in [1], changes in these areas cause difficulties
> downstream. Projects have to choose between Flink versions and users are
> ultimately at disadvantage, either by not being able to use the desired
> dependency or facing forced upgrades to their infrastructure.
>
> IMO the preferred solution would be that downstream projects can build
> against a minimum version of Flink and expect compatibility with future
> releases of the major version stream. For example, my project depends on
> 1.6.x and can expect to run without recompilation on 1.7.x and later.
>
> How far away is Flink from stabilizing the surface that affects typical
> users?
>
> Thanks,
> Thomas
>
> [1] https://issues.apache.org/jira/browse/BEAM-5419

Hi,

I think this is a very good discussion to have. Flink is becoming part of more and more production deployments, and more tools are being built around it.

The question is whether we want to (or can) make parts of the control/maintenance/monitoring API stable, such that external systems/frameworks can rely on them.

Which APIs are relevant?
Which APIs could be declared as stable?
Which parts are still evolving?

Fabian

On Tue, 27 Nov 2018 at 15:10, Chesnay Schepler wrote:
> I think this discussion needs specific examples as to what should be
> possible as it otherwise is too vague / open to interpretation.
>
> For example, "job submission" may refer to CLI invocations continuing to
> work (i.e. CLI arguments), or being able to use a 1.6 client against a
> 1.7 cluster, which are entirely different things.
>
> What does "management" include? Dependencies? Set of jars that are
> released on maven? Set of jars bundled with flink-dist?

Some scenarios that come to mind:

Flink client binary compatibility with remote cluster: This would include RemoteStreamEnvironment, RestClusterClient etc. Users should be able to submit a job built with 1.6.x, using the 1.6.x binaries, to a remote Flink cluster running 1.7.x or later. The use case for this is Beam.

REST API compatibility: User tooling built against the 1.6.x REST API spec continues to work with the 1.7.x or later REST API.

CLI compatibility: The commands/options exposed in the CLI continue to be available after an upgrade. Users can just point to the new CLI location.

Metrics: Metrics that exist in 1.6.x are available in 1.7.x.

There is probably a lot more (such as the various backends that users can configure and their options), and there are different levels of cost/complexity trade-offs. I brought up the REST API in the past after observing the tools breakage when going from 1.4.x to 1.5.x.

The client binary compatibility issue will grow more severe as the ecosystem expands. Beam is a representative example in that category. To solve the issue downstream, different communities and users would each need to come up with build system/release support for multiple parallel Flink versions. It would be better to shield them from such complexity.

Thanks,
Thomas

On Tue, Nov 27, 2018 at 6:27 AM Fabian Hueske wrote:
> Hi,
>
> I think this is a very good discussion to have.
> Flink is becoming part of more and more production deployments and more
> tools are built around it.
> The question is do we want to (or can we) make parts of the
> control/maintenance/monitoring API stable such that external
> systems/frameworks can rely on them as stable.
>
> Which APIs are relevant?
> Which APIs could be declared as stable?
> Which parts are still evolving?
>
> Fabian

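As a rough illustration of the REST API and metrics scenarios listed in the message above, the expectation "whatever exists in 1.6.x is still available in 1.7.x" boils down to a set comparison. This is only a sketch: the function names are made up for the example, and the endpoint and metric names are placeholders, not snapshots of any Flink release.

```python
def missing_in_new(old_surface, new_surface):
    """Return, sorted, every entry of the old release that the new one dropped."""
    return sorted(set(old_surface) - set(new_surface))


def is_backward_compatible(old_surface, new_surface):
    """Additions are fine; only removals break tooling built against the old release."""
    return not missing_in_new(old_surface, new_surface)


# Placeholder surfaces, NOT actual snapshots of Flink 1.6.x / 1.7.x.
rest_old = {"GET /jobs", "GET /jobs/:jobid"}
rest_new = {"GET /jobs", "GET /jobs/:jobid", "GET /jobs/:jobid/plan"}
metrics_old = {"example.metricA", "example.metricB"}
metrics_new = {"example.metricA"}

print(is_backward_compatible(rest_old, rest_new))
print(missing_in_new(metrics_old, metrics_new))
```

A check of this shape only covers names. It says nothing about request/response formats or semantics, which is where the cost/complexity trade-offs mentioned above come in.
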
So let's take a look...

Binary client compatibility: The key issue I see hasn't changed since the last time this was brought up: clients rely on the JobGraph, which is an internal data structure, to submit the job. AFAIK there will also be changes made to said class soon(ish). As long as we don't introduce a decoupled structure and/or compatibility routines here, this is not feasible.
The client in general may be in the way here. The unfortunate reality is that the client code is one big mess that is due for a complete rewrite. I doubt anyone has an all-encompassing view of the hidden assumptions baked into it, which we would have to retain if we go for backwards compatibility.

CLI compatibility: Does this include all start scripts or just the flink executable? I think this makes sense; so far we have done a reasonable job of not changing command-line parameters anyway. (But maybe only because changing this part of the CLI is a massive pain...)

REST API: The versioning introduced in 1.7.0 is a significant step towards a stable API, as it allows us to modify things without (inherently) breaking it. We're primarily missing tests here to verify the stability, but these are being worked on.

Metrics: I would not categorize them as stable in general, the reason being that we are still refactoring and streamlining their usage. For some core system metrics (checkpoint info, IO) we can _probably_ guarantee stability.

On 27.11.2018 18:43, Thomas Weise wrote:
> Some scenarios that come to mind:
>
> Flink client binary compatibility with remote cluster: This would include
> RemoteStreamEnvironment, RestClusterClient etc. - User should be able to
> submit the job built with 1.6.x using the 1.6.x binaries to the remote
> Flink 1.7.x or later cluster. The use case for this is Beam.
>
> REST API compatibility: User tooling built against 1.6.x REST API spec
> continues to work with 1.7.x or later REST API
>
> CLI compatibility: The commands/options exposed in the CLI continue to be
> available after an upgrade. Users can just point to the new CLI location.
>
> Metrics: Metrics that exist in 1.6.x are available in 1.7.x
>
> There is probably a lot more (such as various backends that users can
> configure and their options) and there are different levels of
> cost/complexity trade-offs. I brought up the REST API in the past after
> observing the tools breakage when going from 1.4.x to 1.5.x.
>
> The client binary compatibility issue will grow more severe as the
> ecosystem expands. Beam is a representative example in that category. To
> solve the issue downstream, different communities and users each would need
> to come up with build system/release support for multiple parallel Flink
> versions. It would be better to shield from such complexity.
>
> Thanks,
> Thomas

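To make the REST API point in the message above concrete, here is a toy sketch of why URL versioning allows changes without breaking existing tooling: a changed response format is registered under a new version prefix, while the old prefix keeps serving the old format. It assumes a `v<number>` URL-prefix scheme; the `/jobs` payloads, the `v2` handler, the `handle` function, and the fallback to the oldest version for unversioned requests are all assumptions made for the example, not Flink code.

```python
# Toy handler table keyed by (api_version, path). Payloads are made up.
HANDLERS = {
    ("v1", "/jobs"): lambda: {"jobs": ["a", "b"]},
    # Hypothetical later version with a changed response shape.
    ("v2", "/jobs"): lambda: {"jobs": [{"id": "a"}, {"id": "b"}]},
}


def handle(url_path, default_version="v1"):
    """Dispatch a request path such as '/v1/jobs' or '/jobs' to a handler."""
    parts = url_path.strip("/").split("/", 1)
    first = parts[0]
    if first.startswith("v") and first[1:].isdigit():
        # Explicit version prefix: strip it and dispatch on the remainder.
        version = first
        path = "/" + (parts[1] if len(parts) > 1 else "")
    else:
        # No prefix: fall back to the default (assumed here to be the oldest).
        version = default_version
        path = "/" + url_path.strip("/")
    return HANDLERS[(version, path)]()


print(handle("/v1/jobs"))  # tooling pinned to v1 keeps its response format
print(handle("/v2/jobs"))  # new tooling opts in to the changed format
```

The missing piece mentioned above, tests that verify stability, would then amount to asserting that the `v1` responses stay unchanged from release to release.
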
A few thoughts from my side:

(1) The client needs a big refactoring / cleanup. It should use a proper HTTP client library to help with future authentication mechanisms. Once that is done, we should identify a "client API" that we make stable, just like the DataStream / DataSet API.

(2) We will most likely refactor the stack in the near future (see the discussion threads on batch / streaming unification). I would suggest that we define a DAG API as the common substrate and as the data structure in which jobs are submitted to the REST API (session modes) and stored in HA services (job mode). Think of it as a JobGraph++. It may be a good idea to define that structure via ProtoBuf (or a similar tool) to support forward/backwards compatibility.

Best,
Stephan

On Wed, Nov 28, 2018 at 10:45 AM Chesnay Schepler wrote:
> so let's take a look...
>
> binary client compatibility: The key issue I see hasn't changed since
> the last time this was brought up: Clients rely on the JobGraph to
> submit the job which is an internal data structure. AFAIK there will
> also be changes made to said class soon(ish). So long as we don't
> introduce a decoupled structure and/or compatibility routines here this
> is not feasible.
> The client in general may be in the way here. The unfortunate reality is
> that the client code is one big mess that is due for a complete rewrite.
> I doubt anyone has an all-encompassing view over hidden assumptions that
> are baked into it, that we would have to retain if we go for backwards
> compatibility.
>
> CLI compatibility: Does this include all start scripts or just the flink
> executable? I think this makes sense, but so far we did a reasonable job
> at not changing command-line parameters. (But maybe only because
> changing this part of the CLI is a massive pain...)
>
> REST API: The versioning introduced in 1.7.0 is a significant step
> towards a stable API as it allows us to modify things without
> (inherently) breaking it.
> We're primarily missing tests here to verify the stability, but these
> are being worked on.
>
> Metrics: I would not categorize them as stable in general, the reason
> being that we are still refactoring and stream-lining the usage. For
> some core system metrics (checkpoint info, IO) we can _probably_
> guarantee stability.

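A small sketch of the compatibility property that point (2) in the message above is after, using JSON from the Python standard library as a stand-in for ProtoBuf: a reader ignores fields it does not know (so it can read payloads from newer writers) and falls back to defaults for fields that are absent (so it can read payloads from older writers). The `Vertex` and `Dag` types, their field names, and the payloads are hypothetical and do not reflect the JobGraph or any proposed Flink format.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Vertex:
    id: str
    operator: str
    parallelism: int = 1  # default used when an older writer omits the field


@dataclass
class Dag:
    vertices: list
    edges: list  # (source_id, target_id) pairs


def decode_dag(payload):
    """Decode a DAG, skipping unknown vertex fields and defaulting missing ones."""
    known = {f.name for f in fields(Vertex)}
    raw = json.loads(payload)
    vertices = [
        Vertex(**{k: v for k, v in item.items() if k in known})
        for item in raw.get("vertices", [])
    ]
    edges = [tuple(edge) for edge in raw.get("edges", [])]
    return Dag(vertices, edges)


# Written by an "older" client: no parallelism field, no edges key.
older = '{"vertices": [{"id": "src", "operator": "source"}]}'
# Written by a "newer" client: carries a field this reader has never heard of.
newer = (
    '{"vertices": [{"id": "src", "operator": "source", "parallelism": 4, "hint": "x"},'
    ' {"id": "snk", "operator": "sink"}], "edges": [["src", "snk"]]}'
)

print(decode_dag(older))
print(decode_dag(newer))
```

ProtoBuf provides this behaviour at the wire-format level through numbered fields, which is presumably why it is suggested above. The JSON version only mimics the reader-side rules.
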