[DISCUSS] Flink Docker Playgrounds

[DISCUSS] Flink Docker Playgrounds

Fabian Hueske-2
Hi everyone,

As you might know, some of us are currently working on Docker-based
playgrounds that make it very easy for first-time Flink users to try out
and play with Flink [0].

Our current setup (still a work in progress, with some parts merged to the
master branch) looks as follows:
* The playground is a Docker Compose environment [1] consisting of Flink,
Kafka, and Zookeeper images (ZK for Kafka). The playground is built around a
specific Flink job. (A minimal Compose sketch follows below this list.)
* We had planned to add the playground's job to the examples in the flink
main repository to bundle it with the Flink distribution. Hence, it would
have been included in the Docker-hub-official (soon to be published) Flink
1.9 Docker image [2].
* The main motivation for adding the job to the examples module in the flink
main repo was to avoid the maintenance overhead of a customized Docker
image.
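
For illustration, a minimal docker-compose.yml along these lines would
capture that setup. Note that the service definitions, image tags, and
environment variables below are simplified placeholders, not the exact
playground configuration:

version: "2.1"
services:
  jobmanager:
    # Docker-hub-official Flink image, started as JobManager
    image: flink:1.9
    command: jobmanager
    ports:
      - "8081:8081"
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    # same image, started as TaskManager
    image: flink:1.9
    command: taskmanager
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
    depends_on:
      - jobmanager
  zookeeper:
    # ZooKeeper is only needed by Kafka
    image: wurstmeister/zookeeper:3.4.6
  kafka:
    # advertised listener settings omitted for brevity
    image: wurstmeister/kafka:2.12-2.2.1
    environment:
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
    depends_on:
      - zookeeper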

When discussing backporting the playground job (and its data generator) to
include it in the Flink 1.9 examples, concerns were raised about its Kafka
dependency, which would become a problem if the community agrees on the
recently proposed repository split that would remove flink-kafka from the
main repository [3]. I think this is a fair concern that we did not consider
when designing the playground (the repo split had not been proposed yet at
that point).

If we don't add the playground job to the examples, we need to put it
somewhere else. The obvious choice would be the flink-playgrounds [4]
repository, which was intended for the docker-compose configuration files.
However, we would then no longer be able to include it in the
Docker-hub-official Flink image and would need to maintain a custom Docker
image, which is what we tried to avoid. The custom image would of course be
based on the Docker-hub-official Flink image.
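
To make the overhead concrete: the custom image would essentially just be a
thin layer on top of the official image, along the lines of the following
sketch (image tag and jar name are placeholders):

# Hypothetical Dockerfile for a playground image
FROM flink:1.9-scala_2.11
# add only the pre-built playground job on top of the official image
COPY target/flink-playground-job.jar /opt/flink/examples/streaming/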

There are different approaches for this:

1) Building one (or more) official ASF images
There is an official Apache Docker Hub user [5], and a number of projects
publish Docker images via this user. Apache Infra seems to support a process
that automatically builds and publishes Docker images when a release tag is
added to a repository. This feature needs to be enabled. I haven't found
detailed documentation on this, but there are several INFRA Jira tickets
that discuss the mechanism.
This approach would mean that we need a formal Apache release for
flink-playgrounds (similar to flink-shaded). The obvious benefit is that
these images would be ASF-official Docker images. If we can publish more
than one image per repo, we could also publish images for other playgrounds
(like the SQL playground, which could be based on the SQL training that I
built [6] and which uses an image that is published under my user [7]).
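
To make the mechanics a bit more concrete: in the end, Infra's automation
would effectively do something like the following whenever a release tag is
pushed (the repository name and tag here are hypothetical):

docker build -t apache/flink-playgrounds:1.9.0 .
docker push apache/flink-playgrounds:1.9.0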

2) Rely on an external image
This image could be built by somebody in the community (like me). The
problem is, of course, that the image would not be an official image and we
would rely on a volunteer to build it.
OTOH, the overhead would be pretty small: no need to run full releases,
integrate with Infra's build process, etc.

IMO, the first approach is clearly the better choice, but it also requires a
number of things to be put in place.

What do others think?
Does somebody have another idea?

Cheers,
Fabian

[0]
https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html
[1]
https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink_cluster_playground.html#anatomy-of-this-playground
[2] https://hub.docker.com/_/flink
[3]
https://lists.apache.org/thread.html/eb841f610ef2c191b8d00b6c07b2eab513da2e4eb2d7da5c5e6846f4@%3Cdev.flink.apache.org%3E
[4] https://github.com/apache/flink-playgrounds
[5] https://hub.docker.com/u/apache
[6] https://github.com/ververica/sql-training/
[7] https://hub.docker.com/r/fhueske/flink-sql-client-training-1.7.2

Re: [DISCUSS] Flink Docker Playgrounds

Fabian Hueske-2
One more thing to add.
If we move the code to flink-playgrounds and build custom images, the
playgrounds effort won't be tied to the Flink 1.9 release any more.
So, we'd be a bit more flexible time-wise but would also need to manually
update the playgrounds for every release.


Re: [DISCUSS] Flink Docker Playgrounds

Seth Wiesman-4
Hey Fabian,

I support option 1.

As per FLIP-42, playgrounds are going to become core to Flink's getting-started experience, and I believe it is worth the effort to get this right.

- As you mentioned, we may (and in my opinion definitely will) add more images in the future. Setting up an integration now will set the stage for those future additions.

- These images will be many users' first exposure to Flink, and having a proper release cycle to ensure they work correctly may be worth the effort in and of itself. We already found during the first PR to that repo that we needed to find users with different OSs to test.

- Similarly to the above point, having the images hosted under an official Apache account adds a certain amount of credibility and shows the community that we take on-boarding new users seriously.

- I am generally opposed to having the official Flink docs rely on something that is hosted under someone's personal account. I don't want bug fixes or updates to be blocked by your (or someone else's) availability.

Seth


Re: [DISCUSS] Flink Docker Playgrounds

Till Rohrmann
I would be in favour of option 1.

We could also think about making the flink-playgrounds and Flink Docker
image releases part of the Flink release process [1] if we don't want to
have independent release cycles. I think the official Flink Docker image is
too often forgotten at the moment.

[1]
https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release

Cheers,
Till


Re: [DISCUSS] Flink Docker Playgrounds

Stephan Ewen
I remember that Patrick (who has maintained the docker-flink images so far)
frequently raised the point that it's good practice to keep the images
decoupled from the project release cycle. That way, changes to the images
can be made frequently and released quickly.

In addition, one typically supports images for multiple releases, meaning
the versioning differs from that of the main code base.


Re: [DISCUSS] Flink Docker Playgrounds

Yang Wang
Hey Fabian,

Sounds great! It will be much easier to build an end-to-end playground with
Docker.

I prefer option 1.

We need to build the official Docker images and push them to Docker Hub
after every release. They could then be used to play with Docker Compose,
Kubernetes, etc.

Today we see more and more users running Flink in containers on Kubernetes,
YARN, and Mesos. So if the official Flink image is released and maintained,
it will help our users a lot.

They could build their own image with their dependencies on top of the
official image to run a Flink application. They could also start a Flink
cluster with the official image and then submit Flink jobs to the existing
cluster (a rough sketch follows below).
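
For example, something along these lines should already work with the
official image (a rough sketch only; the entrypoint arguments and
environment variable are those used by the current docker-flink images, and
the submission assumes a local Flink distribution for the CLI):

docker network create flink-net
# start a JobManager and a TaskManager from the official image
docker run -d --name jobmanager --network flink-net -p 8081:8081 \
  -e JOB_MANAGER_RPC_ADDRESS=jobmanager flink:1.9 jobmanager
docker run -d --name taskmanager --network flink-net \
  -e JOB_MANAGER_RPC_ADDRESS=jobmanager flink:1.9 taskmanager
# submit the bundled WordCount example to the running cluster
./bin/flink run -m localhost:8081 ./examples/streaming/WordCount.jar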




Best,

Yang


Re: [DISCUSS] Flink Docker Playgrounds

Konstantin Knauf-3
Hi everyone, Hi Fabian,

I am also in favor of option 1.

Besides the playgrounds, it is a good opportunity to explore this process
for official Docker images, as Till suggested. This needs a separate
discussion, though.

Best,

Konstantin
