Hi everyone,
I would like to start a discussion about integrating publication of the Flink Docker images hosted on Docker Hub[1] more tightly with the Flink release process. Apologies in advance for the long post.

More than two and a half years ago (time flies!) we introduced “official” Docker images for Flink[2]. Since then, the popularity of running containerized applications in general and containerized Flink in particular has continued to grow. Today, Flink is one of the most popular “official” images on Docker Hub[3].

A graph of Flink Docker image pulls over time:
https://gist.githubusercontent.com/patricklucas/7312444b1056ff82528e9a129e74e2b3/raw/9c8e139c1abc70b2b3fb34aadd7f44d46a540fe8/docker-flink-pulls.png

“Official” is in quotation marks because while that’s how the Docker community refers to top-level images on Docker Hub (i.e. those that can be run with just <docker run foo>), they are not official in the sense of being officially endorsed by the Flink PMC.

I think it’s time for that to change.

Currently, the Dockerfiles that produce these images are maintained in a repository called docker-flink[4] in a separate, community-managed GitHub organization of the same name. When a new release of Flink is available, or when other changes are necessary, these Dockerfiles (one per image) are updated, and then a pull request[5] is made to the Docker Hub official-images repo with an updated manifest of images and tags, after which infrastructure run by Docker Hub builds, checks, and publishes the images.

A question that has come up regularly is “Why are the Dockerfiles in a separate repository from Flink?”, and there are a few different answers:

- These Dockerfiles package only released, published distributions of Flink, and are therefore decoupled from a particular commit in the Flink repo
- All the Dockerfiles for supported versions (and the corresponding Scala version variants) should be available in one Git tree for discoverability
- The master branch of Flink is not the right place to encode what the supported versions are, or how to run previous versions of Flink; it should be concerned with the point-in-time of the code represented in that commit

But mostly, having a dedicated repo for Dockerfiles is a convention shared by nearly every other “official” image on Docker Hub[6]. If the Flink community wants to do this differently, we will need to work with the Docker Hub maintainers to make sure we continue to work within their guidelines and expectations.

While it seems intuitive that integrating these images into the Flink release process is a good thing, I don’t believe it is strictly necessary, since the images only package approved and signed Flink releases, and do not themselves build Flink from source.
However, there are some concrete advantages:

- Putting the Docker images on (almost) equal footing with Flink binary release artifacts will help the legitimacy of and user confidence in running Flink in containerized environments
- By publishing release candidate (and possibly nightly) images, the release testing and automated testing processes could be improved
- The delay between Flink releases and when the corresponding Docker images are available will be reduced

Considering all of this, I propose the following:

- We move the Git repository containing the Dockerfiles from the docker-flink GitHub organization to Apache, placing it under control of the Flink PMC
- We codify updating these Dockerfiles and notifying Docker Hub into the Flink release process
- For release candidates, Dockerfiles should be added to a special directory which will be automatically built and pushed to the Apache Docker Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1
- Upon release, the appropriate “release” Dockerfiles are added (e.g. under the 1.10 directory) and release candidate Dockerfiles removed, and then a pull request opened on the docker-library/official-images repository
- Optionally, we introduce “nightly” builds, with an automated process building and pushing images to the Apache Docker Hub organization, e.g. apache/flink-dev:1.10-SNAPSHOT

If we choose to move forward in this direction, there are some further steps we could take to improve the experience of both developing and using Flink with Docker (these are actually mostly orthogonal to the proposed changes above, but I think this is a natural first step and should make the following ideas easier to implement).

First, there are important differences between images meant for running Flink and those meant for development: the former should strictly package only released distributions of software and be as thin a layer as possible over the software itself, while the latter can be used during development and testing, and can easily be rebuilt from a “working copy” of the software’s source code.

By standardizing on defining such “production” images in the docker-flink repository and “development” image(s) in the Flink repository itself, it becomes much clearer to developers and users which Dockerfile or image they should use for a given purpose. To that end, we could introduce one or more documented Maven goals or Make targets for building a Docker image from the current source tree or a specific release (including unreleased or unsupported versions).

Additionally, there has been discussion among Flink contributors for some time about the confusing state of the Dockerfiles within the Flink repository, each meant for a different way of running Flink. I’m not completely up to speed on these different efforts, but we could possibly solve this by either building additional “official” images with different entrypoints for these various purposes, or by developing an improved entrypoint script that conveniently supports all cases. I defer to Till Rohrmann, Konstantin Knauf, or Stephan Ewen for further discussion on this point.

I apologize again for the wall of text, but if you made it this far, thank you! These improvements have been a long time coming, and I hope we can find a solution that serves the Flink and Docker communities well. Please don’t hesitate to ask any questions.

--
Patrick Lucas

[1] https://hub.docker.com/_/flink
[2] https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E
[3] On page 2 at the time we went to press: https://hub.docker.com/search?q=&type=image&image_filter=official
[4] https://github.com/docker-flink/docker-flink
[5] https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink
[6] I looked at the 25 most popular “official” images (see [3]) as well as “official” images of Apache software from the top 125; all use a dedicated repo
[7] https://hub.docker.com/u/apache

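To make the naming in the proposal concrete, here is a minimal sketch of how a version string could map to the image references mentioned above (apache/flink-rc for release candidates, apache/flink-dev for nightly snapshots). The function name image_reference is hypothetical, and the assumption that final releases are tagged as flink:<version> on the top-level image is mine, not part of the proposal.

```python
import re


def image_reference(version: str) -> str:
    """Map a Flink version string to a Docker image reference.

    Repository names follow the examples in the proposal; treating every
    other version as a final release on the top-level "flink" image is an
    assumption made for illustration.
    """
    if version.endswith("-SNAPSHOT"):
        # Nightly builds, e.g. apache/flink-dev:1.10-SNAPSHOT
        return f"apache/flink-dev:{version}"
    if re.search(r"-rc\d+$", version):
        # Release candidates, e.g. apache/flink-rc:1.10.0-rc1
        return f"apache/flink-rc:{version}"
    # Assumed: released versions are published under the top-level image.
    return f"flink:{version}"


if __name__ == "__main__":
    for v in ("1.10.0-rc1", "1.10-SNAPSHOT", "1.10.0"):
        print(image_reference(v))
```

Keeping release candidates and snapshots in separate repositories under the Apache organization means the top-level image only ever carries released versions.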
Big +1 for this effort.
It is really exciting that we have started this great work. More and more companies are starting to use Flink in container environments (Docker, Kubernetes, Mesos, even YARN 3.x), so it is very important that we have a unified process for building and releasing official images.

The image building process in this proposal is really good; I just have the following thoughts.

>> Keep a dedicated repo for Dockerfiles to build official image
I think this is a good approach, and we do not need to make unnecessary changes to the Flink repository.

>> Integrate building image into the Flink release process
This will bring a better experience for users of container environments. In my opinion, a complete release includes the official image, and it should be verified to work well.

>> Nightly building
Would we support this for all release branches, or just the master branch?

>> Multiple purpose Flink images
This is really needed. During development and testing, we need profiling tools to help us investigate problems. Currently, we do not even have jstack/jmap in the image.

>> Unify the Dockerfile in Flink repository
In the current code base, we have flink-contrib/docker-flink/Dockerfile to build an image for a session cluster. However, it is not kept up to date. For a per-job cluster, flink-container/docker/Dockerfile can be used to build a Flink image with user artifacts. I think we need to unify them and provide a more powerful build script and entry point.

Best,
Yang

Hi Patrick,
Thanks a lot for your continued work on the Docker images. That’s really great work, and I have benefited from it as well.

Big +1 for integrating Docker image publication into the Flink release process, since we can leverage the release process to make the Docker publication more legitimate. We can also check and vote on it during the release.

I think the most important thing we need to discuss first is whether to have a dedicated Git repo for the Dockerfiles.

Although having a dedicated repo is a convention shared by nearly every other “official” image on Docker Hub, I'm still not sure about it. Maybe I have missed something important. From my point of view, it’s better to have the Dockerfiles in the main Flink repo.
- First, I think the Dockerfiles can be treated as part of the release, and it is natural to put the corresponding version of the Dockerfile in the corresponding Flink release.
- Second, we can put the Dockerfiles in a path like flink/docker-flink/version/, where the version varies between releases. For example, for release 1.8.3, we would have a flink/docker-flink/1.8.3 folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for all supported versions would not be in one path, they would still be in one Git tree under different refs.
- Third, it seems Docker Hub also supports specifying different refs. In the file[1], we can change the GitRepo link from https://github.com/docker-flink/docker-flink.git to https://github.com/apache/flink.git and add a GitFetch for each tag, e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the ubuntu file[2].

If the above assumptions are right and there are no further obstacles, I'm inclined to have these Dockerfiles in the main Flink repo. In this case, we can reduce the number of repos and the management overhead.
What do you think?

Best,
Hequn

[1] https://github.com/docker-library/official-images/blob/master/library/flink
[2] https://github.com/docker-library/official-images/blob/master/library/ubuntu

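As a rough illustration of Hequn's third point, the sketch below renders an official-images library file whose GitRepo points at apache/flink.git with one GitFetch per release tag. The field names (Maintainers, GitRepo, Tags, GitFetch, GitCommit, Directory) are the ones used in the docker-library/official-images library files as far as I know; the maintainer, the all-zero commit hash, the tag list, and the docker-flink/1.8.3 directory are placeholders, and render_manifest is a hypothetical helper, not an existing tool.

```python
from typing import Dict, List


def render_manifest(maintainers: str, git_repo: str,
                    entries: List[Dict[str, str]]) -> str:
    """Render a library file: a global header block, then one block per image."""
    blocks = [f"Maintainers: {maintainers}\nGitRepo: {git_repo}"]
    for entry in entries:
        blocks.append("\n".join([
            f"Tags: {entry['tags']}",
            # One GitFetch per release tag, as suggested in the message above.
            f"GitFetch: refs/tags/{entry['git_tag']}",
            f"GitCommit: {entry['git_commit']}",
            f"Directory: {entry['directory']}",
        ]))
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


example = render_manifest(
    maintainers="Jane Doe <jane@example.org> (@janedoe)",  # hypothetical
    git_repo="https://github.com/apache/flink.git",
    entries=[{
        "tags": "1.8.3, 1.8",             # placeholder tag list
        "git_tag": "release-1.8.3",
        "git_commit": "0" * 40,           # placeholder, not a real commit
        "directory": "docker-flink/1.8.3",  # path suggested above
    }],
)

if __name__ == "__main__":
    print(example)
```

Whether the official-images maintainers would accept a manifest that points into the main Flink repository is a separate question that this sketch does not answer.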
Thanks a lot for starting this discussion, Patrick! I think it is a very good idea to bring Flink's Docker images more under the jurisdiction of the Flink PMC and to make releasing new Docker images part of Flink's release process (which is not to say that we cannot release new Docker images independently of Flink's release cycle).

One thing I have no strong opinion about is where to place the Dockerfiles (apache/flink.git vs. apache/flink-docker.git). I see the point that one wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that having separate repositories might help with this objective. On the other hand, I don't have a lot of experience with Docker Hub and how best to host Dockerfiles. Consequently, it would be helpful if others who have experience with this could share it with us.

Cheers,
Till
> > > apache/flink-dev:1.10-SNAPSHOT > > > > > > > > > If we choose to move forward in this direction, there are some further > > > steps we could take to improve the experience of both developing and > > using > > > Flink with Docker (these are actually mostly orthogonal to the proposed > > > changes above, but I think this is a natural first step and should make > > the > > > following ideas easier to implement). > > > > > > First, there are important differences between images meant for running > > > Flink and those meant for development: the former should strictly > package > > > only released distributions of software and be as thin of a layer as > > > possible over the software itself, while the latter can be used during > > > development and testing, and can easily be rebuilt from a “working > copy” > > of > > > the software’s source code. > > > > > > By standardizing on defining such “production” images in the > docker-flink > > > repository and “development” image(s) in the Flink repository itself, > it > > is > > > much clearer to developers and users what the right Dockerfile or image > > > they should use for a given purpose. To that end, we could introduce > one > > or > > > more documented Maven goals or Make targets for building a Docker image > > > from the current source tree or a specific release (including > unreleased > > or > > > unsupported versions). > > > > > > Additionally, there has been discussion among Flink contributors for > some > > > time about the confusing state of Dockerfiles within the Flink > > repository, > > > each meant for a different way of running Flink. I’m not completely up > to > > > speed about these different efforts, but we could possibly solve this > by > > > either building additional “official” images with different entrypoints > > for > > > these various purposes, or by developing an improved entrypoint script > > that > > > conveniently supports all cases. 
I defer to Till Rohrmann, Konstantin > > > Knauf, or Stephan Ewen for further discussion on this point. > > > > > > I apologize again for the wall of text, but if you made it this far, > > thank > > > you! These improvements have been a long time coming, and I hope we can > > > find a solution that serves the Flink and Docker communities well. > Please > > > don’t hesitate to ask any questions. > > > > > > -- > > > > > > Patrick Lucas > > > > > > [1] https://hub.docker.com/_/flink > > > > > > [2] > > > > > > > > > https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E > > > > > > [3] On page 2 at the time we went to press: > > > https://hub.docker.com/search?q=&type=image&image_filter=official > > > > > > [4] https://github.com/docker-flink/docker-flink > > > > > > [5] > > > > > > > > > https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink > > > > > > [6] I looked at the 25 most popular “official” images (see [3]) as well > > as > > > “official” images of Apache software from the top 125; all use a > > dedicated > > > repo > > > [7] https://hub.docker.com/u/apache > > > > > > |
Big +1 for
* official images in a separate repository
* unified images (session cluster vs application cluster)
* images for development in Apache flink repository

--
Konstantin Knauf | Solutions Architect
|
Hey all,
first of all, a big thank you for driving many of the Docker image releases in the last two years.

*(1) Moving docker-flink/docker-flink to apache/docker-flink*

+1 to do this as you outlined. I would propose to aim for a first integration with the 1.10 release without major changes to the existing Dockerfiles. The work items would be to move the Dockerfiles and update the release process documentation so everyone is on the same page.

*(2) Consolidate Dockerfiles in apache/flink*

+1 to start the process for this. I think this requires a bit of thinking about what the requirements are and which problems we want to solve. From skimming the existing Dockerfiles, it seems to me that the Docker image builds fulfil quite a few different tasks. We have a script that can bundle Hadoop, can copy an existing Flink distribution, can include user jars, etc. The scope of this is quite broad and would warrant a design document/a FLIP.

I would move the questions about nightly builds, using a different base image, or having image variants with debug tooling to after (1) and (2), or make them part of (2).

*(3) Next steps*

If there are no objections, I would propose to tackle (1) and (2) separately and to continue as follows:

(i) Create tickets for (1) and aim to align with the 1.10 release timeline (ideally before the first RC). Since this does not touch any code in the release branches, I think this would not be affected by the feature freeze. The major work items would be updating the docs and potentially refactoring the existing process and Dockerfiles. I can help with the process to create a new repo.

(ii) Create a first draft for the consolidation of the existing Dockerfiles. After this proposal is done, I would propose to bring it up for a separate discussion on the ML.

What do you think? @Patrick: would you be interested in working on both (1) + (2) or did you mainly have (1) in mind?
Best,
Ufuk
> > > > > > > > > > Additionally, there has been discussion among Flink contributors > for > > > some > > > > > time about the confusing state of Dockerfiles within the Flink > > > > repository, > > > > > each meant for a different way of running Flink. I’m not completely > > up > > > to > > > > > speed about these different efforts, but we could possibly solve > this > > > by > > > > > either building additional “official” images with different > > entrypoints > > > > for > > > > > these various purposes, or by developing an improved entrypoint > > script > > > > that > > > > > conveniently supports all cases. I defer to Till Rohrmann, > Konstantin > > > > > Knauf, or Stephan Ewen for further discussion on this point. > > > > > > > > > > I apologize again for the wall of text, but if you made it this > far, > > > > thank > > > > > you! These improvements have been a long time coming, and I hope we > > can > > > > > find a solution that serves the Flink and Docker communities well. > > > Please > > > > > don’t hesitate to ask any questions. 
> > > > > > > > > > -- > > > > > > > > > > Patrick Lucas > > > > > > > > > > [1] https://hub.docker.com/_/flink > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E > > > > > > > > > > [3] On page 2 at the time we went to press: > > > > > https://hub.docker.com/search?q=&type=image&image_filter=official > > > > > > > > > > [4] https://github.com/docker-flink/docker-flink > > > > > > > > > > [5] > > > > > > > > > > > > > > > > > > > > https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink > > > > > > > > > > [6] I looked at the 25 most popular “official” images (see [3]) as > > well > > > > as > > > > > “official” images of Apache software from the top 125; all use a > > > > dedicated > > > > > repo > > > > > [7] https://hub.docker.com/u/apache > > > > > > > > > > > > > > > > > -- > > Konstantin Knauf | Solutions Architect > > +49 160 91394525 > > > Follow us @VervericaData Ververica <https://www.ververica.com/> > > > -- > > Join Flink Forward <https://flink-forward.org/> - The Apache Flink > Conference > > Stream Processing | Event Driven | Real Time > > -- > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany > > -- > Ververica GmbH > Registered at Amtsgericht Charlottenburg: HRB 158244 B > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji > (Tony) Cheng > |
Hi everyone,
First of all, thank you very much Patrick for maintaining and publishing the Flink Docker images so far and for starting this discussion!

I'm in favor of adding the Dockerfiles in a separate repository and not in the main Flink repository.
I also think that it makes sense to first focus on the contribution of the Dockerfiles and the consolidation of existing Dockerfiles before discussing special cases for development and testing.

In addition to the Dockerfiles in the Flink main repo, there is also one in the flink-playgrounds repo [1] to build a customized Docker image for the playground.

Besides building and publishing "official" Flink images via DockerHub, there is also the option to let ASF Infra build Docker images and publish them under https://hub.docker.com/u/apache.
These images would not be "official" DockerHub images anymore, but would be available under the Apache DockerHub user.
However, I think it would be a good idea to keep the current setup for the main Flink images (those that depend on Flink releases) for better visibility and to not confuse our users.
We might want to publish less critical images (playground images, dev images, nightly builds, etc.) via Infra under the Apache DockerHub user.

Best,
Fabian
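As a rough illustration of the split Fabian suggests (release images stay "official" on Docker Hub, everything less critical goes under the Apache DockerHub user), here is a minimal Python sketch. The repository names apache/flink-rc and apache/flink-dev are the examples from Patrick's original proposal; apache/flink-playground and the function name are hypothetical, chosen only for illustration, and nothing here reflects an agreed naming scheme.

```python
# Sketch only: restates the proposed split between "official" images and
# images published under the Apache DockerHub user. Not an agreed scheme.

OFFICIAL_REPO = "flink"  # top-level "official" image (hub.docker.com/_/flink)

# Less critical images, published via ASF Infra under the "apache" user.
APACHE_REPOS = {
    "release-candidate": "apache/flink-rc",   # example from the original proposal
    "nightly": "apache/flink-dev",            # example from the original proposal
    "playground": "apache/flink-playground",  # hypothetical name
}


def image_reference(kind: str, tag: str) -> str:
    """Return the image reference (repository:tag) for a given kind of image."""
    if kind == "release":
        return f"{OFFICIAL_REPO}:{tag}"
    if kind not in APACHE_REPOS:
        raise ValueError(f"unknown image kind: {kind!r}")
    return f"{APACHE_REPOS[kind]}:{tag}"


if __name__ == "__main__":
    print(image_reference("release", "1.10.0"))
    print(image_reference("release-candidate", "1.10.0-rc1"))
    print(image_reference("nightly", "1.10-SNAPSHOT"))
```

The point of keeping the mapping in one place is that users only ever need the plain flink image for released versions, while the apache/ repositories can change without affecting them.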
+1 for Ufuk's proposal on how to proceed. I guess the immediate next step would be a VOTE on accepting the Dockerfiles and on where to store them.

Cheers,
Till
those > > that > > > > can > > > > > > be > > > > > > > run with just <docker run foo>), they are not official in the > > sense > > > > of > > > > > > > being officially endorsed by the Flink PMC. > > > > > > > > > > > > > > I think it’s time for that to change. > > > > > > > > > > > > > > Currently, the Dockerfiles that produce these images are > > maintained > > > > in > > > > > a > > > > > > > repository called docker-flink[4] in a separate, > > community-managed > > > > > GitHub > > > > > > > organization of the same name. When a new release of Flink is > > > > > available, > > > > > > or > > > > > > > when other changes are necessary, these Dockerfiles—one per > > > image—are > > > > > > > updated, and then a pull request[5] is made to the Docker Hub > > > > > > > official-images repo with an updated manifest of images and > tags, > > > > after > > > > > > > which infrastructure run by Docker Hub builds, checks, and > > > publishes > > > > > the > > > > > > > images. > > > > > > > > > > > > > > A question that has come up regularly is “Why are the > Dockerfiles > > > in > > > > a > > > > > > > separate repository from Flink?”, and there are a few different > > > > > answers: > > > > > > > > > > > > > > - > > > > > > > > > > > > > > These Dockerfiles package only released, published > > distributions > > > > of > > > > > > > Flink, and are therefore decoupled from a particular commit > in > > > the > > > > > > Flink > > > > > > > repo > > > > > > > - > > > > > > > > > > > > > > All the Dockerfiles for supported versions (and the > > > corresponding > > > > > > Scala > > > > > > > version variants) should be available in one Git tree for > > > > > > > discoverability > > > > > > > - > > > > > > > > > > > > > > The master branch of Flink is not the right place to encode > > what > > > > the > > > > > > > supported versions are, or how to run previous versions of > > > > Flink—it > > > > > > > should > > > > > > > be concerned with the point-in-time of the code 
represented > in > > > > that > > > > > > > commit > > > > > > > > > > > > > > > > > > > > > But mostly, having a dedicated repo for Dockerfiles is a > > convention > > > > > > shared > > > > > > > by nearly every other “official” image on Docker Hub[6]. If the > > > Flink > > > > > > > community wants to do this differently, we will need to work > with > > > the > > > > > > > Docker Hub maintainers to make sure we continue to work within > > > their > > > > > > > guidelines and expectations. > > > > > > > > > > > > > > While it seems intuitive that integrating these images into the > > > Flink > > > > > > > release process is a good thing, I don’t believe it is strictly > > > > > > necessary, > > > > > > > since the images only package approved and signed Flink > releases, > > > and > > > > > do > > > > > > > not themselves build Flink from source. However, there are some > > > > > concrete > > > > > > > advantages: > > > > > > > > > > > > > > - > > > > > > > > > > > > > > Putting the Docker images on (almost) equal footing with > Flink > > > > > binary > > > > > > > release artifacts will help the legitimacy of and user > > > confidence > > > > in > > > > > > > running Flink in containerized environments > > > > > > > - > > > > > > > > > > > > > > By publishing release candidate (and possibly nightly) > images, > > > the > > > > > > > release testing and automated testing processes could be > > > improved > > > > > > > - > > > > > > > > > > > > > > The delay between Flink releases and when the corresponding > > > Docker > > > > > > > images are available will be reduced > > > > > > > > > > > > > > > > > > > > > Considering all of this, I propose the following: > > > > > > > > > > > > > > - > > > > > > > > > > > > > > We move the Git repository containing the Dockerfiles from > the > > > > > > > docker-flink GitHub organization to Apache, placing it under > > > > control > > > > > > of > > > > > > > the > > > > > > > Flink PMC > > > > > > > - > > > > > > > > > > > 
> > > We codify updating these Dockerfiles and notifying Docker > Hub > > > into > > > > > the > > > > > > > Flink release process > > > > > > > - > > > > > > > > > > > > > > For release candidates, Dockerfiles should be added to a > > > > special > > > > > > > directory which will be automatically built and pushed to > > the > > > > > > > Apache Docker > > > > > > > Hub organization[7], e.g. apache/flink-rc:1.10.0-rc1 > > > > > > > - > > > > > > > > > > > > > > Upon release, the appropriate “release” Dockerfiles are > > added > > > > > (e.g. > > > > > > > under the 1.10 directory) and release candidate > Dockerfiles > > > > > > removed, > > > > > > > and > > > > > > > then a pull request opened on the > > > > docker-library/official-images > > > > > > > repository > > > > > > > - > > > > > > > > > > > > > > Optionally, we introduce “nightly” builds, with an automated > > > > process > > > > > > > building and pushing images to the Apache Docker Hub > > > organization, > > > > > > e.g. > > > > > > > apache/flink-dev:1.10-SNAPSHOT > > > > > > > > > > > > > > > > > > > > > If we choose to move forward in this direction, there are some > > > > further > > > > > > > steps we could take to improve the experience of both > developing > > > and > > > > > > using > > > > > > > Flink with Docker (these are actually mostly orthogonal to the > > > > proposed > > > > > > > changes above, but I think this is a natural first step and > > should > > > > make > > > > > > the > > > > > > > following ideas easier to implement). 
> > > > > > > > > > > > > > First, there are important differences between images meant for > > > > running > > > > > > > Flink and those meant for development: the former should > strictly > > > > > package > > > > > > > only released distributions of software and be as thin of a > layer > > > as > > > > > > > possible over the software itself, while the latter can be used > > > > during > > > > > > > development and testing, and can easily be rebuilt from a > > “working > > > > > copy” > > > > > > of > > > > > > > the software’s source code. > > > > > > > > > > > > > > By standardizing on defining such “production” images in the > > > > > docker-flink > > > > > > > repository and “development” image(s) in the Flink repository > > > itself, > > > > > it > > > > > > is > > > > > > > much clearer to developers and users what the right Dockerfile > or > > > > image > > > > > > > they should use for a given purpose. To that end, we could > > > introduce > > > > > one > > > > > > or > > > > > > > more documented Maven goals or Make targets for building a > Docker > > > > image > > > > > > > from the current source tree or a specific release (including > > > > > unreleased > > > > > > or > > > > > > > unsupported versions). > > > > > > > > > > > > > > Additionally, there has been discussion among Flink > contributors > > > for > > > > > some > > > > > > > time about the confusing state of Dockerfiles within the Flink > > > > > > repository, > > > > > > > each meant for a different way of running Flink. I’m not > > completely > > > > up > > > > > to > > > > > > > speed about these different efforts, but we could possibly > solve > > > this > > > > > by > > > > > > > either building additional “official” images with different > > > > entrypoints > > > > > > for > > > > > > > these various purposes, or by developing an improved entrypoint > > > > script > > > > > > that > > > > > > > conveniently supports all cases. 
I defer to Till Rohrmann, > > > Konstantin > > > > > > > Knauf, or Stephan Ewen for further discussion on this point. > > > > > > > > > > > > > > I apologize again for the wall of text, but if you made it this > > > far, > > > > > > thank > > > > > > > you! These improvements have been a long time coming, and I > hope > > we > > > > can > > > > > > > find a solution that serves the Flink and Docker communities > > well. > > > > > Please > > > > > > > don’t hesitate to ask any questions. > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > Patrick Lucas > > > > > > > > > > > > > > [1] https://hub.docker.com/_/flink > > > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/c50297f8659aaa59d4f2ae327b69c4d46d1ab8ecc53138e659e4fe91%40%3Cdev.flink.apache.org%3E > > > > > > > > > > > > > > [3] On page 2 at the time we went to press: > > > > > > > > > https://hub.docker.com/search?q=&type=image&image_filter=official > > > > > > > > > > > > > > [4] https://github.com/docker-flink/docker-flink > > > > > > > > > > > > > > [5] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/docker-library/official-images/pulls?q=is%3Apr+label%3Alibrary%2Fflink > > > > > > > > > > > > > > [6] I looked at the 25 most popular “official” images (see [3]) > > as > > > > well > > > > > > as > > > > > > > “official” images of Apache software from the top 125; all use > a > > > > > > dedicated > > > > > > > repo > > > > > > > [7] https://hub.docker.com/u/apache > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Konstantin Knauf | Solutions Architect > > > > > > +49 160 91394525 > > > > > > > > > Follow us @VervericaData Ververica <https://www.ververica.com/> > > > > > > > > > -- > > > > > > Join Flink Forward <https://flink-forward.org/> - The Apache Flink > > > Conference > > > > > > Stream Processing | Event Driven | Real Time > > > > > > -- > > > 
> > > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany > > > > > > -- > > > Ververica GmbH > > > Registered at Amtsgericht Charlottenburg: HRB 158244 B > > > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji > > > (Tony) Cheng > > > > > > |
Thanks all for chiming in. I'll continue tomorrow with a VOTE as suggested by Till.

Regarding my initially proposed timeline: I don't think we will have everything ready before the first 1.10 RC, but I also think it's not that big of a deal. ;-)

– Ufuk

On Fri, Jan 24, 2020 at 11:59 AM Till Rohrmann wrote:

> +1 for Ufuk's proposal how to proceed. I guess the immediate next step
> would be a VOTE for accepting the dockerfiles and where to store them.
>
> Cheers,
> Till
Thanks everyone for your input on this!
@Fabian: I concur with utilizing the ASF infra and ASF Docker Hub organization to build and host any "less-critical" images like you propose. I would also add RC builds to that list, as alluded to in my original email.

--
Patrick

On Sun, Jan 26, 2020 at 4:28 PM Ufuk Celebi <[hidden email]> wrote:

> Thanks all for chiming in. I'll continue tomorrow with a VOTE as suggested by Till.
>
> Regarding my initially proposed timeline: I don't think we will have everything ready before the first 1.10 RC, but I also think it's not that big of a deal. ;-)
>
> – Ufuk
>
> On Fri, Jan 24, 2020 at 11:59 AM Till Rohrmann <[hidden email]> wrote:
>
> > +1 for Ufuk's proposal how to proceed. I guess the immediate next step would be a VOTE for accepting the dockerfiles and where to store them.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jan 22, 2020 at 4:05 PM Fabian Hueske <[hidden email]> wrote:
> >
> > > Hi everyone,
> > >
> > > First of all, thank you very much Patrick for maintaining and publishing the Flink Docker images so far and for starting this discussion!
> > >
> > > I'm in favor of adding the Dockerfiles in a separate repository and not in the main Flink repository.
> > > I also think that it makes sense to first focus on the contribution of the Dockerfiles and consolidation of existing Dockerfiles before discussing special cases for development and testing.
> > >
> > > In addition to the Dockerfiles in the Flink main repo, there is also one in the flink-playgrounds repo [1] to build a customized Docker image for the playground.
> > >
> > > Besides building and publishing "official" Flink images via DockerHub, there is also the option to let ASF Infra build Docker images and publish them under https://hub.docker.com/u/apache.
> > > These images would not be "official" DockerHub images anymore, but available under the Apache DockerHub user.
> > > However, I think it would be a good idea to keep the current setup for the main Flink images (those that depend on Flink releases) for better visibility and to not confuse our users.
> > > We might want to publish less critical images (playground images, dev images, nightly builds, etc) via Infra under the Apache DockerHub user.
> > >
> > > Best,
> > > Fabian
> > >
> > > On Mon, Jan 13, 2020 at 11:38 AM Ufuk Celebi <[hidden email]> wrote:
> > >
> > > > Hey all,
> > > >
> > > > first of all a big thank you for driving many of the Docker image releases in the last two years.
> > > >
> > > > *(1) Moving docker-flink/docker-flink to apache/docker-flink*
> > > >
> > > > +1 to do this as you outlined. I would propose to aim for a first integration with the 1.10 release without major changes to the existing Dockerfiles. The work items would be to move the Dockerfiles and update the release process documentation so everyone is on the same page.
> > > >
> > > > *(2) Consolidate Dockerfiles in apache/flink*
> > > >
> > > > +1 to start the process for this. I think this requires a bit of thinking about what the requirements are and which problems we want to solve. From skimming the existing Dockerfiles, it seems to me that the Docker image builds fulfil quite a few different tasks. We have a script that can bundle Hadoop, can copy an existing Flink distribution, can include user jars, etc. The scope of this is quite broad and would warrant a design document/a FLIP.
> > > >
> > > > I would move the questions about nightly builds, using a different base image or having image variants with debug tooling to after (1) and (2) or make it part of (2).
> > > >
> > > > *(3) Next steps*
> > > >
> > > > If there are no objections, I would propose to tackle (1) and (2) separately and to continue as follows:
> > > >
> > > > (i) Create tickets for (1) and aim to align with 1.10 release timeline (ideally before the first RC). Since this does not touch any code in the release branches, I think this would not be affected by the feature freeze. The major work item would be to update the docs and potential refactorings of the existing process and Dockerfiles. I can help with the process to create a new repo.
> > > >
> > > > (ii) Create first draft for consolidation of existing Dockerfiles. After this proposal is done, I would propose to bring it up for a separate discussion on the ML.
> > > >
> > > > What do you think? @Patrick: would you be interested in working on both (1) + (2) or did you mainly have (1) in mind?
> > > >
> > > > Best,
> > > >
> > > > Ufuk
> > > >
> > > > On Sun, Jan 12, 2020 at 8:30 PM Konstantin Knauf <[hidden email]> wrote:
> > > >
> > > > > Big +1 for
> > > > >
> > > > > * official images in a separate repository
> > > > > * unified images (session cluster vs application cluster)
> > > > > * images for development in Apache flink repository
> > > > >
> > > > > On Fri, Jan 10, 2020 at 7:14 PM Till Rohrmann <[hidden email]> wrote:
> > > > >
> > > > > > Thanks a lot for starting this discussion Patrick! I think it is a very good idea to move Flink's docker image more under the jurisdiction of the Flink PMC and to make releasing new docker images part of Flink's release process (not saying that we cannot release new docker images independent of Flink's release cycle).
> > > > > >
> > > > > > One thing I have no strong opinion about is where to place the Dockerfiles (apache/flink.git vs. apache/flink-docker.git). I see the point that one wants to separate concerns (Flink code vs. Dockerfiles) and, hence, that having separate repositories might help with this objective. But on the other hand, I don't have a lot of experience with Docker Hub and how to best host Dockerfiles. Consequently, it would be helpful if others who have some experience could share it with us.
> > > > > >
> > > > > > Cheers,
> > > > > > Till
> > > > > >
> > > > > > On Sat, Dec 21, 2019 at 2:28 PM Hequn Cheng <[hidden email]> wrote:
> > > > > >
> > > > > > > Hi Patrick,
> > > > > > >
> > > > > > > Thanks a lot for your continued work on the Docker images. That’s really really a great job! And I have also benefited from it.
> > > > > > >
> > > > > > > Big +1 for integrating docker image publication into the Flink release process since we can leverage the Flink release process to ensure a more legitimate Docker publication. We can also check and vote on it during the release.
> > > > > > >
> > > > > > > I think the most important thing we need to discuss first is whether to have a dedicated git repo for the Dockerfiles.
> > > > > > >
> > > > > > > Although it is a convention shared by nearly every other “official” image on Docker Hub to have a dedicated repo, I'm still not sure about it. Maybe I have missed something important. From my point of view, I think it’s better to have the Dockerfiles in the (main) Flink repo.
> > > > > > > - First, I think the Dockerfiles can be treated as part of the release. And it is also natural to put the corresponding version of the Dockerfile in the corresponding Flink release.
> > > > > > > - Second, we can put the Dockerfiles in a path like flink/docker-flink/version/ and the version varies in different releases. For example, for release 1.8.3, we have a flink/docker-flink/1.8.3 folder (or maybe flink/docker-flink/1.8). Even though the Dockerfiles for all supported versions are not in one path, they are still in one Git tree with different refs.
> > > > > > > - Third, it seems the Docker Hub also supports specifying different refs. For the file[1], we can change the GitRepo link from https://github.com/docker-flink/docker-flink.git to https://github.com/apache/flink.git and add a GitFetch for each tag, e.g., GitFetch: refs/tags/release-1.8.3. There are some examples in the ubuntu file[2].
> > > > > > >
> > > > > > > If the above assumptions are right and there are no more obstacles, I'm inclined to have these Dockerfiles in the main Flink repo. In this case, we can reduce the number of repos and reduce the management overhead.
> > > > > > > What do you think?
> > > > > > >
> > > > > > > Best,
> > > > > > > Hequn
> > > > > > >
> > > > > > > [1] https://github.com/docker-library/official-images/blob/master/library/flink
> > > > > > > [2] https://github.com/docker-library/official-images/blob/master/library/ubuntu
> > > > > > >
> > > > > > > On Fri, Dec 20, 2019 at 5:29 PM Yang Wang <[hidden email]> wrote:
> > > > > > >
> > > > > > > > Big +1 for this effort.
> > > > > > > >
> > > > > > > > It is really exciting we have started this great work. More and more companies start to use Flink in container environments (docker, Kubernetes, Mesos, even Yarn-3.x). So it is very important that we have a unified official image building and releasing process.
> > > > > > > >
> > > > > > > > The image building process in this proposal is really good and I just have the following thoughts.
> > > > > > > >
> > > > > > > > >> Keep a dedicated repo for Dockerfiles to build official image
> > > > > > > > I think this is a good way and we do not need to make unnecessary changes to the Flink repository.
> > > > > > > >
> > > > > > > > >> Integrate building image into the Flink release process
> > > > > > > > It will bring a better experience for container environment users. In my opinion, a complete release includes the official image. It should be verified to work well.
> > > > > > > >
> > > > > > > > >> Nightly building
> > > > > > > > Would we support all the release branches or just the master branch?
> > > > > > > >
> > > > > > > > >> Multiple purpose Flink images
> > > > > > > > This is really needed.
> > > > > > > > During development and testing, we need some profiling tools to help us investigate problems. Currently, we do not even have jstack/jmap in the image.
> > > > > > > >
> > > > > > > > >> Unify the Dockerfile in Flink repository
> > > > > > > > In the current code base, we have flink-contrib/docker-flink/Dockerfile to build an image for a session cluster. However, it is not kept up to date. For a per-job cluster, flink-container/docker/Dockerfile could be used to build a Flink image with user artifacts. I think we need to unify them and provide a more powerful build script and entry point.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yang
> > > > > > > >
> > > > > > > > On Thu, Dec 19, 2019 at 9:20 PM Patrick Lucas <[hidden email]> wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I would like to start a discussion about integrating publication of the Flink Docker images hosted on Docker Hub[1] more tightly with the Flink release process. Apologies in advance for the long post.
> > > > > > > > >
> > > > > > > > > More than two and a half years ago (time flies!) we introduced “official” Docker images for Flink[2]. Since then, the popularity of running containerized applications in general and containerized Flink in particular has continued to grow. Today, Flink is one of the most popular “official” images on Docker Hub[3].