Improvements to Mesos Deployments Using Docker

Addison Higham
I am currently getting Flink 1.5 running in a Mesos cluster using Docker.

I have come across a few improvements that I think would help with
this configuration (and will probably also apply to any future
containerized deployments, such as Kubernetes).

I have already created two issues to track this:
https://issues.apache.org/jira/browse/FLINK-9611 and
https://issues.apache.org/jira/browse/FLINK-9612.

A quick summary:
FLINK-9611 - Add a configuration option for user-defined artifacts to be
downloaded into the container. This is useful when you need to add
credentials to pull a private Docker image (and probably has many other
use cases). While this could easily be done via config, it *might* be more
extensible to dynamically load a user-defined overlay class that can tweak
the container specification as needed.

FLINK-9612 - Add an option to disable pulling most of the
FlinkDistributionOverlay. Currently, deploying many TaskManagers from a
pre-built Docker image that already contains a Flink distribution is very
wasteful, because each TaskManager re-downloads all the dependencies. This
can overload the MesosArtifactServer, and it doesn't take many nodes
deploying at once to see some failed downloads.
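
To make the FLINK-9612 idea concrete, here is a minimal sketch of how such
a switch could gate the overlay. The option name
`mesos.resourcemanager.tasks.skip-distribution-overlay`, the
`OverlaySelection` class, and the "OtherOverlays" placeholder are all
hypothetical and not existing Flink code. The sketch also skips the whole
overlay, whereas the actual proposal is to skip only most of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which overlays to apply to a TaskManager container.
final class OverlaySelection {
    // Hypothetical option name; not an existing Flink setting.
    static final String SKIP_KEY =
        "mesos.resourcemanager.tasks.skip-distribution-overlay";

    // Returns the names of the overlays to apply, in order.
    static List<String> select(Map<String, String> config) {
        List<String> overlays = new ArrayList<>();
        boolean skip = Boolean.parseBoolean(config.getOrDefault(SKIP_KEY, "false"));
        if (!skip) {
            // Default behaviour: ship the Flink distribution as artifacts.
            overlays.add("FlinkDistributionOverlay");
        }
        // Placeholder for the remaining overlays, which are applied either way.
        overlays.add("OtherOverlays");
        return overlays;
    }
}
```

With the flag unset, the distribution overlay is still applied, so existing
deployments would behave as before.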

I am willing to implement these two features, but would be interested in
getting some feedback.

Some questions:
- Would a limited (but simple) property like
`mesos.resourcemanager.tasks.uris`, taking a comma-separated list of URIs,
be preferable to a more powerful (but more complex)
`mesos.resourcemanager.tasks.user-overlay` property that, when defined,
would use a classloader to dynamically add another overlay?
- Are there any files generated by Flink that always need to be downloaded
as artifacts into the container? As best I can tell, that isn't the case,
at least in the `FlinkDistributionOverlay`.
- Are there any other overlay layers that are redundant in container
deployments using pre-built Docker images?
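
For the simpler of the two options in the first question, the parsing side
could be as small as the sketch below. It assumes the property value arrives
as a plain string; the `TaskUris` class and the example URIs are made up for
illustration and are not part of Flink.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a comma-separated URI list such as the proposed
// mesos.resourcemanager.tasks.uris property.
final class TaskUris {
    // Splits on commas, trims whitespace, and ignores empty entries.
    static List<URI> parse(String value) {
        List<URI> uris = new ArrayList<>();
        if (value == null) {
            return uris;
        }
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                // URI.create throws IllegalArgumentException on malformed input.
                uris.add(URI.create(trimmed));
            }
        }
        return uris;
    }
}
```

For example, `TaskUris.parse("hdfs:///creds/docker.tar.gz, file:///etc/app.conf")`
yields two URIs that could then be added to the container as artifacts.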

Thanks for your feedback!
Re: Improvements to Mesos Deployments Using Docker

Till Rohrmann
Hi Addison,

Thanks for starting the discussion. My gut feeling is that we could solve
both FLINK-9611 and FLINK-9612 by allowing the user to specify a custom
AbstractContainerOverlay implementation. Thus, introducing an
AbstractContainerOverlayFactory instead of the specific
mesos.resourcemanager.tasks.uris option sounds favorable to me.
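
A rough sketch of the factory idea, to make the shape of it clearer. The
types here (`ContainerSpec`, `Overlay`, `OverlayFactory`, `OverlayLoader`,
`CredentialsOverlayFactory`) are simplified stand-ins invented for this
example and are not Flink's actual classes; the artifact path is likewise
made up. The only real mechanism shown is loading a user-named factory
class reflectively.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a container specification (hypothetical).
final class ContainerSpec {
    final List<String> artifacts = new ArrayList<>();
}

// Simplified stand-in for a container overlay (hypothetical).
interface Overlay {
    void configure(ContainerSpec spec);
}

// Hypothetical factory that users would implement and name in the config.
interface OverlayFactory {
    Overlay create();
}

// Example user implementation: adds a credentials archive as an artifact.
final class CredentialsOverlayFactory implements OverlayFactory {
    @Override
    public Overlay create() {
        return spec -> spec.artifacts.add("file:///etc/docker/docker.tar.gz");
    }
}

final class OverlayLoader {
    // Instantiates the factory by class name and asks it for an overlay.
    static Overlay load(String factoryClassName) {
        try {
            Class<? extends OverlayFactory> clazz =
                Class.forName(factoryClassName).asSubclass(OverlayFactory.class);
            return clazz.getDeclaredConstructor().newInstance().create();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot load overlay factory " + factoryClassName, e);
        }
    }
}
```

A user overlay that skips the distribution artifacts and one that adds
extra URIs would both fit behind the same factory interface, which is why a
single extension point could cover both issues.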

Once a custom overlay can be specified, we should extend the
MesosJobClusterEntrypoint to add the retrieved job graph and the user code
jars to the overlay. That way, we don't have to fetch the user code jars
via the BlobServer. But this would be a follow-up task.

I'm not aware of any other redundant layers for the pre-built Docker images.

Cheers,
Till

In reply to Addison Higham's message of Tue, Jun 19, 2018 at 3:57 AM.