[DISCUSS] Adding e2e tests for Flink's Mesos integration

[DISCUSS] Adding e2e tests for Flink's Mesos integration

Yangze Guo
Hi, all,

Currently, there is no end-to-end test or IT case for the Mesos
deployment, although common deployment-related development inevitably
touches the logic of this component. Some work is therefore needed to
safeguard the experience of both Mesos users and contributors. After an
offline discussion with Till and Xintong, we have some basic ideas and
would like to start a discussion thread on adding end-to-end tests for
Flink's Mesos integration.

As a first step, we would like to keep the scope of this contribution
relatively small. This may also help us quickly get some basic
test cases that might be helpful for the upcoming 1.10 release.

As far as we can tell, what needs to be done is to set up a Mesos
framework during testing and to determine which tests need to be
included.


** Regarding the Mesos framework, after trying out several approaches,
I find that setting up Mesos in Docker is probably what we want. The
resources needed for building and setting up Mesos from source are
probably not affordable in most scenarios. So the one open
question worth discussing is the choice of Docker image. We have
come up with two options.

- Use the official Mesos image[1]
The official image was the first alternative that came to mind,
but we ran into a Java version compatibility problem that
causes task executors to fail to launch. Flink has supported Java 9
since version 1.9.0 [2]. However, the official Docker image of Mesos
is built with a development version of JDK 9, which is probably the
cause of this problem. Unless we want to make Flink compatible with
the JDK development version used by the official Mesos
image, this option does not work. Besides, according to the
official roadmap[5], Java 9 is not a long-term support version, which
may bring stability risks in the future.

- Build a custom image
I've already tried building a custom image[3] and successfully ran most
of the existing end-to-end test cases with it. The image is built
with Ubuntu 16.04, JDK 8 and Mesos 1.7.1. For the Mesos e2e test
framework, we could either build the image from a Dockerfile or pull
the pre-built image from DockerHub (or another hub service) during
testing.
If we decide to publish an image on DockerHub, we probably need an
official Flink repository/account to hold it.
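
To make the two ways of obtaining the image concrete, here is a minimal
Python sketch that only assembles the Docker CLI invocation for either
choice. `docker build -t` and `docker pull` are standard Docker commands;
the function name, the `docker/mesos` build-context path and the
`flink/mesos-e2e` tag are hypothetical placeholders, and
`karmagyz/mesos-flink` is the personal image from [3], not an official
Flink repository.

```python
from typing import List

# Image published for the experiments in this thread (see [3]).
PREBUILT_IMAGE = "karmagyz/mesos-flink"


def image_command(build_locally: bool,
                  context_dir: str = "docker/mesos",    # hypothetical path
                  local_tag: str = "flink/mesos-e2e",   # hypothetical tag
                  ) -> List[str]:
    """Return the Docker CLI command that makes the Mesos image available."""
    if build_locally:
        # Build from the Dockerfile found in context_dir and tag the result.
        return ["docker", "build", "-t", local_tag, context_dir]
    # Otherwise download the pre-built image from DockerHub.
    return ["docker", "pull", PREBUILT_IMAGE]


if __name__ == "__main__":
    print(" ".join(image_command(build_locally=True)))
    print(" ".join(image_command(build_locally=False)))
```

The resulting list could be handed to `subprocess.run(..., check=True)` by
a test script, which keeps the build-or-pull decision in one place.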


** Regarding the test coverage, we think the following three tests
could be a good starting point that covers an essential set of
behaviors for the Mesos deployment.
- WordCount end-to-end test, for verifying the basic process of a Mesos
deployment.
- Multiple submissions of the same job, for catching resource
management problems on Mesos, such as [4].
- State TTL RocksDB backend end-to-end test, for verifying memory
configuration behavior, since Mesos has its own config options and
logic.
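
As a rough illustration of the "multiple submissions" idea, the sketch
below calls a submission function several times and records which
attempts failed. It is not taken from Flink's test scripts:
`submit_repeatedly` and the `submit_job` callable are hypothetical names,
and a real test would replace the stand-in with code that actually
submits the job to the Mesos cluster and waits for it to finish.

```python
from typing import Callable, List


def submit_repeatedly(submit_job: Callable[[int], bool],
                      attempts: int) -> List[int]:
    """Call submit_job once per attempt and return the indices that failed."""
    failed = []
    for attempt in range(attempts):
        # Each call stands for one full submit-and-wait cycle of the same job.
        if not submit_job(attempt):
            failed.append(attempt)
    return failed


if __name__ == "__main__":
    # Stand-in submitter that always reports success (assumption for the demo).
    print(submit_repeatedly(lambda attempt: True, attempts=3))
```

An empty result means every submission succeeded; any index in the list
points at a run where resources may not have been released properly.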

Unfortunately, none of us who participated in the initial offline
discussion has much experience running Flink on Mesos in
production. It would be good if users and experts who actually run
Flink on Mesos could join the discussion and provide feedback. Any
feedback, ideas, suggestions, concerns and questions are welcome and
appreciated.


BTW, we would like to run a survey on the usage of Flink on Mesos
in the community. From Flink on Mesos users, we would like to
learn:
- Which version of Mesos do you use, and what additional setup (such as
Marathon) do you need for Mesos?
- Do you mainly use Flink job clusters or session clusters?
- What is the scale of your Flink / Mesos cluster?


[1]https://hub.docker.com/r/mesosphere/mesos
[2]https://issues.apache.org/jira/browse/FLINK-11307
[3]https://hub.docker.com/repository/docker/karmagyz/mesos-flink
[4]https://issues.apache.org/jira/browse/FLINK-14074
[5]https://www.oracle.com/technetwork/java/java-se-support-roadmap.html


Best,
Yangze Guo

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Till Rohrmann
Big +1 for adding a fully working e2e test for Flink's Mesos integration.
Ideally we would have it ready for the 1.10 release. The lack of such a
test has bitten us already multiple times.

In general I would prefer to use the official image if possible since it
frees us from maintaining our own custom image. Since Java 9 is no longer
officially supported, as we opted for supporting Java 11 (LTS), it might not
be feasible, though. How much longer would building the custom image take
compared with downloading it from DockerHub? Maybe it is ok to build the
image locally. Then we would not have to maintain the image.

Cheers,
Till

On Fri, Dec 6, 2019 at 11:05 AM Yangze Guo <[hidden email]> wrote:


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Piyush Narang
+1 from our end as well. At Criteo, we are running some Flink jobs on Mesos in production to compute short-term features for machine learning. We’d love to help out and contribute to this initiative.

Thanks,
-- Piyush


From: Till Rohrmann <[hidden email]>
Date: Friday, December 6, 2019 at 8:10 AM
To: dev <[hidden email]>
Cc: user <[hidden email]>
Subject: Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Yangze Guo
Thanks for your feedback!

@Till
Regarding the time overhead, I think it mainly comes from network
transfer. Building the image locally downloads about 260MB in total,
including the base image and packages. Pulling from DockerHub
downloads the compressed image, which is 347MB. Thus, I agree
that it is ok to build the image locally.

@Piyush
Thank you for offering to help and for sharing your usage scenario. At
the current stage, I think it would be really helpful if you could shrink
the custom image[1] or reduce the time it takes to build it locally.
Any ideas for improving test coverage would also be appreciated.

[1]https://hub.docker.com/layers/karmagyz/mesos-flink/latest/images/sha256-4e1caefea107818aa11374d6ac8a6e889922c81806f5cd791ead141f18ec7e64

Best,
Yangze Guo

On Sat, Dec 7, 2019 at 3:17 AM Piyush Narang <[hidden email]> wrote:


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Yang Wang
Thanks, Yangze, for starting this discussion.

Just sharing my thoughts.

If the official Mesos Docker image cannot meet our requirements, I
suggest building the image locally. We have done the same thing for the
YARN e2e tests. This approach is more flexible and easier to maintain.
However, I have no idea how long building the Mesos image locally will
take. Based on our previous experience with YARN, I think it should not
take too much time.



Best,
Yang

On Sat, Dec 7, 2019 at 4:25 PM Yangze Guo <[hidden email]> wrote:


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Yangze Guo
Thanks for the feedback, Yang.

Some updates I want to share in this thread.
I have built a PoC version of the Mesos e2e test with the WordCount
workflow[1] and ran it in the testing environment. The results
are shown here[2]:
- Pulling the image from DockerHub took 1 minute and 21 seconds.
- Building it locally took 2 minutes and 54 seconds.

I prefer building it locally. Although it is slower, I think the extra
time for building rather than pulling the image is trivial compared
with the cost of maintaining the image on DockerHub and with the
duration of the whole test process.
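
For reference, the gap between the two measurements can be worked out
with a few lines of Python. The numbers are the ones reported above; the
helper name `to_seconds` is only illustrative.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair into seconds."""
    return minutes * 60 + seconds


pull_s = to_seconds(1, 21)    # pulling from DockerHub
build_s = to_seconds(2, 54)   # building locally
extra_s = build_s - pull_s    # additional time spent when building
ratio = build_s / pull_s      # how many times longer the build takes

if __name__ == "__main__":
    print(f"pull: {pull_s}s, build: {build_s}s")
    print(f"extra: {extra_s}s, ratio: {ratio:.2f}x")
```

So building locally costs roughly a minute and a half more per run, a bit
over twice the pull time.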

I look forward to hearing from you. ;)

Best,
Yangze Guo

[1]https://github.com/KarmaGYZ/flink/commit/0406d942446a1b17f81d93235b21a829bf88ccf0
[2]https://travis-ci.org/KarmaGYZ/flink/jobs/623207957

On Mon, Dec 9, 2019 at 2:39 PM Yang Wang <[hidden email]> wrote:

>> > discussion has much experience for running flink on mesos in
>> > production. It would be good that users and experts who actually use
>> > flink on mesos can join the discussion and provide some feedbacks. Any
>> > feedback, idea, suggestion, concern and question will be welcomed and
>> > appreciated.
>> >
>> >
>> > BTW, we would like to raise a survey on the usages of Flink on Mesos
>> > in the community. For the Flink on Mesos users, we would like to
>> > learn:
>> > - Which version of Mesos do you use and what setups (such as Marathon)
>> > do you need for Mesos
>> > - Is it Flink job cluster or session cluster that  is majorly used
>> > - How is the scale of the Flink / Mesos cluster
>> >
>> >
>> > [1]https://hub.docker.com/r/mesosphere/mesos
>> > [2]https://issues.apache.org/jira/browse/FLINK-11307
>> > [3]https://hub.docker.com/repository/docker/karmagyz/mesos-flink
>> > [4]https://issues.apache.org/jira/browse/FLINK-14074
>> > [5]https://www.oracle.com/technetwork/java/java-se-support-roadmap.html
>> >
>> >
>> > Best,
>> > Yangze Guo

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Xintong Song
Thanks, Yangze.

+1 for building the image locally.
The time needed for both building the image locally and pulling it from
DockerHub sounds reasonable and affordable. Therefore, I'm also in favor of
avoiding the cost of maintaining a custom image.

Thank you~

Xintong Song




Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Till Rohrmann
+1 for building the image locally. If the need arises, we can always
change it later.

Cheers,
Till


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Gary Yao-3
Thanks for driving this effort. Also +1 from my side. I have left a few
questions below.

> - Wordcount end-to-end test. For verifying the basic process of Mesos
> deployment.

Would this add additional test coverage compared to the
"multiple submissions" test case? I am asking because the E2E tests are
already expensive to run, and adding new tests should be carefully
considered.

> - State TTL RocksDB backend end-to-end test. For verifying memory
> configuration behaviors, since Mesos has its own config options and
> logic.

Can you elaborate more on this? Which config options are relevant here?


Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Yangze Guo
Thanks for the feedback, Gary.

Regarding the WordCount test:
- True. It adds no test coverage beyond the other tests. However, I
think each test case should serve a single purpose so that we can find
the root cause of a failure quickly. As discussed in FLINK-15135[1], I
prefer to include only the WordCount test as the first step. If the
time overhead of the E2E tests becomes severe in the future, I agree
to remove it. WDYT?
- I think the main overhead comes from building the image. The
subsequent tests will run fast since they will not build it again.

Regarding the Rocks test, I think it is a typical scenario using
off-heap memory. The main purpose is to verify the memory usage and
memory configuration in Mesos mode. Two typical use cases are off-heap
and on-heap. Thus, I think the following two test cases are valuable
to be included:
- A streaming task using heap backend. It should explicitly set the
“taskmanager.memory.managed.size” to zero to check the potential
unexpected usage of off-heap memory.
- A streaming task using rocks backend. It covers the scenario using
off-heap memory.
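
To make the difference between the two cases concrete, here is a
minimal sketch of the configuration overrides each one might use. It
is only an illustration under stated assumptions: the helper and
constant names are hypothetical and not part of Flink's test scripts,
and "filesystem" is assumed as the heap-based value of
"state.backend". Only "taskmanager.memory.managed.size" is taken from
the proposal above.

```python
# Hypothetical sketch: these names are illustrative, not Flink test code.

# Case 1: heap state backend, with managed (off-heap) memory set to zero.
HEAP_CASE = {
    "state.backend": "filesystem",           # assumed heap-based backend value
    "taskmanager.memory.managed.size": "0",  # option named in the proposal
}

# Case 2: RocksDB state backend, which keeps its state off-heap.
ROCKSDB_CASE = {
    "state.backend": "rocksdb",
}


def render_flink_conf(overrides):
    """Render overrides as 'key: value' lines, sorted by key."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(overrides.items()))


if __name__ == "__main__":
    for name, case in (("heap", HEAP_CASE), ("rocksdb", ROCKSDB_CASE)):
        print(f"# {name} backend case")
        print(render_flink_conf(case))
```

The two cases share the same job and differ only in these overrides,
so a failure in one of them points at the memory setup for that
backend.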

I look forward to your feedback.

[1]https://issues.apache.org/jira/browse/FLINK-15135

Best,
Yangze Guo



On Wed, Dec 11, 2019 at 6:14 PM Gary Yao <[hidden email]> wrote:

>
> Thanks for driving this effort. Also +1 from my side. I have left a few
> questions below.
>
> > - Wordcount end-to-end test. For verifying the basic process of Mesos
> > deployment.
>
> Would this add additional test coverage compared to the
> "multiple submissions" test case? I am asking because the E2E tests
> are already expensive to run, and adding new tests should be
> carefully considered.
>
> > - State TTL RocksDb backend end-to-end test. For verifying memory
> > configuration behaviors, since Mesos has it’s own config options and
> > logics.
>
> Can you elaborate more on this? Which config options are relevant here?

Re: [DISCUSS] Adding e2e tests for Flink's Mesos integration

Gary Yao-3
Thanks for your explanation. I think the proposal is reasonable.
