Hi all,
tl;dr: I will have to cancel some E2E test executions of pull requests because we have reached the capacity limit of Flink's Azure Pipelines account.

Long version: We have two types of agent pools in Azure Pipelines: Microsoft-hosted VMs and an Alibaba-hosted Docker environment.
We run the E2E tests on the Microsoft VMs, because that environment is always destroyed after each execution (the E2E tests often leave dangling Docker containers, processes etc., and they modify files in system directories).
In the Alibaba-hosted Docker environment, we compile and run the regular Maven tests.

We only have 10 Microsoft-hosted VMs available, and each E2E execution takes around 3.5 hours. That gives us a capacity of ~70 E2E executions per day.
On Tuesday, we had 110 builds, on Wednesday 98 builds.
Because of this, I will (manually) cancel some E2E test executions for pull requests. If I see that a PR explicitly changes something in the E2E tests, I will keep it. If I see that a PR is a docs change, has other test failures etc., I will cancel the E2E execution.

If you want to verify that the E2E tests are passing for your own changes, you can set up Azure Pipelines for your GitHub account; it's free and works quite well. Here's a tutorial:
https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines#AzurePipelines-Tutorial:SettingupAzurePipelinesforaforkoftheFlinkrepository

What can we do to avoid this situation in the future?
Sadly, Microsoft does not allow open source projects to buy additional processing slots [1]. However, I'm in touch with a product manager at Microsoft who promised me (yesterday) to increase the limit for us.

In the Alibaba environment, we have 80 slots available, and usually no capacity constraints. This means we don't need to make compromises there.

Sorry for the inconvenience.

Best,
Robert

PS: I'm considering keeping this thread as a permanent "status update" thread for Azure Pipelines.

[1] https://developercommunity.visualstudio.com/content/problem/1028884/additionally-purchased-microsoft-hosted-build-agen.html
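
For anyone who wants to redo the capacity estimate from the first message, here is a minimal sketch. The function name daily_capacity is made up for illustration, and the calculation assumes that all VMs are busy around the clock with back-to-back E2E executions, which is an idealization.

```shell
#!/usr/bin/env bash
# Rough daily capacity of a fixed pool of CI machines.
# Assumption: every machine runs E2E executions back to back, 24 hours a day.

# daily_capacity VMS MINUTES_PER_RUN -> executions per day (rounded down)
daily_capacity() {
  local vms=$1 minutes_per_run=$2
  echo $(( vms * 24 * 60 / minutes_per_run ))
}

# 10 Microsoft-hosted VMs, 3.5 hours (210 minutes) per E2E execution.
capacity=$(daily_capacity 10 210)
echo "capacity per day: ${capacity}"

# Build counts reported for Tuesday and Wednesday.
for builds in 110 98; do
  echo "builds: ${builds}, over capacity by: $(( builds - capacity ))"
done
```

With integer rounding this gives 68 executions per day, in line with the ~70 quoted above, so Tuesday was over capacity by 42 builds and Wednesday by 30.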
Thanks for the update Robert.

One idea to make the e2e tests also run on the Alibaba infrastructure would be to ensure that they clean up after they have run. Do we know which e2e tests don't do this properly?

Cheers,
Till

On Thu, May 14, 2020 at 8:38 AM Robert Metzger <[hidden email]> wrote:

> tl;dr: I will have to cancel some E2E test executions of pull requests
> because we have reached the capacity limit of Flink's Azure Pipelines
> account.

Roughly speaking, I see the following problematic areas (I initially tried running the E2E tests on those machines):

a) E2E tests that start Docker images (including Kubernetes). Since the tests on the Ali infra are themselves running in Docker, we need to adjust the test scripts (which is not trivial, because both containers need to be in the same network, and the volume mount paths are different).

b) Tests that modify the underlying file system: common_kubernetes.sh installs stuff in "/usr/local/bin/". (Now that I think about it, it's not a problem in the Docker environment.)

c) Tests that don't clean up properly when failing. IIRC I saw leftover Docker containers from test_streaming_kinesis.sh when I was trying to run the E2E tests on the Ali machines.

And then there are pull requests that propose changes to the e2e scripts that mess something up :)
We certainly need to isolate the e2e test execution somehow. Maybe we could launch VMs on the Ali machines for running the E2Es (using Vagrant)?

If Microsoft is not going to provide us with more test capacity, I will evaluate other options for the E2E tests.

On Thu, May 14, 2020 at 10:36 AM Till Rohrmann <[hidden email]> wrote:

> One idea to make the e2e tests also run on the Alibaba infrastructure
> would be to ensure that they clean up after they have run. Do we know
> which e2e tests don't do this properly?
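
To illustrate what "cleaning up properly when failing" (point c) could look like, here is a minimal sketch built on a bash EXIT trap. It is not taken from the Flink e2e scripts: cleanup, run_test, TEST_LABEL and WORK_DIR are hypothetical names, and it assumes each test starts its containers with a label so that they can be found again afterwards.

```shell
#!/usr/bin/env bash
# Sketch: run one test so that its leftovers are removed even if it fails.
# cleanup, run_test, TEST_LABEL and WORK_DIR are hypothetical names.

cleanup() {
  # Remove containers that carry this run's label, if docker is usable.
  if command -v docker >/dev/null 2>&1; then
    local ids
    ids=$(docker ps -aq --filter "label=${TEST_LABEL}" 2>/dev/null || true)
    if [ -n "${ids}" ]; then
      # Unquoted on purpose: one argument per container id.
      docker rm -f ${ids} >/dev/null 2>&1 || true
    fi
  fi
  # Remove the scratch directory used instead of system directories.
  rm -rf "${WORK_DIR}"
}

run_test() {
  (
    # The subshell keeps the trap and the variables local to this one test.
    export TEST_LABEL="e2e-run=$$-${RANDOM}"
    export WORK_DIR
    WORK_DIR=$(mktemp -d)
    # The trap runs when the subshell exits, whether the test passed or failed.
    trap cleanup EXIT
    "$@"
    exit $?
  )
}

# Usage (the test is expected to start containers with --label "${TEST_LABEL}"):
#   run_test ./test_streaming_kinesis.sh
```

This only covers leftover containers and scratch files. It does not address point a), running Docker-based tests inside the Docker environment.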

Microsoft has not increased our capacity yet (even though it was promised to me again yesterday).

I have now merged a hotfix disabling the e2e test execution on pull requests to have enough capacity on master.
Please run e2e tests using your private Azure accounts. Thanks for your understanding!

Best,
Robert

On Thu, May 14, 2020 at 11:23 AM Robert Metzger <[hidden email]> wrote:

> If Microsoft is not going to provide us with more test capacity, I will
> evaluate other options for the E2E tests.
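
As an illustration of how e2e tests can be skipped for pull request builds only, here is a small sketch. It is not the actual hotfix, and should_run_e2e is a made-up helper name. The one thing it relies on is that Azure Pipelines exposes the predefined variable Build.Reason to scripts as the environment variable BUILD_REASON, with the value "PullRequest" for PR builds.

```shell
#!/usr/bin/env bash
# Sketch only, NOT the merged hotfix: decide whether to run the e2e tests
# based on why the build was triggered.

# should_run_e2e [REASON] -> exit status 0 if e2e tests should run.
# REASON defaults to $BUILD_REASON, which Azure Pipelines sets for each build.
should_run_e2e() {
  local reason=${1:-${BUILD_REASON:-}}
  [ "${reason}" != "PullRequest" ]
}

if should_run_e2e; then
  echo "running e2e tests"
else
  echo "skipping e2e tests for pull request build"
fi
```

Builds of master (for example with reason IndividualCI or Schedule) would still run the e2e tests with this check, while pull request builds would skip them.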

Microsoft has now doubled our CI capacity (to 20 concurrent VMs for executing e2e tests).
If e2e test execution has normalized by tomorrow, I will revert the hotfix, enabling e2e tests on PRs again.

Sorry for the back and forth.

On Tue, May 19, 2020 at 3:11 PM Robert Metzger <[hidden email]> wrote:

> I have now merged a hotfix disabling the e2e test execution on pull
> requests to have enough capacity on master.