[DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Robert Metzger
Hey devs,

I would like to discuss whether it makes sense to fully switch to Azure
Pipelines and phase out our Travis integration.
More information on our Azure integration can be found here:
https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines

Travis will stay for the release-1.10 and older branches, as I have set up
Azure only for the master branch.

Proposal:
- We keep the flinkbot infrastructure supporting both Travis and Azure
around while we still receive pull requests and pushes for the
"master" and "release-1.10" branches.
- We remove the Travis-specific files from "master", so that builds are no
longer triggered there.
- Once we receive no more builds at Travis (because 1.11 has been
released), we remove the remaining Travis-related infrastructure.

What do you think?


Best,
Robert

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Chesnay Schepler-3
Some thoughts:
- by virtue of maintaining the past 2 releases we will have to maintain
any Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
- the azure setup doesn't appear to be equivalent yet since the java e2e
profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
which SQLClientKafkaITCase isn't run
- the nightly scripts still seem to be using a Maven version other than
3.2.5; from today on master:

2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
2020-03-25T05:31:52.7518360Z [INFO]
2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---

- there is no real benefit in retiring the travis support in CiBot; the
important part is whether Travis is run or not for pull requests.

From what I can tell, though, Azure seems to be working fine for pull
requests, so +1 to at least disabling the Travis PR runs.
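
The log excerpt in this message can be checked mechanically. The sketch below assumes (the thread does not state this) that the `--------< groupId:artifactId >--------` module header is what gives the newer Maven version away, because older releases such as 3.2.5 are believed to print a plain dashed separator instead. The function name is hypothetical.

```python
import re

# Assumption: this header style is only printed by Maven releases newer than 3.2.5.
NEW_STYLE_HEADER = re.compile(
    r"\[INFO\] -+< (?P<group>[^:\s]+):(?P<artifact>\S+) >-+$"
)


def modules_with_new_style_header(log_lines):
    """Return 'groupId:artifactId' for every new-style module header in the log."""
    found = []
    for line in log_lines:
        match = NEW_STYLE_HEADER.search(line.rstrip())
        if match:
            found.append(f"{match.group('group')}:{match.group('artifact')}")
    return found
```

A non-empty result for a nightly log would support the suspicion that the build did not run with Maven 3.2.5; an empty result proves nothing either way.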

On 23/03/2020 14:48, Robert Metzger wrote:

> Hey devs,
>
> I would like to discuss whether it makes sense to fully switch to Azure
> Pipelines and phase out our Travis integration.
> More information on our Azure integration can be found here:
> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>
> Travis will stay for the release-1.10 and older branches, as I have set up
> Azure only for the master branch.
>
> Proposal:
> - We keep the flinkbot infrastructure supporting both Travis and Azure
> around, while we are still receive pull requests and pushes for the
> "master" and "release-1.10" branches.
> - We remove the travis-specific files from "master", so that builds are not
> triggered anymore
> - once we receive no more builds at Travis (because 1.11 has been
> released), we remove the remaining travis-related infrastructure
>
> What do you think?
>
>
> Best,
> Robert
>


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Dian Fu-2
Hi Robert,

Thanks a lot for your great work!

Overall I'm +1 on switching to Azure as the primary CI tool if it's stable enough, as I think there is no need to run both Travis and Azure for a single PR.

However, some improvements are still needed, and it would be great if these issues could be addressed before fully switching to Azure:
- The Azure report is still not viewable [1] (I noticed that Hequn has also reported this issue in another thread). This report contains very useful information.
- For PR tests on the Azure pipeline, it seems that the PR is not rebased onto the latest master before the tests are run.

Thanks,
Dian

[1] https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9

> On March 25, 2020, at 3:33 PM, Chesnay Schepler <[hidden email]> wrote:
>
> Some thoughts:
> - by virtue of maintaining the past 2 releases we will have to maintain any Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> - the azure setup doesn't appear to be equivalent yet since the java e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of which SQLClientKafkaITCase isn't run
> - the nightly scripts still seems to be using a maven version other than 3.2.5; from today on master:
>
> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> 2020-03-25T05:31:52.7518360Z [INFO]
> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>
> - there is no real benefit in retiring the travis support in CiBot; the important part is whether Travis is run or not for pull requests.
>
> From what I can tell though azure seems to be working fine for pull requests, so +1 to at least disable the travis PR runs.


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Chesnay Schepler-3
@Dian we haven't been rebasing PRs against master for months, ever
since we switched to CiBot.

On 25/03/2020 09:29, Dian Fu wrote:

> Hi Robert,
>
> Thanks a lot for your great work!
>
> Overall I'm +1 to switch to Azure as the primary CI tool if it's stable enough as I think there is no need to run both the travis and Azure for one single PR.
>
> However, there are still some improvements need to do and it would be great if these issues could be addressed before fully switch to Azure:
> - The report of Azure is still not viewable[1] (I noticed that Hequn has also reported this issue in another thread). This is very useful information.
> - For PR test of Azure pipeline, it seems that it will not rebase the master code before running the tests.
>
> Thanks,
> Dian
>
> [1] https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Dian Fu-2
Thanks for the information. I'm sorry that I wasn't aware of this before; I have checked the Travis build log and confirmed that this is true.

@Chesnay Are there any specific reasons for this, and is it possible to add this back for Azure Pipelines?

Thanks,
Dian

> On March 25, 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
>
> @Dian we haven't been rebasing PR's against master for months, ever since we switched to CiBot.


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Chesnay Schepler-3
It was left out since it adds significant additional complexity and the
value is dubious at best for PRs that aren't merged shortly after the
build has finished.

On 25/03/2020 10:28, Dian Fu wrote:

> Thanks for the information. I'm sorry that I'm not aware of this before and I have checked the build log of travis and confirmed that this is true.
>
> @Chesnay Are there any specific reasons for this and is it possible to add this back for Azure Pipelines?
>
> Thanks,
> Dian



Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Robert Metzger
Thank you for the feedback so far.

Responses to the items Chesnay raised:

> - by virtue of maintaining the past 2 releases we will have to maintain any
> Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
>

Okay. I wasn't sure about the exact policy there.


> - the azure setup doesn't appear to be equivalent yet since the java e2e
> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> which SQLClientKafkaITCase isn't run
>

I filed a ticket to address this:
https://issues.apache.org/jira/browse/FLINK-16778


> - the nightly scripts still seems to be using a maven version other than
> 3.2.5; from today on master:
> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> 2020-03-25T05:31:52.7518360Z [INFO]
> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>

I'm planning to address this as part of
https://issues.apache.org/jira/browse/FLINK-16411, where I work on
centralizing all mvn invocations.


> - there is no real benefit in retiring the travis support in CiBot; the
> important part is whether Travis is run or not for pull requests.
> From what I can tell though azure seems to be working fine for pull
> requests, so +1 to at least disable the travis PR runs.


So we disable Travis for https://github.com/flink-ci/flink ? I will do it
once there are no new concerns and the above tickets are resolved.

What about disabling Travis for master pushes (e.g., by removing the
.travis.yml file from master)?


@Dian:
Thanks a lot for your feedback.

> - The report of Azure is still not viewable[1] (I noticed that Hequn has
> also reported this issue in another thread). This is very useful
> information.


You are referring to the emails sent to [hidden email], right?
I have reported this both as a bug [1] and as a feature request [2] to Azure,
but I don't believe they will resolve this issue anytime soon.
Azure has a notifications API that we could use to build a service that
sends emails to that list, but I feel that this would be a waste of time.
The URL in the link even contains the ID of the build. We would just need
to extract this ID and generate the appropriate URL. I will try to reach
the product management of AZP directly; maybe I can get some attention
there.
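
A minimal sketch of the ID extraction described here, not an existing tool. It assumes the notification link carries the build ID as the last path segment of its `builduri` query parameter (as in the link Dian posted) and that a viewable build page follows the pattern `https://dev.azure.com/<organization>/<project>/_build/results?buildId=<id>`. The function name and the default project name `Flink` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse


def build_results_url(notification_url, organization="rmetzger", project="Flink"):
    """Turn a notification link into a direct link to the build results page."""
    query = parse_qs(urlparse(notification_url).query)
    # parse_qs percent-decodes the value, e.g. "vstfs:///Build/Build/6593".
    build_uri = query["builduri"][0]
    build_id = build_uri.rstrip("/").rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"no numeric build ID in builduri: {build_uri!r}")
    # Assumed URL pattern for the build results page.
    return (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_build/results?buildId={build_id}"
    )
```

The `tracking_data` parameter is ignored on purpose, since only the build ID is needed to form the new link.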



[1]
https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
[2]
https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html



On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <[hidden email]>
wrote:

> It was left out since it adds significant additional complexity and the
> value is dubious at best for PRs that aren't merged shortly after the
> build has finished.

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Yu Li
Thanks for the efforts, Robert!

Checking the pipeline failure report [1], the pass rate is relatively low,
and I'm wondering whether we need to put more effort into stabilizing it
before replacing the Travis PR runs.

According to the report, uploading logs fails in 1/5 of the tests, which
indicates that access from Azure to S3 is not stable enough. Let me spend
more time checking the most unstable test failures (I need to further
prioritize the recorded issues [2]); I will report back later.

Best Regards,
Yu

[1]
https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
[2] https://s.apache.org/390qo


On Wed, 25 Mar 2020 at 22:03, Robert Metzger <[hidden email]> wrote:

> Thank you for the feedback so far.
>
> Responses to the items Chesnay raised:
>
> > - by virtue of maintaining the past 2 releases we will have to maintain any
> > Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> >
>
> Okay. I wasn't sure about the exact policy there.
>
>
> > - the azure setup doesn't appear to be equivalent yet since the java e2e
> > profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> > which SQLClientKafkaITCase isn't run
> >
>
> I filed a ticket to address this:
> https://issues.apache.org/jira/browse/FLINK-16778
>
>
> > - the nightly scripts still seem to be using a maven version other than
> > 3.2.5; from today on master:
> > 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
> > 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > 2020-03-25T05:31:52.7518360Z [INFO]
> > 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> >
> >
>
> I'm planning to address this as part of
> https://issues.apache.org/jira/browse/FLINK-16411, where I work on
> centralizing all mvn invocations.
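
The Maven version point above rests on the log layout: the quoted output contains a "--------< groupId:artifactId >--------" module banner, which, going by Chesnay's remark, Maven 3.2.5 does not print. A minimal sketch of how a build log could be scanned for that banner follows. The function name and the regular expression are illustrative assumptions, not part of the Flink CI scripts.

```python
import re

# Assumption: Maven 3.2.5 does not print this module banner, so finding it
# in a log suggests that a different (newer) Maven produced the output.
MODULE_BANNER = re.compile(r"\[INFO\] -+< (?P<module>[\w.\-]+:[\w.\-]+) >-+\s*$")


def modules_with_newer_banner(log_lines):
    """Return the groupId:artifactId of every module banner found in the log."""
    found = []
    for line in log_lines:
        match = MODULE_BANNER.search(line)
        if match:
            found.append(match.group("module"))
    return found


# Two of the log lines quoted in this thread.
sample_log = [
    "2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------",
    "2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]",
]
print(modules_with_newer_banner(sample_log))
```

A non-empty result for a nightly log would hint that the job is not running the pinned Maven version. It is only a heuristic and does not identify which version was actually used.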
>
>
> > - there is no real benefit in retiring the travis support in CiBot; the
> > important part is whether Travis is run or not for pull requests.
> > From what I can tell though azure seems to be working fine for pull
> > requests, so +1 to at least disable the travis PR runs.
>
>
> So we disable Travis for https://github.com/flink-ci/flink ? I will do it
> once there are no new concerns and above tickets are resolved.
>
> What about disabling travis for master pushes? (e.g. removing the
> .travis.yml file from master)?
>
>
> @Dian:
> Thanks a lot for your feedback.
>
> > - The report of Azure is still not viewable[1] (I noticed that Hequn has
> > also reported this issue in another thread). This is very useful
> > information.
>
>
> You are referring to the emails sent to [hidden email], right?
> I have reported this both as a bug [1] and a feature request [2] to Azure,
> but I don't believe they will resolve this issue anytime soon.
> Azure has a notifications API that we could use to build a service that
> sends emails to that list, but I feel that this is really a waste of time.
> The URL in the link even contains the ID of the build. We would just need
> to extract this ID and generate the appropriate URL. I will try to reach
> the product management of AZP directly; maybe I can get some attention
> from there.
>
>
>
> [1]
>
> https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
> [2]
>
> https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
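
The idea of extracting the build ID from the notification link and generating a viewable URL could look roughly like the sketch below. It assumes that the ID is the trailing number of the `builduri` query parameter (as in the link Dian posted) and that public results live under `/<organization>/<project>/_build/results?buildId=<id>`. The project name "Flink" is taken from the pipeline URLs in this thread, and the function name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse


def build_results_url(notification_url, project="Flink"):
    """Turn a notification link into a build results link (sketch only)."""
    parsed = urlparse(notification_url)
    # parse_qs percent-decodes the value, e.g. to "vstfs:///Build/Build/6593".
    build_uri = parse_qs(parsed.query)["builduri"][0]
    match = re.search(r"/Build/Build/(\d+)$", build_uri)
    if match is None:
        raise ValueError(f"no build ID found in {build_uri!r}")
    # The first path segment of the notification link is the organization.
    organization = parsed.path.strip("/").split("/")[0]
    return (
        f"https://{parsed.netloc}/{organization}/{project}"
        f"/_build/results?buildId={match.group(1)}"
    )


# Notification link from this thread, shortened by dropping tracking_data.
link = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)
print(build_results_url(link))
```

The sketch covers only the URL rewriting. Getting the rewritten link into the emails sent to the list would still need a service of the kind described above.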
>
>
>
> On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <[hidden email]>
> wrote:
>
> > It was left out since it adds significant additional complexity and the
> > value is dubious at best for PRs that aren't merged shortly after the
> > build has finished.
> >
> > On 25/03/2020 10:28, Dian Fu wrote:
> > > Thanks for the information. I'm sorry that I wasn't aware of this before;
> > > I have checked the Travis build log and confirmed that this is true.
> > >
> > > @Chesnay Are there any specific reasons for this, and is it possible to
> > > add this back for Azure Pipelines?
> > >
> > > Thanks,
> > > Dian
> > >
> > >> On 25 Mar 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
> > >>
> > >> @Dian we haven't been rebasing PRs against master for months, ever
> > >> since we switched to CiBot.
> > >>
> > >> On 25/03/2020 09:29, Dian Fu wrote:
> > >>> Hi Robert,
> > >>>
> > >>> Thanks a lot for your great work!
> > >>>
> > >>> Overall I'm +1 to switch to Azure as the primary CI tool if it's
> > >>> stable enough, as I think there is no need to run both Travis and
> > >>> Azure for a single PR.
> > >>>
> > >>> However, some improvements are still needed, and it would be great
> > >>> if these issues could be addressed before fully switching to Azure:
> > >>> - The Azure report is still not viewable [1] (I noticed that Hequn
> > >>> has also reported this issue in another thread). This is very useful
> > >>> information.
> > >>> - For PR tests in the Azure pipeline, it seems that the PR is not
> > >>> rebased onto master before the tests run.
> > >>>
> > >>> Thanks,
> > >>> Dian
> > >>>
> > >>> [1]
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > >>
> > >>>> On 25 Mar 2020, at 3:33 PM, Chesnay Schepler <[hidden email]> wrote:
> > >>>>
> > >>>> Some thoughts:
> > >>>> - by virtue of maintaining the past 2 releases we will have to
> > >>>> maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
> > >>>> until 1.12
> > >>>> - the azure setup doesn't appear to be equivalent yet since the java
> > >>>> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result
> > >>>> of which SQLClientKafkaITCase isn't run
> > >>>> - the nightly scripts still seem to be using a maven version other
> > >>>> than 3.2.5; from today on master:
> > >>>>
> > >>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > >>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
> > >>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > >>>> 2020-03-25T05:31:52.7518360Z [INFO]
> > >>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> > >>>>
> > >>>> - there is no real benefit in retiring the travis support in CiBot;
> > >>>> the important part is whether Travis is run or not for pull requests.
> > >>>>
> > >>>> From what I can tell though azure seems to be working fine for pull
> > >>>> requests, so +1 to at least disable the travis PR runs.
> > >>>>
> > >>>> On 23/03/2020 14:48, Robert Metzger wrote:
> > >>>>> Hey devs,
> > >>>>>
> > >>>>> I would like to discuss whether it makes sense to fully switch to
> > >>>>> Azure Pipelines and phase out our Travis integration.
> > >>>>> More information on our Azure integration can be found here:
> > >>>>>
> > >>>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
> > >>>>>
> > >>>>> Travis will stay for the release-1.10 and older branches, as I have
> > >>>>> set up Azure only for the master branch.
> > >>>>>
> > >>>>> Proposal:
> > >>>>> - We keep the flinkbot infrastructure supporting both Travis and
> > >>>>> Azure around, while we are still receiving pull requests and pushes for
> > >>>>> the "master" and "release-1.10" branches.
> > >>>>> - We remove the travis-specific files from "master", so that builds
> > >>>>> are not triggered anymore
> > >>>>> - once we receive no more builds at Travis (because 1.11 has been
> > >>>>> released), we remove the remaining travis-related infrastructure
> > >>>>>
> > >>>>> What do you think?
> > >>>>>
> > >>>>>
> > >>>>> Best,
> > >>>>> Robert
> > >
> >
> >
> >
>

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Chesnay Schepler-3
In reply to this post by Robert Metzger
The easiest way to disable Travis for pushes is to remove from the
.travis.yml all builds that have a push/pr condition.
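
To make the selection concrete, here is a minimal sketch. It models the job list of a .travis.yml as plain Python dictionaries instead of parsing YAML, and the job names and `if` conditions are hypothetical examples written in Travis' condition syntax, not the actual contents of Flink's file.

```python
import re

# Matches conditions that mention the "push" or "pull_request" build types.
PUSH_OR_PR = re.compile(r"\b(push|pull_request)\b")


def drop_push_and_pr_jobs(jobs):
    """Keep only the jobs whose `if` condition mentions neither push nor PR."""
    return [job for job in jobs if not PUSH_OR_PR.search(job.get("if", ""))]


# Hypothetical, simplified stand-in for the job list of a .travis.yml.
jobs = [
    {"name": "core", "if": "type in (pull_request, push)"},
    {"name": "tests", "if": "type = push"},
    {"name": "e2e-nightly", "if": "type = cron"},
]
print([job["name"] for job in drop_push_and_pr_jobs(jobs)])
```

Only the cron-triggered job remains in this example. Note that the sketch keeps a job that has no `if` condition at all. Such a job would still run on pushes, so it would need a condition added or be removed by hand.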



Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Robert Metzger
Thank you for your responses.

@Yu Li: On the current master, the log upload always fails if the e2e job
failed. I just merged a PR that fixes this issue [1]. The problem was not
really network stability, but rather the interaction of the jobs in the
pipeline (the e2e job did not set the right variables for the log upload).
Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
where pull requests are built. Naturally, pull request builds fail all the
time, because the PRs are not yet perfect.

"flink-ci.flink-master" is the right pipeline to look at:
https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
We have a fairly high number of failures there, because we currently have
some issues downloading the Maven artifacts [2]. I'm already working with
Chesnay on merging a fix for that.


[1]
https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
[2] https://issues.apache.org/jira/browse/FLINK-16720




Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Till Rohrmann
Thanks for driving this effort, Robert. I'd be in favour of disabling Travis
for PRs once AZP is decently stable.

Cheers,
Till

On Wed, Mar 25, 2020 at 8:28 PM Robert Metzger <[hidden email]> wrote:

> Thank you for your responses.
>
> @Yu Li: On the current master, the log upload always fails if the e2e job
> failed. I just merged a PR that fixes this issue [1]. The problem was not
> really network stability, but rather the interaction of the jobs in the
> pipeline (the e2e job did not set the right variables for the log upload).
> Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
> where pull requests are built. Naturally, pull request builds fail all the
> time, because the PRs are not yet perfect.
>
> "flink-ci.flink-master" is the right pipeline to look at:
>
> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
> We have a fairly high number of failures there, because we currently have
> some issues downloading the Maven artifacts [2]. I'm already working with
> Chesnay on merging a fix for that.
>
>
> [1]
>
> https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
> [2] https://issues.apache.org/jira/browse/FLINK-16720
>
>
>
> On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <[hidden email]>
> wrote:
>
> > The easiest way to disable travis for pushes is to remove all builds
> > from the .travis.yml with a push/pr condition.
> >
> > On 25/03/2020 15:03, Robert Metzger wrote:
> > > Thank you for the feedback so far.
> > >
> > > Responses to the items Chesnay raised:
> > >
> > > - by virtue of maintaining the past 2 releases we will have to maintain
> > any
> > >> Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> > >>
> > > Okay. I wasn't sure about the exact policy there.
> > >
> > >
> > >> - the azure setup doesn't appear to be equivalent yet since the java
> e2e
> > >> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> > >> which SQLClientKafkaITCase isn't run
> > >>
> > > I filed a ticket to address this:
> > > https://issues.apache.org/jira/browse/FLINK-16778
> > >
> > >
> > >> - the nightly scripts still seems to be using a maven version other
> than
> > >> 3.2.5; from today on master:
> > >> 2020-03-25T05:31:52.7412964Z [INFO] --------<
> > >> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > >> 2020-03-25T05:31:52.7413854Z [INFO] Building
> > >> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
> > >> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[
> > jar
> > >> ]---------------------------------
> > >> 2020-03-25T05:31:52.7518360Z [INFO]
> > >> 2020-03-25T05:31:52.7519770Z [INFO] ---
> > maven-checkstyle-plugin:2.17:check
> > >> (validate) @ flink-end-to-end-tests-common-kafka ---
> > >>
> > > I'm planning to address this as part of
> > > https://issues.apache.org/jira/browse/FLINK-16411, where I work on
> > > centralizing all mvn invocations.
> > >
> > >
> > >> - there is no real benefit in retiring the travis support in CiBot;
> the
> > >> important part is whether Travis is run or not for pull requests.
> > >>  From what I can tell though azure seems to be working fine for pull
> > >> requests, so +1 to at least disable the travis PR runs.
> > >
> > > So we disable Travis for https://github.com/flink-ci/flink ? I will do
> > it
> > > once there are no new concerns and above tickets are resolved.
> > >
> > > What about disabling travis for master pushes? (e.g. removing the
> > > .travis.yml file from master)?
> > >
> > >
> > > @Dian:
> > > Thanks a lot for your feedback.
> > >
> > > - The report of Azure is still not viewable[1] (I noticed that Hequn
> has
> > >> also reported this issue in another thread). This is very useful
> > >> information.
> > >
> > > You are referring to the emails sent to [hidden email], right?
> > > I have reported this both as a bug [1] and a feature request [2] to
> > > Azure, but I don't believe they will resolve this issue anytime soon.
> > > Azure has a notifications API that we could use to build a service
> > > that sends emails to that list, but I feel that this is really a waste
> > > of time.
> > > The URL in the link even contains the ID of the build. We would just
> > > need to extract this ID and generate the appropriate URL. I will try to
> > > reach the product management of AZP directly; maybe I can get some
> > > attention from there.
> > >
> > >
> > >
> > > [1]
> > >
> >
> https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
> > > [2]
> > >
> >
> https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
> > >
> > >
> > >
> > > On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <[hidden email]>
> > > wrote:
> > >
> > >> It was left out since it adds significant additional complexity and
> the
> > >> value is dubious at best for PRs that aren't merged shortly after the
> > >> build has finished.
> > >>
> > >> On 25/03/2020 10:28, Dian Fu wrote:
> > >>> Thanks for the information. I'm sorry that I'm not aware of this
> before
> > >> and I have checked the build log of travis and confirmed that this is
> > true.
> > >>> @Chesnay Are there any specific reasons for this and is it possible
> to
> > >> add this back for Azure Pipelines?
> > >>> Thanks,
> > >>> Dian
> > >>>
> > >>>> On Mar 25, 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
> > >>>>
> > >>>> @Dian we haven't been rebasing PR's against master for months, ever
> > >> since we switched to CiBot.
> > >>>> On 25/03/2020 09:29, Dian Fu wrote:
> > >>>>> Hi Robert,
> > >>>>>
> > >>>>> Thanks a lot for your great work!
> > >>>>>
> > >>>>> Overall I'm +1 to switch to Azure as the primary CI tool if it's
> > >> stable enough as I think there is no need to run both the travis and
> > Azure
> > >> for one single PR.
> > >>>>> However, there are still some improvements need to do and it would
> be
> > >> great if these issues could be addressed before fully switch to Azure:
> > >>>>> - The report of Azure is still not viewable[1] (I noticed that
> Hequn
> > >> has also reported this issue in another thread). This is very useful
> > >> information.
> > >>>>> - For PR test of Azure pipeline, it seems that it will not rebase
> the
> > >> master code before running the tests.
> > >>>>> Thanks,
> > >>>>> Dian
> > >>>>>
> > >>>>> [1]
> > >>
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > >>>>>> On Mar 25, 2020, at 3:33 PM, Chesnay Schepler <[hidden email]> wrote:
> > >>>>>>
> > >>>>>> Some thoughts:
> > >>>>>> - by virtue of maintaining the past 2 releases we will have to
> > >> maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
> > >> until 1.12
> > >>>>>> - the azure setup doesn't appear to be equivalent yet since the
> java
> > >> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a
> result
> > of
> > >> which SQLClientKafkaITCase isn't run
> > >>>>>> - the nightly scripts still seems to be using a maven version
> other
> > >> than 3.2.5; from today on master:
> > >>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------<
> > >> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > >>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building
> > >> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
> > >>>>>> 2020-03-25T05:31:52.7414689Z [INFO]
> > --------------------------------[
> > >> jar ]---------------------------------
> > >>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
> > >>>>>> 2020-03-25T05:31:52.7519770Z [INFO] ---
> > >> maven-checkstyle-plugin:2.17:check (validate) @
> > >> flink-end-to-end-tests-common-kafka ---
> > >>>>>> - there is no real benefit in retiring the travis support in
> CiBot;
> > >> the important part is whether Travis is run or not for pull requests.
> > >>>>>>   From what I can tell though azure seems to be working fine for
> > pull
> > >> requests, so +1 to at least disable the travis PR runs.
> > >>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
> > >>>>>>> Hey devs,
> > >>>>>>>
> > >>>>>>> I would like to discuss whether it makes sense to fully switch to
> > >> Azure
> > >>>>>>> Pipelines and phase out our Travis integration.
> > >>>>>>> More information on our Azure integration can be found here:
> > >>>>>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
> > >>>>>>> Travis will stay for the release-1.10 and older branches, as I
> have
> > >> set up
> > >>>>>>> Azure only for the master branch.
> > >>>>>>>
> > >>>>>>> Proposal:
> > >>>>>>> - We keep the flinkbot infrastructure supporting both Travis and
> > >> Azure
> > >>>>>>> around, while we are still receive pull requests and pushes for
> the
> > >>>>>>> "master" and "release-1.10" branches.
> > >>>>>>> - We remove the travis-specific files from "master", so that
> builds
> > >> are not
> > >>>>>>> triggered anymore
> > >>>>>>> - once we receive no more builds at Travis (because 1.11 has been
> > >>>>>>> released), we remove the remaining travis-related infrastructure
> > >>>>>>>
> > >>>>>>> What do you think?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Robert
> > >>
> > >>
> >
> >
>
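The quoted discussion above notes that the notification link already carries
the build ID and that one would only need to extract it and generate a
viewable URL. A minimal sketch of that idea follows. The `builduri` parameter
and the organization name are taken from the quoted link; the project name
"Flink" and the `_build/results?buildId=` target pattern are assumptions, and
the helper names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_build_id(notification_url: str) -> str:
    """Return the numeric build ID from the `builduri` query parameter."""
    query = parse_qs(urlparse(notification_url).query)
    # parse_qs percent-decodes values, so both the encoded and the plain
    # form of the link yield "vstfs:///Build/Build/<id>".
    build_uri = query["builduri"][0]
    build_id = build_uri.rstrip("/").rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"unexpected builduri: {build_uri!r}")
    return build_id


def results_url(notification_url: str, project: str = "Flink") -> str:
    """Build a results-page URL; the target pattern is an assumption."""
    organization = urlparse(notification_url).path.strip("/").split("/")[0]
    build_id = extract_build_id(notification_url)
    return (f"https://dev.azure.com/{organization}/{project}"
            f"/_build/results?buildId={build_id}")


# Shortened version of the quoted link (tracking_data omitted).
link = ("https://dev.azure.com/rmetzger/web/build.aspx"
        "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
        "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593")
print(results_url(link))
```

Whether such a rewritten link would be viewable by third parties depends on
the visibility settings of the Azure project, which this sketch does not
address.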

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Yu Li
In reply to this post by Robert Metzger
Thanks for the clarification, Robert.

Since the first step of the plan is to replace the Travis PR runs, I checked
all PR builds since 2020-01-01 (PR#10735-11526) [1]. The results:

* Travis FAILURE: 298
* Travis SUCCESS: 649 (68.5%)
* Azure FAILURE: 420
* Azure SUCCESS: 571 (57.6%)

Since each run tests the same patch on Travis and on Azure, the success rate
on Azure appears to be roughly 10 percentage points lower.
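The percentages above can be re-derived from the raw counts. The short sketch
below only uses the four numbers from this message; the variable and function
names are illustrative.

```python
# Counts copied from the message above.
travis = {"success": 649, "failure": 298}
azure = {"success": 571, "failure": 420}


def success_rate(counts):
    """Share of finished builds that succeeded, in percent."""
    total = counts["success"] + counts["failure"]
    return 100.0 * counts["success"] / total


travis_rate = success_rate(travis)
azure_rate = success_rate(azure)
gap = travis_rate - azure_rate  # difference in percentage points

print(f"Travis: {travis_rate:.1f}%  Azure: {azure_rate:.1f}%  gap: {gap:.1f} points")
```

Note that the totals differ (947 Travis results versus 991 Azure results), so
the two rates are not computed over exactly the same set of runs.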

However, with the just-merged fix for uploading logs (FLINK-16480), I
believe the success rate of Azure can now compete with Travis (file uploads
account for 20% of the failures according to the report [2]).
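As a rough what-if, the effect of the upload fix can be estimated from the
counts above. This is only a sketch under a strong assumption that is not
confirmed anywhere in the thread: that the 20% of failed Azure runs attributed
to file uploads would otherwise have succeeded.

```python
# Azure counts from the message above.
azure_success = 571
azure_failure = 420
total = azure_success + azure_failure

# Assumption: 20% of the failures were caused only by the log upload
# and would have been successes once the upload is fixed.
upload_share = 0.20
recovered = round(azure_failure * upload_share)

adjusted_rate = 100.0 * (azure_success + recovered) / total

print(f"recovered runs: {recovered}, adjusted Azure success rate: {adjusted_rate:.1f}%")
```

Under that assumption the Azure rate would move much closer to the Travis
figure, but a failed run may have had other problems besides the upload, so
this is an upper bound rather than a prediction.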

Based on these numbers, I'm +1 on disabling the Travis runs.

Best Regards,
Yu

[1]
https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
[2]
https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Gary Yao-4
I am in favour of decommissioning Travis.

Moreover, I wanted to use this thread to raise another issue with Travis that
I discovered recently: many of the builds running in my private Travis account
are timing out in the compilation stage (i.e., compilation takes more than 50
minutes). This means that I am not able to reliably run a full build on a CI
server without creating a pull request. If other developers also experience
this issue, that would be an argument for putting more effort into making
Azure Pipelines the project-wide default.

Best,
Gary


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Robert Metzger
Thanks a lot for bringing up this topic again.
I was hesitant to decommission Travis because we were still
facing some issues with the Azure infrastructure that I want to resolve, so
that we have strong test coverage.

In the last few weeks, we had the following issues:
- unstable e2e tests (we are running the e2e tests much more frequently,
thus we see more failures (and discover actual bugs!))
- network issues (mostly around downloading maven artifacts. This is solved
at the cost of slower builds. I'm preparing a fix to have stable & fast
maven downloads)
- the private builds were never really stable (this is work in progress;
the situation is definitely better than with the private Travis builds)
- I haven't followed the overall master stability closely before February,
but I have the feeling that April so far has been a pretty unstable month on
master. Piotr is regularly reverting commits that somehow broke master. The
problem with an unstable master is that it causes "CI fatigue", where people
assume that failing builds are not worth investigating anymore, leading to
more instability. This is not a problem of the CI infrastructure itself,
but it makes me less confident about switching systems :)


Unless something unexpected happens, I'm proposing to disable pull request
processing on Travis next week.



On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <[hidden email]> wrote:

> I am in favour of decommissioning Travis.
>
> Moreover, I wanted to use this thread to raise another issue with Travis
> that I have discovered recently; many of the builds running in my private
> Travis account are timing out in the compilation stage (i.e., compilation
> takes more than 50 minutes). This means that I am not able to reliably run
> a full build on a CI server without creating a pull request. If other
> developers also experience this issue, it would speak for putting more
> effort into making Azure Pipelines the project-wide default.
>
> Best,
> Gary
>
> On Thu, Mar 26, 2020 at 12:26 PM Yu Li <[hidden email]> wrote:
>
> > Thanks for the clarification Robert.
> >
> > Since the first step plan is to replace the travis PR runs, I checked all
> > PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:
> >
> > * Travis FAILURE: 298
> > * Travis SUCCESS: 649 (68.5%)
> > * Azure FAILURE: 420
> > * Azure SUCCESS: 571 (57.6%)
> >
> > Since the patch for each run is equivalent for Travis and Azure, there
> > seems to be slightly higher failure rate (~10%) when running in Azure.
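
The percentages quoted above follow directly from the raw build counts. A minimal Python sketch that recomputes them; the helper name `success_rate` is made up for illustration and is not part of any Flink tooling:

```python
def success_rate(successes: int, failures: int) -> float:
    """Return the share of successful builds as a percentage."""
    total = successes + failures
    return 100.0 * successes / total


# Counts taken from the message above (PR builds since 2020-01-01).
travis = success_rate(successes=649, failures=298)
azure = success_rate(successes=571, failures=420)
gap = travis - azure  # difference in percentage points

print(f"Travis: {travis:.1f}%  Azure: {azure:.1f}%  gap: {gap:.1f} points")
```

The gap comes out at about 10.9 percentage points, which matches the "~10%" estimate in the message.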
> >
> > However, with the just-merged fix for uploading logs (FLINK-16480), I
> > believe the success rate of Azure could compete with Travis now (uploading
> > files contributes to 20% of the failures according to the report [2]).
> >
> > So I'm +1 to disable travis runs according to the numbers.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
> > [2] https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
> >
> > On Thu, 26 Mar 2020 at 03:28, Robert Metzger <[hidden email]> wrote:
> >
> > > Thank you for your responses.
> > >
> > > @Yu Li: In the current master, the log upload always fails if the e2e job
> > > failed. I just merged a PR that fixes this issue [1]. The problem was not
> > > really the network stability, but rather a problem with the interaction of
> > > the jobs in the pipeline (the e2e job did not set the right variables for
> > > the log upload).
> > > Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
> > > where pull requests are built. Naturally, pull request builds fail all the
> > > time, because the PRs are not yet perfect.
> > >
> > > "flink-ci.flink-master" is the right pipeline to look at:
> > > https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
> > > We have a fairly high number of failures there, because we currently have
> > > some issues downloading the maven artifacts [2]. I'm already working with
> > > Chesnay on merging a fix for that.
> > >
> > >
> > > [1] https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
> > > [2] https://issues.apache.org/jira/browse/FLINK-16720
> > >
> > >
> > >
> > > On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <[hidden email]> wrote:
> > >
> > > > The easiest way to disable travis for pushes is to remove all builds
> > > > from the .travis.yml with a push/pr condition.
> > > >
> > > > On 25/03/2020 15:03, Robert Metzger wrote:
> > > > > Thank you for the feedback so far.
> > > > >
> > > > > Responses to the items Chesnay raised:
> > > > >
> > > > > >> - by virtue of maintaining the past 2 releases we will have to maintain
> > > > > >> any Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
> > > > > >>
> > > > > Okay. I wasn't sure about the exact policy there.
> > > > >
> > > > >
> > > > > >> - the azure setup doesn't appear to be equivalent yet since the java e2e
> > > > > >> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> > > > > >> which SQLClientKafkaITCase isn't run
> > > > > >>
> > > > > I filed a ticket to address this:
> > > > > https://issues.apache.org/jira/browse/FLINK-16778
> > > > >
> > > > >
> > > > > >> - the nightly scripts still seems to be using a maven version other than
> > > > > >> 3.2.5; from today on master:
> > > > > >> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > > > > >> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
> > > > > >> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > > > > >> 2020-03-25T05:31:52.7518360Z [INFO]
> > > > > >> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> > > > > >>
> > > > > I'm planning to address this as part of
> > > > > https://issues.apache.org/jira/browse/FLINK-16411, where I work on
> > > > > centralizing all mvn invocations.
> > > > >
> > > > >
> > > > > >> - there is no real benefit in retiring the travis support in CiBot; the
> > > > > >> important part is whether Travis is run or not for pull requests.
> > > > > >>  From what I can tell though azure seems to be working fine for pull
> > > > > >> requests, so +1 to at least disable the travis PR runs.
> > > > > >
> > > > > > So we disable Travis for https://github.com/flink-ci/flink ? I will do it
> > > > > > once there are no new concerns and above tickets are resolved.
> > > > >
> > > > > What about disabling travis for master pushes? (e.g. removing the
> > > > > .travis.yml file from master)?
> > > > >
> > > > >
> > > > > @Dian:
> > > > > Thanks a lot for your feedback.
> > > > >
> > > > > >> - The report of Azure is still not viewable[1] (I noticed that Hequn has
> > > > > >> also reported this issue in another thread). This is very useful
> > > > > >> information.
> > > > > >
> > > > > > You are referring to the emails sent to [hidden email], right?
> > > > > > I have reported this both as a bug [1] and a feature request [2] to Azure.
> > > > > > But I don't believe they will resolve this issue anytime soon.
> > > > > > Azure has a notifications API that we could use to build a service that
> > > > > > sends emails to that list, but I feel that this is really a waste of time.
> > > > > > The URL in the link even contains the ID of the build. We would just need
> > > > > > to extract this ID and generate the appropriate URL. I will try to directly
> > > > > > reach the product management of AZP, maybe I can get some attention from
> > > > > there.
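
The "extract this ID and generate the appropriate URL" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the sample link is the notification URL quoted further down in this message (shortened by dropping its `tracking_data` parameter), and the target format `_build/results?buildId=...` with the `rmetzger`/`Flink` organization and project is assumed rather than confirmed anywhere in the thread.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_build_id(notification_url: str) -> int:
    """Pull the numeric build ID out of the 'builduri' query parameter."""
    query = parse_qs(urlparse(notification_url).query)
    # parse_qs percent-decodes the value, e.g. to "vstfs:///Build/Build/6593".
    build_uri = query["builduri"][0]
    match = re.search(r"/Build/Build/(\d+)$", build_uri)
    if match is None:
        raise ValueError(f"no build ID found in {build_uri!r}")
    return int(match.group(1))


def public_build_url(build_id: int,
                     organization: str = "rmetzger",
                     project: str = "Flink") -> str:
    """Build a results link; the URL layout is an assumption, see above."""
    return (f"https://dev.azure.com/{organization}/{project}"
            f"/_build/results?buildId={build_id}")


sample = ("https://dev.azure.com/rmetzger/web/build.aspx"
          "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
          "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593")
build_id = extract_build_id(sample)
print(build_id, public_build_url(build_id))
```

A real service would additionally have to receive the notification and forward the rewritten mail, which is the part described above as not worth the effort.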
> > > > >
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
> > > > >
> > > > >
> > > > >
> > > > > > On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <[hidden email]> wrote:
> > > > > >
> > > > > >> It was left out since it adds significant additional complexity and the
> > > > > >> value is dubious at best for PRs that aren't merged shortly after the
> > > > > >> build has finished.
> > > > >>
> > > > >> On 25/03/2020 10:28, Dian Fu wrote:
> > > > > >>> Thanks for the information. I'm sorry that I wasn't aware of this before;
> > > > > >>> I have checked the build log of travis and confirmed that this is true.
> > > > > >>> @Chesnay Are there any specific reasons for this and is it possible to
> > > > > >>> add this back for Azure Pipelines?
> > > > > >>> Thanks,
> > > > > >>> Dian
> > > > > >>>
> > > > > >>>> On Mar 25, 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
> > > > > >>>>
> > > > > >>>> @Dian we haven't been rebasing PR's against master for months, ever
> > > > > >>>> since we switched to CiBot.
> > > > > >>>> On 25/03/2020 09:29, Dian Fu wrote:
> > > > >>>>> Hi Robert,
> > > > >>>>>
> > > > >>>>> Thanks a lot for your great work!
> > > > >>>>>
> > > > > >>>>> Overall I'm +1 to switch to Azure as the primary CI tool if it's
> > > > > >>>>> stable enough, as I think there is no need to run both Travis and Azure
> > > > > >>>>> for one single PR.
> > > > > >>>>> However, there are still some improvements to make, and it would be
> > > > > >>>>> great if these issues could be addressed before fully switching to Azure:
> > > > > >>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn
> > > > > >>>>> has also reported this issue in another thread). This is very useful
> > > > > >>>>> information.
> > > > > >>>>> - For PR test of Azure pipeline, it seems that it will not rebase the
> > > > > >>>>> master code before running the tests.
> > > > >>>>> Thanks,
> > > > >>>>> Dian
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>
> > > >
> > >
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > > > >
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs:///Build/Build/6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > > > >> <
> > > > >>
> > > >
> > >
> >
> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs:///Build/Build/6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
> > > > > >>>>>> On Mar 25, 2020, at 3:33 PM, Chesnay Schepler <[hidden email]> wrote:
> > > > >>>>>>
> > > > >>>>>> Some thoughts:
> > > > > >>>>>> - by virtue of maintaining the past 2 releases we will have to
> > > > > >>>>>> maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
> > > > > >>>>>> until 1.12
> > > > > >>>>>> - the azure setup doesn't appear to be equivalent yet since the java
> > > > > >>>>>> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
> > > > > >>>>>> which SQLClientKafkaITCase isn't run
> > > > > >>>>>> - the nightly scripts still seems to be using a maven version other
> > > > > >>>>>> than 3.2.5; from today on master:
> > > > > >>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
> > > > > >>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
> > > > > >>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
> > > > > >>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
> > > > > >>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
> > > > > >>>>>> - there is no real benefit in retiring the travis support in CiBot;
> > > > > >>>>>> the important part is whether Travis is run or not for pull requests.
> > > > > >>>>>>   From what I can tell though azure seems to be working fine for pull
> > > > > >>>>>> requests, so +1 to at least disable the travis PR runs.
> > > > >>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
> > > > >>>>>>> Hey devs,
> > > > >>>>>>>
> > > > > >>>>>>> I would like to discuss whether it makes sense to fully switch to Azure
> > > > > >>>>>>> Pipelines and phase out our Travis integration.
> > > > > >>>>>>> More information on our Azure integration can be found here:
> > > > > >>>>>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
> > > > > >>>>>>> Travis will stay for the release-1.10 and older branches, as I have set up
> > > > > >>>>>>> Azure only for the master branch.
> > > > > >>>>>>>
> > > > > >>>>>>> Proposal:
> > > > > >>>>>>> - We keep the flinkbot infrastructure supporting both Travis and Azure
> > > > > >>>>>>> around, while we still receive pull requests and pushes for the
> > > > > >>>>>>> "master" and "release-1.10" branches.
> > > > > >>>>>>> - We remove the travis-specific files from "master", so that builds are not
> > > > > >>>>>>> triggered anymore
> > > > > >>>>>>> - once we receive no more builds at Travis (because 1.11 has been
> > > > > >>>>>>> released), we remove the remaining travis-related infrastructure
> > > > >>>>>>>
> > > > >>>>>>> What do you think?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Robert
> > > > >>
> > > > >>
> > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Robert Metzger
FYI: I have moved the Flink PR and master builds from my personal Azure
account to a PMC controlled account:
https://dev.azure.com/apache-flink/apache-flink/_build


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Yun Tang

I noticed that Travis-related bot commands are still shown on the GitHub PR page; I think we should remove the command hint now.
________________________________
From: Robert Metzger <[hidden email]>
Sent: Thursday, April 23, 2020 15:44
To: dev <[hidden email]>
Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

FYI: I have moved the Flink PR and master builds from my personal Azure
account to a PMC controlled account:
https://dev.azure.com/apache-flink/apache-flink/_build

On Fri, Apr 17, 2020 at 8:28 PM Robert Metzger <[hidden email]> wrote:

> Thanks a lot for bringing up this topic again.
> The reason why I was hesitant to decommission Travis was that we were
> still facing some issues with the Azure infrastructure that I want to
> resolve, so that we have a strong test coverage.
>
> In the last few weeks, we had the following issues:
> - unstable e2e tests (we are running the e2e tests much more frequently,
> thus we see more failures (and discover actual bugs!))
> - network issues (mostly around downloading maven artifacts. This is
> solved at the cost of slower builds. I'm preparing a fix to have stable &
> fast maven downloads)
> - the private builds were never really stable (this is work in progress.
> the situation is definitely better than the private Travis builds)
> - I haven't followed the overall master stability closely before February,
> but I have the feeling that April so far was a pretty unstable month on
> master. Piotr is regularly reverting commits that somehow broke master. The
> problem with an unstable master is that it causes "CI fatigue", where people
> assume that failing builds are not worth investigating anymore, leading to
> more instability. This is not a problem of the CI infrastructure itself,
> but it makes me less confident switching systems :)
>
>
> Unless something unexpected happens, I'm proposing to disable pull request
> processing on Travis next week.
>
>
>
> On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <[hidden email]> wrote:
>
>> I am in favour of decommissioning Travis.
>>
>> Moreover, I wanted to use this thread to raise another issue with Travis
>> that I
>> have discovered recently; many of the builds running in my private Travis
>> account are timing out in the compilation stage (i.e., compilation takes
>> more
>> than 50 minutes). This means that I am not able to reliably run a full
>> build on
>> a CI server without creating a pull request. If other developers also
>> experience
>> this issue, it would speak for putting more effort into making Azure
>> Pipelines
>> the project-wide default.
>>
>> Best,
>> Gary
>>
>> On Thu, Mar 26, 2020 at 12:26 PM Yu Li <[hidden email]> wrote:
>>
>> > Thanks for the clarification Robert.
>> >
>> > Since the first step plan is to replace the travis PR runs, I checked
>> all
>> > PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:
>> >
>> > * Travis FAILURE: 298
>> > * Travis SUCCESS: 649 (68.5%)
>> > * Azure FAILURE: 420
>> > * Azure SUCCESS: 571 (57.6%)
>> >
>> > Since the patch for each run is equivalent for Travis and Azure, there
>> > seems to be slightly higher failure rate (~10%) when running in Azure.
>> >
>> > However, with the just-merged fix for uploading logs (FLINK-16480), I
>> > believe the success rate of Azure could compete with Travis now
>> (uploading
>> > files contribute to 20% of the failures according to the report [2]).
>> >
>> > So I'm +1 to disable travis runs according to the numbers.
>> >
>> > Best Regards,
>> > Yu
>> >
>> > [1]
>> >
>> https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
>> > [2]
>> >
>> >
>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
>> >
>> > On Thu, 26 Mar 2020 at 03:28, Robert Metzger <[hidden email]>
>> wrote:
>> >
>> > > Thank you for your responses.
>> > >
>> > > @Yu Li: In the current master, the log upload always fails, if the e2e
>> > job
>> > > failed. I just merged a PR that fixes this issue [1]. The problem was
>> not
>> > > really the network stability, rather a problem with the interaction of
>> > the
>> > > jobs in the pipeline (the e2e job did not set the right variables for
>> the
>> > > log upload)
>> > > Secondly, you are looking at the report of the "flink-ci.flink"
>> pipeline,
>> > > where pull requests are built. Naturally, pull request builds fail all
>> > the
>> > > time, because the PRs are not yet perfect.
>> > >
>> > > "flink-ci.flink-master" is the right pipeline to look at:
>> > >
>> > >
>> >
>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
>> > > We have a fairly high number of failures there, because we currently
>> have
>> > > some issues downloading the maven artifacts [2]. I'm working already
>> with
>> > > Chesnay on merging a fix for that.
>> > >
>> > >
>> > > [1]
>> > >
>> > >
>> >
>> https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
>> > > [2]https://issues.apache.org/jira/browse/FLINK-16720
>> > >
>> > >
>> > >
>> > > On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <[hidden email]>
>> > > wrote:
>> > >
>> > > > The easiest way to disable travis for pushes is to remove all builds
>> > > > from the .travis.yml with a push/pr condition.
>> > > >
>> > > > On 25/03/2020 15:03, Robert Metzger wrote:
>> > > > > Thank you for the feedback so far.
>> > > > >
>> > > > > Responses to the items Chesnay raised:
>> > > > >
>> > > > > - by virtue of maintaining the past 2 releases we will have to
>> > maintain
>> > > > any
>> > > > >> Travis infrastructure as long as 1.10 is supported, i.e., until
>> 1.12
>> > > > >>
>> > > > > Okay. I wasn't sure about the exact policy there.
>> > > > >
>> > > > >
>> > > > >> - the azure setup doesn't appear to be equivalent yet since the
>> java
>> > > e2e
>> > > > >> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a
>> result
>> > of
>> > > > >> which SQLClientKafkaITCase isn't run
>> > > > >>
>> > > > > I filed a ticket to address this:
>> > > > > https://issues.apache.org/jira/browse/FLINK-16778
>> > > > >
>> > > > >
>> > > > >> - the nightly scripts still seem to be using a maven version
>> other
>> > > than
>> > > > >> 3.2.5; from today on master:
>> > > > >> 2020-03-25T05:31:52.7412964Z [INFO] --------<
>> > > > >> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>> > > > >> 2020-03-25T05:31:52.7413854Z [INFO] Building
>> > > > >> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
>> > > > >> 2020-03-25T05:31:52.7414689Z [INFO]
>> > --------------------------------[
>> > > > jar
>> > > > >> ]---------------------------------
>> > > > >> 2020-03-25T05:31:52.7518360Z [INFO]
>> > > > >> 2020-03-25T05:31:52.7519770Z [INFO] ---
>> > > > maven-checkstyle-plugin:2.17:check
>> > > > >> (validate) @ flink-end-to-end-tests-common-kafka ---
>> > > > >>
>> > > > > I'm planning to address this as part of
>> > > > > https://issues.apache.org/jira/browse/FLINK-16411, where I work
>> on
>> > > > > centralizing all mvn invocations.
>> > > > >
>> > > > >
>> > > > >> - there is no real benefit in retiring the travis support in
>> CiBot;
>> > > the
>> > > > >> important part is whether Travis is run or not for pull requests.
>> > > > >>  From what I can tell though azure seems to be working fine for
>> pull
>> > > > >> requests, so +1 to at least disable the travis PR runs.
>> > > > >
>> > > > > So we disable Travis for https://github.com/flink-ci/flink ? I
>> will
>> > do
>> > > > it
>> > > > > once there are no new concerns and above tickets are resolved.
>> > > > >
>> > > > > What about disabling travis for master pushes? (e.g. removing the
>> > > > > .travis.yml file from master)?
>> > > > >
>> > > > >
>> > > > > @Dian:
>> > > > > Thanks a lot for your feedback.
>> > > > >
>> > > > > - The report of Azure is still not viewable[1] (I noticed that
>> Hequn
>> > > has
>> > > > >> also reported this issue in another thread). This is very useful
>> > > > >> information.
>> > > > >
>> > > > > You are referring to the emails sent to [hidden email], right?
>> > > > > I have reported this both as a bug [1] and a feature request [2]
>> to
>> > > > Azure.
>> > > > > But I don't believe they will resolve this issue anytime soon.
>> > > > > Azure has a notifications API that we could use to build a
>> service
>> > > that
>> > > > > sends emails to that list, but I feel that this is really a waste
>> of
>> > > > time.
>> > > > > The URL in the link even contains the ID of the build. We would
>> just
>> > > need
>> > > > > to extract this ID and generate the appropriate URL. I will try to
>> > > > directly
>> > > > > reach the product management of AZP, maybe I can get some
>> attention
>> > > from
>> > > > > there.
>> > > > >
>> > > > >
>> > > > >
>> > > > > [1]
>> > > > >
>> > > >
>> > >
>> >
>> https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
>> > > > > [2]
>> > > > >
>> > > >
>> > >
>> >
>> https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <
>> > [hidden email]>
>> > > > > wrote:
>> > > > >
>> > > > >> It was left out since it adds significant additional complexity
>> and
>> > > the
>> > > > >> value is dubious at best for PRs that aren't merged shortly after
>> > the
>> > > > >> build has finished.
>> > > > >>
>> > > > >> On 25/03/2020 10:28, Dian Fu wrote:
>> > > > >>> Thanks for the information. I'm sorry that I'm not aware of this
>> > > before
>> > > > >> and I have checked the build log of travis and confirmed that
>> this
>> > is
>> > > > true.
>> > > > >>> @Chesnay Are there any specific reasons for this and is it
>> possible
>> > > to
>> > > > >> add this back for Azure Pipelines?
>> > > > >>> Thanks,
>> > > > >>> Dian
>> > > > >>>
>> > > > >>>> On Mar 25, 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
>> > > > >>>>
>> > > > >>>> @Dian we haven't been rebasing PR's against master for months,
>> > ever
>> > > > >> since we switched to CiBot.
>> > > > >>>> On 25/03/2020 09:29, Dian Fu wrote:
>> > > > >>>>> Hi Robert,
>> > > > >>>>>
>> > > > >>>>> Thanks a lot for your great work!
>> > > > >>>>>
>> > > > >>>>> Overall I'm +1 to switch to Azure as the primary CI tool if
>> it's
>> > > > >> stable enough as I think there is no need to run both the travis
>> and
>> > > > Azure
>> > > > >> for one single PR.
>> > > > >>>>> However, some improvements are still needed, and it
>> > would
>> > > be
>> > > > >> great if these issues could be addressed before fully switching to
>> > Azure:
>> > > > >>>>> - The report of Azure is still not viewable[1] (I noticed that
>> > > Hequn
>> > > > >> has also reported this issue in another thread). This is very
>> useful
>> > > > >> information.
>> > > > >>>>> - For PR test of Azure pipeline, it seems that it will not
>> rebase
>> > > the
>> > > > >> master code before running the tests.
>> > > > >>>>> Thanks,
>> > > > >>>>> Dian
>> > > > >>>>>
>> > > > >>>>> [1]
>> > > > >>
>> > > >
>> > >
>> >
>> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
>> > > > >>>>>> On Mar 25, 2020, at 3:33 PM, Chesnay Schepler <[hidden email]>
>> wrote:
>> > > > >>>>>>
>> > > > >>>>>> Some thoughts:
>> > > > >>>>>> - by virtue of maintaining the past 2 releases we will have
>> to
>> > > > >> maintain any Travis infrastructure as long as 1.10 is supported,
>> > i.e.,
>> > > > >> until 1.12
>> > > > >>>>>> - the azure setup doesn't appear to be equivalent yet since
>> the
>> > > java
>> > > > >> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a
>> > > result
>> > > > of
>> > > > >> which SQLClientKafkaITCase isn't run
>> > > > >>>>>> - the nightly scripts still seem to be using a maven version
>> > > other
>> > > > >> than 3.2.5; from today on master:
>> > > > >>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------<
>> > > > >> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>> > > > >>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building
>> > > > >> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
>> > > > >>>>>> 2020-03-25T05:31:52.7414689Z [INFO]
>> > > > --------------------------------[
>> > > > >> jar ]---------------------------------
>> > > > >>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>> > > > >>>>>> 2020-03-25T05:31:52.7519770Z [INFO] ---
>> > > > >> maven-checkstyle-plugin:2.17:check (validate) @
>> > > > >> flink-end-to-end-tests-common-kafka ---
>> > > > >>>>>> - there is no real benefit in retiring the travis support in
>> > > CiBot;
>> > > > >> the important part is whether Travis is run or not for pull
>> > requests.
>> > > > >>>>>>   From what I can tell though azure seems to be working fine
>> for
>> > > > pull
>> > > > >> requests, so +1 to at least disable the travis PR runs.
>> > > > >>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
>> > > > >>>>>>> Hey devs,
>> > > > >>>>>>>
>> > > > >>>>>>> I would like to discuss whether it makes sense to fully
>> switch
>> > to
>> > > > >> Azure
>> > > > >>>>>>> Pipelines and phase out our Travis integration.
>> > > > >>>>>>> More information on our Azure integration can be found here:
>> > > > >>>>>>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>> > > > >>>>>>> Travis will stay for the release-1.10 and older branches,
>> as I
>> > > have
>> > > > >> set up
>> > > > >>>>>>> Azure only for the master branch.
>> > > > >>>>>>>
>> > > > >>>>>>> Proposal:
>> > > > >>>>>>> - We keep the flinkbot infrastructure supporting both Travis
>> > and
>> > > > >> Azure
>> > > > >>>>>>> around, while we are still receiving pull requests and pushes
>> for
>> > > the
>> > > > >>>>>>> "master" and "release-1.10" branches.
>> > > > >>>>>>> - We remove the travis-specific files from "master", so that
>> > > builds
>> > > > >> are not
>> > > > >>>>>>> triggered anymore
>> > > > >>>>>>> - once we receive no more builds at Travis (because 1.11 has
>> > been
>> > > > >>>>>>> released), we remove the remaining travis-related
>> > infrastructure
>> > > > >>>>>>>
>> > > > >>>>>>> What do you think?
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> Best,
>> > > > >>>>>>> Robert
>> > > > >>
>> > > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
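For reference, the PR build percentages in Yu Li's message quoted above can be re-derived from the raw counts. The following is only a minimal Python sketch, assuming the success rate is simply successes divided by all finished runs; the function name `success_rate` is hypothetical and not part of any Flink tooling.

```python
def success_rate(success: int, failure: int) -> float:
    """Return the share of successful builds as a percentage."""
    total = success + failure
    return 100.0 * success / total


# Raw counts as quoted in the thread (PR builds since 2020-01-01).
travis = success_rate(success=649, failure=298)
azure = success_rate(success=571, failure=420)

# Difference in percentage points, not a relative change.
gap = travis - azure

print(f"Travis: {travis:.1f}%")
print(f"Azure:  {azure:.1f}%")
print(f"Gap:    {gap:.1f} percentage points")
```

Read this way, the "~10%" in the quoted message is a gap of about 11 percentage points between the two success rates. As Robert points out in his reply, these numbers come from the PR pipeline, where failures are expected, so they say little about the stability of the CI system itself.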

Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Chesnay Schepler-3
The Travis bot commands must be retained as long as we accept PRs for
1.9/1.10.

On 25/05/2020 10:50, Yun Tang wrote:

> I noticed that Travis-related bot commands are still listed on the GitHub PR page, and I think we should remove the command hint now.
> ________________________________
> From: Robert Metzger <[hidden email]>
> Sent: Thursday, April 23, 2020 15:44
> To: dev <[hidden email]>
> Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis
>
> FYI: I have moved the Flink PR and master builds from my personal Azure
> account to a PMC controlled account:
> https://dev.azure.com/apache-flink/apache-flink/_build
>
> On Fri, Apr 17, 2020 at 8:28 PM Robert Metzger <[hidden email]> wrote:
>
>> Thanks a lot for bringing up this topic again.
>> The reason why I was hesitant to decommission Travis was that we were
>> still facing some issues with the Azure infrastructure that I want to
>> resolve, so that we have a strong test coverage.
>>
>> In the last few weeks, we had the following issues:
>> - unstable e2e tests (we are running the e2e tests much more frequently,
>> thus we see more failures (and discover actual bugs!))
>> - network issues (mostly around downloading maven artifacts. This is
>> solved at the cost of slower builds. I'm preparing a fix to have stable &
>> fast maven downloads)
>> - the private builds were never really stable (this is work in progress.
>> the situation is definitely better than the private Travis builds)
>> - I haven't followed the overall master stability closely before February,
>> but I have the feeling that April so far was a pretty unstable month on
>> master. Piotr is regularly reverting commits that somehow broke master. The
>> problem with an unstable master is that it causes "CI fatigue", where people
>> assume that failing builds are not worth investigating anymore, leading to
>> more instability. This is not a problem of the CI infrastructure itself,
>> but it makes me less confident switching systems :)
>>
>>
>> Unless something unexpected happens, I'm proposing to disable pull request
>> processing on Travis next week.
>>
>>
>>
>> On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <[hidden email]> wrote:
>>
>>> I am in favour of decommissioning Travis.
>>>
>>> Moreover, I wanted to use this thread to raise another issue with Travis
>>> that I
>>> have discovered recently; many of the builds running in my private Travis
>>> account are timing out in the compilation stage (i.e., compilation takes
>>> more
>>> than 50 minutes). This means that I am not able to reliably run a full
>>> build on
>>> a CI server without creating a pull request. If other developers also
>>> experience
>>> this issue, it would speak for putting more effort into making Azure
>>> Pipelines
>>> the project-wide default.
>>>
>>> Best,
>>> Gary
>>>
>>> On Thu, Mar 26, 2020 at 12:26 PM Yu Li <[hidden email]> wrote:
>>>
>>>> Thanks for the clarification Robert.
>>>>
>>>> Since the first step plan is to replace the travis PR runs, I checked
>>> all
>>>> PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:
>>>>
>>>> * Travis FAILURE: 298
>>>> * Travis SUCCESS: 649 (68.5%)
>>>> * Azure FAILURE: 420
>>>> * Azure SUCCESS: 571 (57.6%)
>>>>
>>>> Since the patch for each run is equivalent for Travis and Azure, there
>>>> seems to be slightly higher failure rate (~10%) when running in Azure.
>>>>
>>>> However, with the just-merged fix for uploading logs (FLINK-16480), I
>>>> believe the success rate of Azure could compete with Travis now
>>> (uploading
>>>> files contribute to 20% of the failures according to the report [2]).
>>>>
>>>> So I'm +1 to disable travis runs according to the numbers.
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>> [1]
>>>>
>>> https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
>>>> [2]
>>>>
>>>>
>>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
>>>> On Thu, 26 Mar 2020 at 03:28, Robert Metzger <[hidden email]>
>>> wrote:
>>>>> Thank you for your responses.
>>>>>
>>>>> @Yu Li: In the current master, the log upload always fails, if the e2e
>>>> job
>>>>> failed. I just merged a PR that fixes this issue [1]. The problem was
>>> not
>>>>> really the network stability, rather a problem with the interaction of
>>>> the
>>>>> jobs in the pipeline (the e2e job did not set the right variables for
>>> the
>>>>> log upload)
>>>>> Secondly, you are looking at the report of the "flink-ci.flink"
>>> pipeline,
>>>>> where pull requests are built. Naturally, pull request builds fail all
>>>> the
>>>>> time, because the PRs are not yet perfect.
>>>>>
>>>>> "flink-ci.flink-master" is the right pipeline to look at:
>>>>>
>>>>>
>>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
>>>>> We have a fairly high number of failures there, because we currently
>>> have
>>>>> some issues downloading the maven artifacts [2]. I'm working already
>>> with
>>>>> Chesnay on merging a fix for that.
>>>>>
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>> https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
>>>>> [2]https://issues.apache.org/jira/browse/FLINK-16720
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>> The easiest way to disable travis for pushes is to remove all builds
>>>>>> from the .travis.yml with a push/pr condition.
>>>>>>
>>>>>> On 25/03/2020 15:03, Robert Metzger wrote:
>>>>>>> Thank you for the feedback so far.
>>>>>>>
>>>>>>> Responses to the items Chesnay raised:
>>>>>>>
>>>>>>> - by virtue of maintaining the past 2 releases we will have to
>>>> maintain
>>>>>> any
>>>>>>>> Travis infrastructure as long as 1.10 is supported, i.e., until
>>> 1.12
>>>>>>> Okay. I wasn't sure about the exact policy there.
>>>>>>>
>>>>>>>
>>>>>>>> - the azure setup doesn't appear to be equivalent yet since the
>>> java
>>>>> e2e
>>>>>>>> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a
>>> result
>>>> of
>>>>>>>> which SQLClientKafkaITCase isn't run
>>>>>>>>
>>>>>>> I filed a ticket to address this:
>>>>>>> https://issues.apache.org/jira/browse/FLINK-16778
>>>>>>>
>>>>>>>
>>>>>>>> - the nightly scripts still seem to be using a maven version
>>> other
>>>>> than
>>>>>>>> 3.2.5; from today on master:
>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------<
>>>>>>>> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building
>>>>>>>> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT [39/46]
>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO]
>>>> --------------------------------[
>>>>>> jar
>>>>>>>> ]---------------------------------
>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] ---
>>>>>> maven-checkstyle-plugin:2.17:check
>>>>>>>> (validate) @ flink-end-to-end-tests-common-kafka ---
>>>>>>>>
>>>>>>> I'm planning to address this as part of
>>>>>>> https://issues.apache.org/jira/browse/FLINK-16411, where I work
>>> on
>>>>>>> centralizing all mvn invocations.
>>>>>>>
>>>>>>>
>>>>>>>> - there is no real benefit in retiring the travis support in
>>> CiBot;
>>>>> the
>>>>>>>> important part is whether Travis is run or not for pull requests.
>>>>>>>>   From what I can tell though azure seems to be working fine for
>>> pull
>>>>>>>> requests, so +1 to at least disable the travis PR runs.
>>>>>>> So we disable Travis for https://github.com/flink-ci/flink ? I
>>> will
>>>> do
>>>>>> it
>>>>>>> once there are no new concerns and above tickets are resolved.
>>>>>>>
>>>>>>> What about disabling travis for master pushes? (e.g. removing the
>>>>>>> .travis.yml file from master)?
>>>>>>>
>>>>>>>
>>>>>>> @Dian:
>>>>>>> Thanks a lot for your feedback.
>>>>>>>
>>>>>>> - The report of Azure is still not viewable[1] (I noticed that
>>> Hequn
>>>>> has
>>>>>>>> also reported this issue in another thread). This is very useful
>>>>>>>> information.
>>>>>>> You are referring to the emails sent to [hidden email], right?
>>>>>>> I have reported this both as a bug [1] and a feature request [2]
>>> to
>>>>>> Azure.
>>>>>>> But I don't believe they will resolve this issue anytime soon.
>>>>>>> Azure has a notifications API that we could use to build a
>>> service
>>>>> that
>>>>>>> sends emails to that list, but I feel that this is really a waste
>>> of
>>>>>> time.
>>>>>>> The URL in the link even contains the ID of the build. We would
>>> just
>>>>> need
>>>>>>> to extract this ID and generate the appropriate URL. I will try to
>>>>>> directly
>>>>>>> reach the product management of AZP, maybe I can get some
>>> attention
>>>>> from
>>>>>>> there.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>> https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
>>>>>>> [2]
>>>>>>>
>>> https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <
>>>> [hidden email]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> It was left out since it adds significant additional complexity
>>> and
>>>>> the
>>>>>>>> value is dubious at best for PRs that aren't merged shortly after
>>>> the
>>>>>>>> build has finished.
>>>>>>>>
>>>>>>>> On 25/03/2020 10:28, Dian Fu wrote:
>>>>>>>>> Thanks for the information. I'm sorry that I'm not aware of this
>>>>> before
>>>>>>>> and I have checked the build log of travis and confirmed that
>>> this
>>>> is
>>>>>> true.
>>>>>>>>> @Chesnay Are there any specific reasons for this and is it
>>> possible
>>>>> to
>>>>>>>> add this back for Azure Pipelines?
>>>>>>>>> Thanks,
>>>>>>>>> Dian
>>>>>>>>>
>>>>>>>>>> On Mar 25, 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> @Dian we haven't been rebasing PR's against master for months,
>>>> ever
>>>>>>>> since we switched to CiBot.
>>>>>>>>>> On 25/03/2020 09:29, Dian Fu wrote:
>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for your great work!
>>>>>>>>>>>
>>>>>>>>>>> Overall I'm +1 to switch to Azure as the primary CI tool if
>>> it's
>>>>>>>> stable enough as I think there is no need to run both the travis
>>> and
>>>>>> Azure
>>>>>>>> for one single PR.
>>>>>>>>>>> However, some improvements are still needed, and it
>>>> would
>>>>> be
>>>>>>>> great if these issues could be addressed before fully switching to
>>>> Azure:
>>>>>>>>>>> - The report of Azure is still not viewable[1] (I noticed that
>>>>> Hequn
>>>>>>>> has also reported this issue in another thread). This is very
>>> useful
>>>>>>>> information.
>>>>>>>>>>> - For PR test of Azure pipeline, it seems that it will not
>>> rebase
>>>>> the
>>>>>>>> master code before running the tests.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Dian
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>> https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
>>>>>>>>>>>> On Mar 25, 2020, at 3:33 PM, Chesnay Schepler <[hidden email]>
>>> wrote:
>>>>>>>>>>>> Some thoughts:
>>>>>>>>>>>> - by virtue of maintaining the past 2 releases we will have
>>> to
>>>>>>>> maintain any Travis infrastructure as long as 1.10 is supported,
>>>> i.e.,
>>>>>>>> until 1.12
>>>>>>>>>>>> - the azure setup doesn't appear to be equivalent yet since
>>> the
>>>>> java
>>>>>>>> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a
>>>>> result
>>>>>> of
>>>>>>>> which SQLClientKafkaITCase isn't run
>>>>>>>>>>>> - the nightly scripts still seem to be using a maven version
>>>>> other
>>>>>>>> than 3.2.5; from today on master:
>>>>>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------<
>>>>>>>> org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building
>>>>>>>> flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
>>>>>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO]
>>>>>> --------------------------------[
>>>>>>>> jar ]---------------------------------
>>>>>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] ---
>>>>>>>> maven-checkstyle-plugin:2.17:check (validate) @
>>>>>>>> flink-end-to-end-tests-common-kafka ---
>>>>>>>>>>>> - there is no real benefit in retiring the travis support in
>>>>> CiBot;
>>>>>>>> the important part is whether Travis is run or not for pull
>>>> requests.
>>>>>>>>>>>>    From what I can tell though azure seems to be working fine
>>> for
>>>>>> pull
>>>>>>>> requests, so +1 to at least disable the travis PR runs.
>>>>>>>>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
>>>>>>>>>>>>> Hey devs,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to discuss whether it makes sense to fully
>>> switch
>>>> to
>>>>>>>> Azure
>>>>>>>>>>>>> Pipelines and phase out our Travis integration.
>>>>>>>>>>>>> More information on our Azure integration can be found here:
>>>>>>>>>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>>>>>>>>>>>>> Travis will stay for the release-1.10 and older branches,
>>> as I
>>>>> have
>>>>>>>> set up
>>>>>>>>>>>>> Azure only for the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proposal:
>>>>>>>>>>>>> - We keep the flinkbot infrastructure supporting both Travis
>>>> and
>>>>>>>> Azure
>>>>>>>>>>>>> around, while we are still receiving pull requests and pushes
>>> for
>>>>> the
>>>>>>>>>>>>> "master" and "release-1.10" branches.
>>>>>>>>>>>>> - We remove the travis-specific files from "master", so that
>>>>> builds
>>>>>>>> are not
>>>>>>>>>>>>> triggered anymore
>>>>>>>>>>>>> - once we receive no more builds at Travis (because 1.11 has
>>>> been
>>>>>>>>>>>>> released), we remove the remaining travis-related
>>>> infrastructure
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Robert
>>>>>>>>
>>>>>>
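Robert's remark quoted above, that the notification link already contains the build ID and only needs to be turned into a viewable URL, could be sketched roughly as follows. This is an illustration in Python, not an existing service: the helper names are hypothetical, the sample link is the one from Dian's footnote with the `tracking_data` parameter left out, and the `_build/results?buildId=` target layout together with the project name `Flink` are assumptions that the thread does not confirm.

```python
from urllib.parse import parse_qs, urlparse


def build_id_from_notification_url(url: str) -> str:
    """Extract the numeric build ID from the 'builduri' query parameter."""
    query = parse_qs(urlparse(url).query)
    # parse_qs decodes percent-encoding, e.g. 'vstfs:///Build/Build/6593'.
    build_uri = query["builduri"][0]
    return build_uri.rstrip("/").rsplit("/", 1)[-1]


def results_url(organization: str, project: str, build_id: str) -> str:
    """Assemble a build results URL (assumed layout, see lead-in)."""
    return (
        f"https://dev.azure.com/{organization}/{project}"
        f"/_build/results?buildId={build_id}"
    )


# Shortened form of the link from the notification email.
notification = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)

build_id = build_id_from_notification_url(notification)
print(results_url("rmetzger", "Flink", build_id))
```

The same helper should handle the unencoded variant of the link (`builduri=vstfs:///Build/Build/6593`), since only the last path segment of `builduri` is used.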


Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

Yun Tang
If we still need to accept PRs for Flink 1.9/1.10, that explains why we still need the command hint.
Chesnay, thanks for your explanation.
________________________________
From: Chesnay Schepler <[hidden email]>
Sent: Monday, May 25, 2020 18:17
To: [hidden email] <[hidden email]>; Yun Tang <[hidden email]>
Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis

The Travis bot commands must be retained as long as we accept PRs for
1.9/1.10.

On 25/05/2020 10:50, Yun Tang wrote:

> I noticed that there are still Travis-related bot commands on the GitHub PR page, and I think we should remove the command hint now.
> ________________________________
> From: Robert Metzger <[hidden email]>
> Sent: Thursday, April 23, 2020 15:44
> To: dev <[hidden email]>
> Subject: Re: [DISCUSS] Switch to Azure Pipelines as the primary CI tool / switch off Travis
>
> FYI: I have moved the Flink PR and master builds from my personal Azure
> account to a PMC controlled account:
> https://dev.azure.com/apache-flink/apache-flink/_build
>
> On Fri, Apr 17, 2020 at 8:28 PM Robert Metzger <[hidden email]> wrote:
>
>> Thanks a lot for bringing up this topic again.
>> The reason why I was hesitant to decommission Travis was that we were
>> still facing some issues with the Azure infrastructure that I wanted to
>> resolve, so that we have strong test coverage.
>>
>> In the last few weeks, we had the following issues:
>> - unstable e2e tests (we are running the e2e tests much more frequently,
>> thus we see more failures (and discover actual bugs!))
>> - network issues (mostly around downloading maven artifacts. This is
>> solved at the cost of slower builds. I'm preparing a fix to have stable &
>> fast maven downloads)
>> - the private builds were never really stable (this is work in progress;
>> the situation is definitely better than with the private Travis builds)
>> - I haven't followed the overall master stability closely before February,
>> but I have the feeling that April so far has been a pretty unstable month on
>> master. Piotr is regularly reverting commits that somehow broke master. The
>> problem with an unstable master is that it causes "CI fatigue", where people
>> assume that failing builds are not worth investigating anymore, leading to
>> more instability. This is not a problem of the CI infrastructure itself,
>> but it makes me less confident about switching systems :)
>>
>>
>> Unless something unexpected happens, I'm proposing to disable pull request
>> processing on Travis next week.
>>
>>
>>
>> On Fri, Apr 17, 2020 at 10:05 AM Gary Yao <[hidden email]> wrote:
>>
>>> I am in favour of decommissioning Travis.
>>>
>>> Moreover, I wanted to use this thread to raise another issue with Travis that I
>>> have discovered recently; many of the builds running in my private Travis
>>> account are timing out in the compilation stage (i.e., compilation takes more
>>> than 50 minutes). This means that I am not able to reliably run a full build on
>>> a CI server without creating a pull request. If other developers also experience
>>> this issue, it would speak for putting more effort into making Azure Pipelines
>>> the project-wide default.
>>>
>>> Best,
>>> Gary
>>>
>>> On Thu, Mar 26, 2020 at 12:26 PM Yu Li <[hidden email]> wrote:
>>>
>>>> Thanks for the clarification Robert.
>>>>
>>>> Since the first step of the plan is to replace the Travis PR runs, I checked all
>>>> PR builds from 2020-01-01 (PR#10735-11526) [1], and below is the result:
>>>>
>>>> * Travis FAILURE: 298
>>>> * Travis SUCCESS: 649 (68.5%)
>>>> * Azure FAILURE: 420
>>>> * Azure SUCCESS: 571 (57.6%)
>>>>
>>>> Since the patch for each run is equivalent for Travis and Azure, there
>>>> seems to be a somewhat higher failure rate (about 11 percentage points) when running in Azure.
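
The percentages quoted above can be re-derived from the raw counts. The following is only a minimal sketch: the four counts are taken from the message, while the function and variable names are made up for illustration.

```python
# PR build counts as quoted in the message above (PR#10735-11526).
counts = {
    "Travis": {"failure": 298, "success": 649},
    "Azure": {"failure": 420, "success": 571},
}


def success_rate(failure: int, success: int) -> float:
    """Return the share of successful runs as a percentage."""
    return 100.0 * success / (failure + success)


# Success rate per CI system, and the gap between them in percentage points.
rates = {name: success_rate(**c) for name, c in counts.items()}
gap = rates["Travis"] - rates["Azure"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% success")
print(f"Gap: {gap:.1f} percentage points")
```

The gap is expressed in percentage points (a difference of two rates), which is not the same as a relative change in the failure rate.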
>>>>
>>>> However, with the just-merged fix for uploading logs (FLINK-16480), I
>>>> believe the success rate of Azure could compete with Travis now (uploading
>>>> files contributes to 20% of the failures according to the report [2]).
>>>>
>>>> So I'm +1 to disable travis runs according to the numbers.
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>> [1] https://github.com/apache/flink/pulls?q=is%3Apr+created%3A%3E%3D2020-01-01
>>>> [2] https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=4
>>>>
>>>> On Thu, 26 Mar 2020 at 03:28, Robert Metzger <[hidden email]> wrote:
>>>>> Thank you for your responses.
>>>>>
>>>>> @Yu Li: In the current master, the log upload always fails if the e2e job
>>>>> failed. I just merged a PR that fixes this issue [1]. The problem was not
>>>>> really the network stability, but rather the interaction of the
>>>>> jobs in the pipeline (the e2e job did not set the right variables for the
>>>>> log upload).
>>>>> Secondly, you are looking at the report of the "flink-ci.flink" pipeline,
>>>>> where pull requests are built. Naturally, pull request builds fail all the
>>>>> time, because the PRs are not yet perfect.
>>>>>
>>>>> "flink-ci.flink-master" is the right pipeline to look at:
>>>>> https://dev.azure.com/rmetzger/Flink/_pipeline/analytics/stageawareoutcome?definitionId=8&contextType=build
>>>>>
>>>>> We have a fairly high number of failures there, because we currently have
>>>>> some issues downloading the maven artifacts [2]. I'm already working with
>>>>> Chesnay on merging a fix for that.
>>>>>
>>>>> [1] https://github.com/apache/flink/commit/1c86b8b9dd05615a3b2600984db738a9bf388259
>>>>> [2] https://issues.apache.org/jira/browse/FLINK-16720
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 25, 2020 at 4:48 PM Chesnay Schepler <[hidden email]> wrote:
>>>>>
>>>>>> The easiest way to disable travis for pushes is to remove all builds
>>>>>> from the .travis.yml with a push/pr condition.
>>>>>>
>>>>>> On 25/03/2020 15:03, Robert Metzger wrote:
>>>>>>> Thank you for the feedback so far.
>>>>>>>
>>>>>>> Responses to the items Chesnay raised:
>>>>>>>
>>>>>>>> - by virtue of maintaining the past 2 releases we will have to maintain any
>>>>>>>> Travis infrastructure as long as 1.10 is supported, i.e., until 1.12
>>>>>>> Okay. I wasn't sure about the exact policy there.
>>>>>>>
>>>>>>>
>>>>>>>> - the azure setup doesn't appear to be equivalent yet since the java e2e
>>>>>>>> profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
>>>>>>>> which SQLClientKafkaITCase isn't run
>>>>>>>>
>>>>>>> I filed a ticket to address this:
>>>>>>> https://issues.apache.org/jira/browse/FLINK-16778
>>>>>>>
>>>>>>>
>>>>>>>> - the nightly scripts still seem to be using a maven version other than
>>>>>>>> 3.2.5; from today on master:
>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>>>>>>>>
>>>>>>> I'm planning to address this as part of
>>>>>>> https://issues.apache.org/jira/browse/FLINK-16411, where I work on
>>>>>>> centralizing all mvn invocations.
>>>>>>>
>>>>>>>
>>>>>>>> - there is no real benefit in retiring the travis support in CiBot; the
>>>>>>>> important part is whether Travis is run or not for pull requests.
>>>>>>>>   From what I can tell though azure seems to be working fine for pull
>>>>>>>> requests, so +1 to at least disable the travis PR runs.
>>>>>>> So we disable Travis for https://github.com/flink-ci/flink ? I will do it
>>>>>>> once there are no new concerns and the above tickets are resolved.
>>>>>>>
>>>>>>> What about disabling Travis for master pushes (e.g. by removing the
>>>>>>> .travis.yml file from master)?
>>>>>>>
>>>>>>>
>>>>>>> @Dian:
>>>>>>> Thanks a lot for your feedback.
>>>>>>>
>>>>>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn has
>>>>>>>> also reported this issue in another thread). This is very useful
>>>>>>>> information.
>>>>>>> You are referring to the emails sent to [hidden email], right?
>>>>>>> I have reported this both as a bug [1] and a feature request [2] to Azure.
>>>>>>> But I don't believe they will resolve this issue anytime soon.
>>>>>>> Azure has a notifications API that we could use to build a service that
>>>>>>> sends emails to that list, but I feel that this is really a waste of time.
>>>>>>> The URL in the link even contains the ID of the build. We would just need
>>>>>>> to extract this ID and generate the appropriate URL. I will try to directly
>>>>>>> reach the product management of AZP, maybe I can get some attention from
>>>>>>> there.
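
The "extract this ID and generate the appropriate URL" idea could look roughly like the sketch below. It is not an existing tool: the function name is hypothetical, the project name "Flink" is assumed from the other dev.azure.com/rmetzger/Flink links in this thread, and the `_build/results?buildId=` target format is an assumption about how Azure DevOps exposes public build pages.

```python
from urllib.parse import parse_qs, urlparse


def build_results_url(notification_url: str, project: str = "Flink") -> str:
    """Turn a notification link into a (presumed) public build results link.

    Hypothetical helper: the project name and the target URL format are
    assumptions, not something confirmed in this thread.
    """
    parsed = urlparse(notification_url)
    # The organization is the first path segment, e.g. "rmetzger".
    org = parsed.path.strip("/").split("/")[0]
    # parse_qs percent-decodes the value, e.g. "vstfs:///Build/Build/6593".
    build_uri = parse_qs(parsed.query)["builduri"][0]
    build_id = build_uri.rstrip("/").rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"no numeric build ID in builduri: {build_uri!r}")
    return f"https://dev.azure.com/{org}/{project}/_build/results?buildId={build_id}"


# Shortened form of the notification link quoted in this thread
# (the tracking_data parameter is omitted).
link = (
    "https://dev.azure.com/rmetzger/web/build.aspx"
    "?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce"
    "&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593"
)
print(build_results_url(link))
```

Both the percent-encoded and the already-decoded form of the builduri parameter are handled the same way, since parse_qs decodes the value before the ID is split off.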
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [1] https://developercommunity.visualstudio.com/content/problem/957778/third-parties-are-unable-to-access-notification-li.html?childToView=960403#comment-960403
>>>>>>> [2] https://developercommunity.visualstudio.com/content/idea/960472/third-parties-are-unable-to-access-notification-li-1.html
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 25, 2020 at 10:34 AM Chesnay Schepler <[hidden email]> wrote:
>>>>>>>
>>>>>>>> It was left out since it adds significant additional complexity and the
>>>>>>>> value is dubious at best for PRs that aren't merged shortly after the
>>>>>>>> build has finished.
>>>>>>>>
>>>>>>>> On 25/03/2020 10:28, Dian Fu wrote:
>>>>>>>>> Thanks for the information. I'm sorry that I was not aware of this before.
>>>>>>>>> I have checked the build log of Travis and confirmed that this is true.
>>>>>>>>>
>>>>>>>>> @Chesnay Are there any specific reasons for this, and is it possible to
>>>>>>>>> add this back for Azure Pipelines?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dian
>>>>>>>>>
>>>>>>>>>> On 25 March 2020, at 4:43 PM, Chesnay Schepler <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> @Dian we haven't been rebasing PRs against master for months, ever
>>>>>>>>>> since we switched to CiBot.
>>>>>>>>>>
>>>>>>>>>> On 25/03/2020 09:29, Dian Fu wrote:
>>>>>>>>>>> Hi Robert,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot for your great work!
>>>>>>>>>>>
>>>>>>>>>>> Overall I'm +1 to switch to Azure as the primary CI tool if it's
>>>>>>>>>>> stable enough, as I think there is no need to run both Travis and Azure
>>>>>>>>>>> for a single PR.
>>>>>>>>>>>
>>>>>>>>>>> However, there are still some improvements that need to be made, and it
>>>>>>>>>>> would be great if these issues could be addressed before fully switching
>>>>>>>>>>> to Azure:
>>>>>>>>>>> - The report of Azure is still not viewable[1] (I noticed that Hequn
>>>>>>>>>>> has also reported this issue in another thread). This is very useful
>>>>>>>>>>> information.
>>>>>>>>>>> - For the PR test of the Azure pipeline, it seems that it will not rebase
>>>>>>>>>>> onto the master code before running the tests.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Dian
>>>>>>>>>>>
>>>>>>>>>>> [1] https://dev.azure.com/rmetzger/web/build.aspx?pcguid=03e2a4fd-f647-46c5-a324-527d2c2984ce&builduri=vstfs%3a%2f%2f%2fBuild%2fBuild%2f6593&tracking_data=eyJTb3VyY2UiOiJFbWFpbCIsIlR5cGUiOiJOb3RpZmljYXRpb24iLCJTSUQiOiIzMzk0MzciLCJTVHlwZSI6IkdSUCIsIlJlY2lwIjoxLCJfeGNpIjp7Ik5JRCI6NDAyODQ3NzksIk1SZWNpcCI6Im0wPTEgIiwiQWN0IjoiMTNjNDc3YWMtZTBjYS00MjJkLTkxOTItZWI0NzFkZmUzMWY0In0sIkVsZW1lbnQiOiJoZXJvL2N0YSJ9
>>>>>>>>>>>
>>>>>>>>>>>> On 25 March 2020, at 3:33 PM, Chesnay Schepler <[hidden email]> wrote:
>>>>>>>>>>>> Some thoughts:
>>>>>>>>>>>> - by virtue of maintaining the past 2 releases we will have to
>>>>>>>>>>>> maintain any Travis infrastructure as long as 1.10 is supported, i.e.,
>>>>>>>>>>>> until 1.12
>>>>>>>>>>>> - the azure setup doesn't appear to be equivalent yet since the java
>>>>>>>>>>>> e2e profile isn't setting the hadoop switch (-Pe2e-hadoop), as a result of
>>>>>>>>>>>> which SQLClientKafkaITCase isn't run
>>>>>>>>>>>> - the nightly scripts still seem to be using a maven version other
>>>>>>>>>>>> than 3.2.5; from today on master:
>>>>>>>>>>>> 2020-03-25T05:31:52.7412964Z [INFO] --------< org.apache.flink:flink-end-to-end-tests-common-kafka >--------
>>>>>>>>>>>> 2020-03-25T05:31:52.7413854Z [INFO] Building flink-end-to-end-tests-common-kafka 1.11-SNAPSHOT       [39/46]
>>>>>>>>>>>> 2020-03-25T05:31:52.7414689Z [INFO] --------------------------------[ jar ]---------------------------------
>>>>>>>>>>>> 2020-03-25T05:31:52.7518360Z [INFO]
>>>>>>>>>>>> 2020-03-25T05:31:52.7519770Z [INFO] --- maven-checkstyle-plugin:2.17:check (validate) @ flink-end-to-end-tests-common-kafka ---
>>>>>>>>>>>> - there is no real benefit in retiring the travis support in CiBot;
>>>>>>>>>>>> the important part is whether Travis is run or not for pull requests.
>>>>>>>>>>>>    From what I can tell though azure seems to be working fine for pull
>>>>>>>>>>>> requests, so +1 to at least disable the travis PR runs.
>>>>>>>>>>>> On 23/03/2020 14:48, Robert Metzger wrote:
>>>>>>>>>>>>> Hey devs,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to discuss whether it makes sense to fully switch to Azure
>>>>>>>>>>>>> Pipelines and phase out our Travis integration.
>>>>>>>>>>>>> More information on our Azure integration can be found here:
>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines
>>>>>>>>>>>>>
>>>>>>>>>>>>> Travis will stay for the release-1.10 and older branches, as I have set up
>>>>>>>>>>>>> Azure only for the master branch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Proposal:
>>>>>>>>>>>>> - We keep the flinkbot infrastructure supporting both Travis and Azure
>>>>>>>>>>>>> around while we still receive pull requests and pushes for the
>>>>>>>>>>>>> "master" and "release-1.10" branches.
>>>>>>>>>>>>> - We remove the Travis-specific files from "master", so that builds are no
>>>>>>>>>>>>> longer triggered
>>>>>>>>>>>>> - Once we receive no more builds at Travis (because 1.11 has been
>>>>>>>>>>>>> released), we remove the remaining Travis-related infrastructure
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Robert
>>>>>>>>
>>>>>>