Hi all,
Recently, Travis announced that ARM architecture support is in alpha [1]. Since Flink already integrates with Travis, I think it would be quite easy for Flink to use it for ARM CI.

As some of you may know, I'm working on Flink ARM testing and support. I previously suggested using OpenLab [2] as the ARM CI infrastructure. Although OpenLab is not hard to use, it would still introduce some new concepts and some burden to Flink. Now the Flink team has another option.

As discussed before, we can still start by running ARM CI as a cron job. I have been running a POC of the e2e tests on OpenLab for some days [3] (of course, this can be moved to Travis).

Following the Travis x86 tests, it includes:

flink-end-to-end-test-part1: split_checkpoints.sh and split_sticky.sh
flink-end-to-end-test-part2: split_heavy.sh and split_ha.sh
flink-end-to-end-test-part3: split_misc.sh and split_misc_hadoopfree.sh

part1 and part2 run well. part3 is not stable yet, and I need more time to fix it. The container tests are not included because of problem 5 below.

However, I had to apply some workarounds to make the jobs pass, for the following issues:
1. FRocksDB ARM package: https://issues.apache.org/jira/browse/FLINK-13598 (not solved)
2. PrometheusReporterEndToEndITCase doesn't support the ARM arch: https://issues.apache.org/jira/browse/FLINK-14086 (PR for fix: https://github.com/apache/flink/pull/9768)
3. Elasticsearch X-Pack Machine Learning doesn't support ARM: https://issues.apache.org/jira/browse/FLINK-14126 (PR for fix: https://github.com/apache/flink/pull/9765)
4. maven-shade-plugin 3.2.1 doesn't work on ARM for Flink (fixed, thanks @Dian Fu)
5. The Flink e2e container test doesn't support ARM: https://issues.apache.org/jira/browse/FLINK-14241 (PR for fix: https://github.com/apache/flink/pull/9782)

No matter which CI Flink uses, all the bugs mentioned above should be fixed. Please help review these PRs, and let me know if you have any questions.

Whichever CI Flink chooses, I'd like to keep working on Flink ARM support, testing it, and fixing ARM-related bugs.

Thanks very much.

[1]: https://blog.travis-ci.com/2019-10-07-multi-cpu-architecture-support
[2]: https://openlabtesting.org/
[3]: http://status.openlabtesting.org/builds?project=apache%2Fflink

According to my tests, the Travis ARM CI is not ready yet. For example:
1. Java 8 support is missing. https://travis-ci.community/t/about-the-arm-cpu-architecture-category/5336/4
2. Caching is not supported. https://travis-ci.community/t/no-cache-support-on-arm64/5416

Without the cache, the compile job timed out after 50 minutes. So it's not a good time to use Travis ARM CI at the moment, whereas OpenLab doesn't have these limitations (it can provide VMs with 16 vCPUs and 16 GB of RAM, with no time limit for CI jobs).

Just FYI, any response is welcome.

Thanks. Regards.

On Wed, Oct 16, 2019 at 10:37 AM Xiyuan Wang <[hidden email]> wrote:

Hey Xiyuan,
thanks a lot for checking out Travis's ARM-based offering.

As part of the "Reducing build times" discussion [1], we have considered moving away from Travis to Azure Pipelines. What I want to say is that Travis might not be important for the Flink community in the long run.
I think running the ARM tests on the OpenLab infrastructure as a cron job is fine until all tests are passing on ARM. Once we have achieved full ARM compatibility, we can consider integrating the ARM build into the regular check (through Flinkbot) to ensure that we maintain the architecture support.

Thank you also for posting links to the pull requests fixing the test issues. I hope a committer soon finds time to take a look.

I have the feeling that there's currently no committer who feels responsible for helping get this effort done. I'm only guessing why we have this situation, but I see the following potential reasons:
1. There are too many other competing pull requests, which are considered more important.
2. People don't believe that ARM support is important for Flink. It seems that Apache Spark used to have ARM support, and is now re-adding it after Openlab reached out to them as well [2]. There seem to be some research projects [3] or some marketing (?) [4] for it, but the number of people actually asking for it is unclear to me at this point.
I am not aware of any data or anecdotal knowledge of ARM-based server platforms being adopted in Flink's space. As long as we don't have users asking for it, it remains a bet. For Apache Kafka, I could not find any evidence that goes beyond toy projects.
3. Since Openlab apparently reached out to multiple open source projects regarding ARM support, I wonder about Openlab's long-term commitment and motivation. I assume your goal is to help grow the adoption of the ARM CPU architecture by making sure as many tools as possible support it.
I don't want to stand in the way of growing ARM's adoption, and the benefit for Flink is also clear: we will potentially reach more users, and we might get additional attention for the project. On the other hand, I see risks, such as Openlab losing interest / funding / ... in the middle of the project.

I personally don't feel comfortable reviewing the changes, because I haven't been very active in the day-to-day development of Flink recently, and I don't want to make changes in code areas I'm not fully confident in.
But I hope that this discussion might shed some light on the reasons for the low activity on the effort.

Best,
Robert

[1] https://lists.apache.org/thread.html/4d7e6b1fd5c570973a68de91438dd9045afdae1685b1d1467b2149ce@%3Cdev.flink.apache.org%3E
[2] http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Ask-for-ARM-CI-for-spark-td27415.html#a27440
[3] https://developer.arm.com/-/media/Arm%20Developer%20Community/Images/White%20Paper%20and%20Webinar%20Images/HPC%20White%20Papers/UCAM_Arm_Spark_2017.pdf?revision=6e22a6b7-16a0-4478-8eca-2835b69c7305
[4] http://www.sparkonarm.com/

On Mon, Oct 21, 2019 at 10:22 AM Xiyuan Wang <[hidden email]> wrote:

Hi Robert,
I'm glad to get your response. All the changes in my PRs are related to unit tests and e2e tests. They won't break any Flink functionality. Any review or comment is welcome.

The cron job has been running on OpenLab for some days [1]. Now I need permission to send the results to [hidden email] so that the community can see the details. I don't know who can help me with this.

OpenLab is an open source community that provides CI/CD systems and infrastructure for open source projects. It is not focused only on Apache projects or on ARM. It already supports many projects, such as many tools in the OpenStack ecosystem on x86, and CNCF community projects on x86 and ARM. The resources behind OpenLab come from many providers, such as Citynetwork public cloud, Chameleon cloud, Switch cloud, Linaro and Huawei public cloud. So there's no need to worry that OpenLab will lose interest or funding.

As you said, our goal in Apache is to help build the ARM ecosystem. OpenLab is just one option, not a mandatory requirement. We are also planning to donate ARM resources to Apache Infra [2]. If you don't like OpenLab, Apache Infra Jenkins may soon be another option. Even if you choose other ARM tools, we will keep helping with the ARM support work.

ARM is playing an increasingly important role, not only on the server side but also on the terminal side. The number of ARM-based servers is growing and their performance keeps improving. Many companies have released ARM servers. If Flink supports ARM, I think it can potentially reach more users and the community can grow further. And since Flink is written in Java, which runs cross-platform by default, ARM support for Flink is not a big effort. Only a few things need to be updated, such as test code, dependencies and the CI gate.

Thanks.

[1]: http://status.openlabtesting.org/builds?project=apache%2Fflink
[2]: https://issues.apache.org/jira/browse/INFRA-18761

On Tue, Oct 22, 2019 at 4:47 PM Robert Metzger <[hidden email]> wrote: