[ARM support] Travis ARM CI is now in Alpha Release

Xiyuan Wang
Hi all,

Travis recently announced that its ARM architecture support is in alpha
release [1]. Since Flink is already integrated with Travis, I think it
would be quite easy for Flink to use it for ARM CI.

As some of you may know, I'm working on Flink ARM testing and support. I
previously suggested using OpenLab [2] as the ARM CI infrastructure.
Although OpenLab is not hard to use, it would still introduce some new
concepts and extra burden for Flink. The Flink team now has another choice.

As discussed before, we can still run the ARM CI as a cron job first. I
have been running POC e2e tests in OpenLab for some days [3] (of course,
this can be switched to Travis).

Following the Travis x86 tests, the job includes:

flink-end-to-end-test-part1
    split_checkpoints.sh and split_sticky.sh
flink-end-to-end-test-part2
    split_heavy.sh and split_ha.sh
flink-end-to-end-test-part3
    split_misc.sh and split_misc_hadoopfree.sh

part1 and part2 run well. part3 is not stable yet, and I need more time to
fix it. The container part is not included because of problem 5 mentioned
below.

However, I had to apply some hacks to make the jobs pass. They work around
the following problems:
1. Frocksdb ARM package:
https://issues.apache.org/jira/browse/FLINK-13598 (not solved)
2. PrometheusReporterEndToEndITCase doesn't support the ARM arch:
https://issues.apache.org/jira/browse/FLINK-14086 (PR for fix:
https://github.com/apache/flink/pull/9768)
3. Elasticsearch Xpack Machine Learning doesn't support ARM:
https://issues.apache.org/jira/browse/FLINK-14126 (PR for fix:
https://github.com/apache/flink/pull/9765)
4. maven-shade-plugin 3.2.1 doesn't work on ARM for Flink (fixed, thanks
@Dian Fu)
5. The Flink e2e container test doesn't support ARM:
https://issues.apache.org/jira/browse/FLINK-14241 (PR for fix:
https://github.com/apache/flink/pull/9782)
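
For reviewers, here is a minimal sketch of the general pattern behind
problems 2 and 3: a test that depends on a platform-specific binary can
read the JVM's os.arch property and either pick a matching artifact or
skip itself. This is not the code from the PRs above. The class name
ArchSketch, the helper names, and the artifact suffixes are hypothetical
and only illustrate the idea.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper for illustration; not taken from the Flink PRs. */
public class ArchSketch {

    /** Returns true for os.arch values commonly reported by JVMs on ARM. */
    public static boolean isArm(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        return arch.equals("aarch64") || arch.startsWith("arm");
    }

    /**
     * Maps an os.arch value to an illustrative artifact suffix.
     * Returns empty when no binary is assumed to exist for that architecture.
     */
    public static Optional<String> artifactSuffix(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            return Optional.of("linux-amd64");
        }
        if (arch.equals("aarch64") || arch.equals("arm64")) {
            return Optional.of("linux-arm64");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // os.arch is a standard JVM system property.
        String osArch = System.getProperty("os.arch");
        System.out.println("os.arch = " + osArch + ", ARM: " + isArm(osArch));
        System.out.println(artifactSuffix(osArch)
                .map(s -> "would download the '" + s + "' build")
                .orElse("no binary for this architecture, test would be skipped"));
    }
}
```

Returning an empty Optional rather than failing lets the caller decide
whether a missing binary should skip the test or fail the build.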

Whichever CI Flink uses, all the bugs mentioned above should be fixed.
Please help review these PRs, and if you have any questions, please let me
know.

Whichever CI Flink chooses, I'd like to keep working on Flink ARM support
and to keep testing and fixing ARM-related bugs.

Thanks very much.


[1]: https://blog.travis-ci.com/2019-10-07-multi-cpu-architecture-support
[2]: https://openlabtesting.org/
[3]: http://status.openlabtesting.org/builds?project=apache%2Fflink

Re: [ARM support] Travis ARM CI is now in Alpha Release

Xiyuan Wang
According to my tests, the Travis ARM CI is not ready yet. For example:
1. Java 8 support is missing:
https://travis-ci.community/t/about-the-arm-cpu-architecture-category/5336/4
2. Caching is not supported:
https://travis-ci.community/t/no-cache-support-on-arm64/5416

Without the cache, the compile job timed out after 50 minutes, so this is
not a good time to use the Travis ARM CI. OpenLab, by contrast, has no such
limitations (it can provide 16U16G VMs, i.e. 16 vCPUs and 16 GB of memory,
with no time limit for CI jobs).

Just FYI, any response is welcome.

Thanks. Regards.

(In reply to Xiyuan Wang's message of Wed, Oct 16, 2019, 10:37 AM.)

Re: [ARM support] Travis ARM CI is now in Alpha Release

Robert Metzger
Hey Xiyuan,
thanks a lot for checking out Travis' ARM-based offering.

As part of the "Reducing build times" discussion, we have considered moving
away from Travis to Azure Pipelines. What I want to say is that Travis
might not be important for the Flink community in the long run.
I think running the ARM tests on the openlab infrastructure as a cron job
is fine, until all tests are passing on ARM. Once we have achieved full ARM
compatibility, we can consider integrating the ARM build into the regular
check (through Flinkbot) to ensure that we maintain the architecture
support.

Thank you also for posting links to pull requests fixing the test issues. I
hope a committer soon finds time to take a look.

I have the feeling that there's currently no committer who feels
responsible for helping to get this effort done. I'm only guessing why we
have this situation, but I see the following potential reasons:
1. There are too many other competing pull requests, which are considered
more important
2. People don't believe that ARM support is important for Flink. It seems
that Apache Spark used to have ARM support, and is now re-adding it after
Openlab reached out to them as well [2]. There seem to be some research
projects [3] or some marketing? [4] for it, but the number of people
actually asking for it is unclear to me at this point.
I am not aware of any data or anecdotal knowledge of ARM-based server
platforms being adopted in Flink's space. As long as we don't have users
asking for it, it remains a bet. For Apache Kafka, I could not find any
evidence that goes beyond toy projects.
3. Since Openlab apparently reached out to multiple open source projects
regarding ARM support, I wonder about Openlab's long-term commitment and
motivation. I assume your goal is to help grow the adoption of the ARM
CPU architecture, by making sure as many tools as possible are supported by
it. I don't want to stand in your way of growing ARM's adoption, and the
benefit for Flink is also clear: We will potentially reach more users, and
we might get additional attention for the project. On the other hand, I see
risks, such as Openlab losing interest / funding / ... in the middle of
the project.

I personally don't feel comfortable reviewing the changes, because I
haven't been very active in the day to day development of Flink recently,
and I don't want to make changes in code-areas I'm not fully confident in.
But I hope that this discussion might shed some light on the reasons for
the low activity on the effort.

Best,
Robert

[1] https://lists.apache.org/thread.html/4d7e6b1fd5c570973a68de91438dd9045afdae1685b1d1467b2149ce@%3Cdev.flink.apache.org%3E
[2] http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Ask-for-ARM-CI-for-spark-td27415.html#a27440
[3] https://developer.arm.com/-/media/Arm%20Developer%20Community/Images/White%20Paper%20and%20Webinar%20Images/HPC%20White%20Papers/UCAM_Arm_Spark_2017.pdf?revision=6e22a6b7-16a0-4478-8eca-2835b69c7305
[4] http://www.sparkonarm.com/


(In reply to Xiyuan Wang's message of Mon, Oct 21, 2019, 10:22 AM.)

Re: [ARM support] Travis ARM CI is now in Alpha Release

Xiyuan Wang
Hi Robert,
Glad to get your response.

All the changes in my PRs relate to unit tests and e2e tests; they won't
break any Flink functionality. Any review or comment is welcome. The cron
job has been running on OpenLab for some days [1]. Now I need permission to
send the results to [hidden email] so that the community can see the
details, but I don't know who can help me with that.

OpenLab is an open source community that provides CI/CD systems and
infrastructure for open source projects. It does not focus only on Apache
and ARM: it already supports many projects, such as many tools in the
OpenStack ecosystem on x86 and CNCF community projects on x86 and ARM. The
resources behind OpenLab come from many providers, such as the Citynetwork
public cloud, Chameleon cloud, Switch cloud, Linaro, and the Huawei public
cloud. So there is no need to worry that OpenLab will lose
interest/funding/...

As you said, our goal in Apache is to help build the ARM ecosystem. OpenLab
is just one option, not a mandatory requirement. We are also planning to
donate ARM resources to Apache Infra [2]. If you don't like OpenLab, Apache
Infra Jenkins can be another choice soon. And even if you choose other ARM
tools, we can keep helping with the ARM support work.

ARM now plays an increasingly important role, not only on the server side
but also on the device side. The number of ARM-based servers is growing and
their performance keeps improving. Many companies have released their own
ARM servers. If Flink supports ARM, I think it can potentially reach more
users and the community can grow further. And since Flink is written in
Java, which runs cross-platform by default, Flink ARM support is not a big
piece of work. Only some small things need to be updated, such as test
code, dependencies, and the CI gate.

Thanks.

[1]: http://status.openlabtesting.org/builds?project=apache%2Fflink
[2]: https://issues.apache.org/jira/browse/INFRA-18761

(In reply to Robert Metzger's message of Tue, Oct 22, 2019, 4:47 PM.)