ci: remove aarch64 from Travis jobs

Message ID 20200416110053.2547791-1-thomas@monjalon.net (mailing list archive)
State Rejected, archived
Delegated to: David Marchand
Series ci: remove aarch64 from Travis jobs

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/travis-robot              success  Travis build: passed
ci/Intel-compilation         fail     Compilation issues
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-nxp-Performance       success  Performance Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/iol-testing               fail     Testing issues

Commit Message

Thomas Monjalon April 16, 2020, 11 a.m. UTC
  Travis is not reliable for native Arm and PPC:
https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6

In order to get reliable Travis reports,
the use of Arm machines is removed until Travis fixes it.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 .travis.yml | 30 ------------------------------
 1 file changed, 30 deletions(-)
  

Comments

Aaron Conole April 16, 2020, 12:44 p.m. UTC | #1
Thomas Monjalon <thomas@monjalon.net> writes:

> Travis is not reliable for native Arm and PPC:
> https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
>
> In order to get reliable Travis reports,
> the use of Arm machines is removed until Travis fixes it.
>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

We should add back the cross-build if we do this - at least then we
could have a reliable compilation test of Arm64 code.   Does it make
sense?
  
Jerin Jacob April 16, 2020, 1:30 p.m. UTC | #2
On Thu, Apr 16, 2020 at 6:14 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Thomas Monjalon <thomas@monjalon.net> writes:
>
> > Travis is not reliable for native Arm and PPC:
> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> >
> > In order to get reliable Travis reports,
> > the use of Arm machines is removed until Travis fixes it.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
>
> We should add back the cross-build if we do this - at least then we
> could have a reliable compilation test of Arm64 code.   Does it make
> sense?

+1

>
  
David Marchand April 16, 2020, 1:43 p.m. UTC | #3
On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Thomas Monjalon <thomas@monjalon.net> writes:
>
> > Travis is not reliable for native Arm and PPC:
> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> >
> > In order to get reliable Travis reports,
> > the use of Arm machines is removed until Travis fixes it.
> >
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
>
> We should add back the cross-build if we do this - at least then we
> could have a reliable compilation test of Arm64 code.   Does it make
> sense?

I don't see them removed by this patch, the two jobs are still present ?

  # x86_64 cross-compiling aarch64 jobs
  - env: DEF_LIB="static" AARCH64=1
    arch: amd64
    compiler: gcc
    addons:
      apt:
        packages:
          - *aarch64_packages
  - env: DEF_LIB="shared" AARCH64=1
    arch: amd64
    compiler: gcc
    addons:
      apt:
        packages:
          - *aarch64_packages
  
Aaron Conole April 16, 2020, 1:45 p.m. UTC | #4
David Marchand <david.marchand@redhat.com> writes:

> On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com> wrote:
>>
>> Thomas Monjalon <thomas@monjalon.net> writes:
>>
>> > Travis is not reliable for native Arm and PPC:
>> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
>> >
>> > In order to get reliable Travis reports,
>> > the use of Arm machines is removed until Travis fixes it.
>> >
>> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>> > ---
>>
>> We should add back the cross-build if we do this - at least then we
>> could have a reliable compilation test of Arm64 code.   Does it make
>> sense?
>
> I don't see them removed by this patch, the two jobs are still present ?

Whoops - for some reason I missed them.  Nevermind :)

>   # x86_64 cross-compiling aarch64 jobs
>   - env: DEF_LIB="static" AARCH64=1
>     arch: amd64
>     compiler: gcc
>     addons:
>       apt:
>         packages:
>           - *aarch64_packages
>   - env: DEF_LIB="shared" AARCH64=1
>     arch: amd64
>     compiler: gcc
>     addons:
>       apt:
>         packages:
>           - *aarch64_packages
  
Thomas Monjalon April 16, 2020, 2:39 p.m. UTC | #5
16/04/2020 15:45, Aaron Conole:
> David Marchand <david.marchand@redhat.com> writes:
> > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com> wrote:
> >> Thomas Monjalon <thomas@monjalon.net> writes:
> >>
> >> > Travis is not reliable for native Arm and PPC:
> >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> >> >
> >> > In order to get reliable Travis reports,
> >> > the use of Arm machines is removed until Travis fixes it.
> >> >
> >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> >> > ---
> >>
> >> We should add back the cross-build if we do this - at least then we
> >> could have a reliable compilation test of Arm64 code.   Does it make
> >> sense?
> >
> > I don't see them removed by this patch, the two jobs are still present ?
> 
> Whoops - for some reason I missed them.  Nevermind :)

So? Acked?
  
Honnappa Nagarahalli April 16, 2020, 3:55 p.m. UTC | #6
<snip>

> Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
> 
> 16/04/2020 15:45, Aaron Conole:
> > David Marchand <david.marchand@redhat.com> writes:
> > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com>
> wrote:
> > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > >>
> > >> > Travis is not reliable for native Arm and PPC:
> > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
Thanks David for creating the ticket. Will escalate this through our contacts at Travis CI, hopefully it can be resolved soon.

> > >> >
> > >> > In order to get reliable Travis reports, the use of Arm machines
> > >> > is removed until Travis fixes it.
> > >> >
> > >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > >> > ---
> > >>
> > >> We should add back the cross-build if we do this - at least then we
> > >> could have a reliable compilation test of Arm64 code.   Does it make
> > >> sense?
> > >
> > > I don't see them removed by this patch, the two jobs are still present ?
> >
> > Whoops - for some reason I missed them.  Nevermind :)
> 
> So? Acked?
> 
> 
>
  
Aaron Conole April 16, 2020, 5:07 p.m. UTC | #7
Thomas Monjalon <thomas@monjalon.net> writes:

> 16/04/2020 15:45, Aaron Conole:
>> David Marchand <david.marchand@redhat.com> writes:
>> > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com> wrote:
>> >> Thomas Monjalon <thomas@monjalon.net> writes:
>> >>
>> >> > Travis is not reliable for native Arm and PPC:
>> >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
>> >> >
>> >> > In order to get reliable Travis reports,
>> >> > the use of Arm machines is removed until Travis fixes it.
>> >> >
>> >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>> >> > ---
>> >>
>> >> We should add back the cross-build if we do this - at least then we
>> >> could have a reliable compilation test of Arm64 code.   Does it make
>> >> sense?
>> >
>> > I don't see them removed by this patch, the two jobs are still present ?
>> 
>> Whoops - for some reason I missed them.  Nevermind :)
>
> So? Acked?

Yes,

Acked-by: Aaron Conole <aconole@redhat.com>
  
Aaron Conole April 16, 2020, 5:08 p.m. UTC | #8
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:

> <snip>
>
>> Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
>> 
>> 16/04/2020 15:45, Aaron Conole:
>> > David Marchand <david.marchand@redhat.com> writes:
>> > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com>
>> wrote:
>> > >> Thomas Monjalon <thomas@monjalon.net> writes:
>> > >>
>> > >> > Travis is not reliable for native Arm and PPC:
>> > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> Thanks David for creating the ticket. Will escalate this through our
> contacts at Travis CI, hopefully it can be resolved soon.

I did get an email from someone at travis support acknowledging the
issue and saying that they are working on it.

>> > >> >
>> > >> > In order to get reliable Travis reports, the use of Arm machines
>> > >> > is removed until Travis fixes it.
>> > >> >
>> > >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>> > >> > ---
>> > >>
>> > >> We should add back the cross-build if we do this - at least then we
>> > >> could have a reliable compilation test of Arm64 code.   Does it make
>> > >> sense?
>> > >
>> > > I don't see them removed by this patch, the two jobs are still present ?
>> >
>> > Whoops - for some reason I missed them.  Nevermind :)
>> 
>> So? Acked?
>> 
>> 
>>
  
Ruifeng Wang April 17, 2020, 8:49 a.m. UTC | #9
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, April 16, 2020 10:40 PM
> To: Aaron Conole <aconole@redhat.com>
> Cc: David Marchand <david.marchand@redhat.com>; dev <dev@dpdk.org>;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; Gavin Hu <Gavin.Hu@arm.com>;
> Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Michael Santana
> <maicolgabriel@hotmail.com>
> Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
> 
> 16/04/2020 15:45, Aaron Conole:
> > David Marchand <david.marchand@redhat.com> writes:
> > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com>
> wrote:
> > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > >>
> > >> > Travis is not reliable for native Arm and PPC:
> > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> > >> >
> > >> > In order to get reliable Travis reports, the use of Arm machines
> > >> > is removed until Travis fixes it.
> > >> >
> > >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > >> > ---
> > >>
> > >> We should add back the cross-build if we do this - at least then we
> > >> could have a reliable compilation test of Arm64 code.   Does it make
> > >> sense?
> > >
> > > I don't see them removed by this patch, the two jobs are still present ?
> >
> > Whoops - for some reason I missed them.  Nevermind :)
> 
> So? Acked?
> 
Can we achieve this by allowing failures on AArch64 jobs?
https://docs.travis-ci.com/user/build-matrix/#rows-that-are-allowed-to-fail

Add following setting:
jobs:
  allow_failures:
  - arch: arm64

This way we can keep the jobs without suffering from the unstable infrastructure.
The results of these jobs will still be observable, which gives us a chance to know when the jobs become stable.

Thanks.
/Ruifeng
> 
>
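
For illustration, the allow_failures suggestion above might look roughly like
this in context. This is only a sketch: the jobs/include layout is assumed,
and only two of the arm64 entries from the patch are shown. Travis matches
allow_failures entries against job keys, so the x86_64 cross-compiling jobs
(arch: amd64) would not match and would still be required to pass.

```yaml
# Hypothetical excerpt of .travis.yml; the surrounding layout is an assumption.
jobs:
  # A job matching an entry below may fail without failing the whole build.
  allow_failures:
    - arch: arm64
  include:
    # aarch64 gcc jobs, copied from the entries this patch would remove
    - env: DEF_LIB="static"
      arch: arm64
      compiler: gcc
    - env: DEF_LIB="shared" RUN_TESTS=1
      arch: arm64
      compiler: gcc
```

With such a setting the arm64 jobs would still run and report their own
status, but a failure there would no longer turn the overall build red.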
  
Thomas Monjalon April 17, 2020, 10:09 a.m. UTC | #10
17/04/2020 10:49, Ruifeng Wang:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 16/04/2020 15:45, Aaron Conole:
> > > David Marchand <david.marchand@redhat.com> writes:
> > > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com>
> > wrote:
> > > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > > >>
> > > >> > Travis is not reliable for native Arm and PPC:
> > > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> > > >> >
> > > >> > In order to get reliable Travis reports, the use of Arm machines
> > > >> > is removed until Travis fixes it.
> > > >> >
> > > >> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > > >> > ---
> > > >>
> > > >> We should add back the cross-build if we do this - at least then we
> > > >> could have a reliable compilation test of Arm64 code.   Does it make
> > > >> sense?
> > > >
> > > > I don't see them removed by this patch, the two jobs are still present ?
> > >
> > > Whoops - for some reason I missed them.  Nevermind :)
> > 
> > So? Acked?
> > 
> Can we achieve this by allowing failures on AArch64 jobs?
> https://docs.travis-ci.com/user/build-matrix/#rows-that-are-allowed-to-fail
> 
> Add following setting:
> jobs:
>   allow_failures:
>   - arch: arm64
> 
> This way we can keep the jobs without suffering from the unstable infrastructure.
> The results of these jobs will still be observable, which gives us a chance to know when the jobs become stable.

I don't see the benefit. It will just make Travis reports unclear.

I will wait at least one more week to give Travis a chance to fix Arm support.
Please work with them.
If there is no result shortly, I will apply this patch to improve DPDK CI reliability.
  
David Marchand April 19, 2020, 8:01 a.m. UTC | #11
Honnappa, Ruifeng,

On Thu, Apr 16, 2020 at 5:55 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
>
> <snip>
>
> > Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
> >
> > 16/04/2020 15:45, Aaron Conole:
> > > David Marchand <david.marchand@redhat.com> writes:
> > > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole <aconole@redhat.com>
> > wrote:
> > > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > > >>
> > > >> > Travis is not reliable for native Arm and PPC:
> > > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> Thanks David for creating the ticket. Will escalate this through our contacts at Travis CI, hopefully it can be resolved soon.

There were failures that were obviously because of Travis, like this
quota exceeded error.
But we have other failures on the unit tests that I reported earlier
that are not clear: it might be because of Travis or running in
containers.

Example on last master build:
https://travis-ci.com/github/DPDK/dpdk/builds/160799081

- cycles_autotest failing:
https://travis-ci.com/github/DPDK/dpdk/jobs/320630402#L3460
- some random test ending up in timeout, this time table_autotest, I
also saw eal_fs_autotest:
https://travis-ci.com/github/DPDK/dpdk/jobs/320630406#L7140
  
Ruifeng Wang April 20, 2020, 3:35 p.m. UTC | #12
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Sunday, April 19, 2020 4:01 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Cc: thomas@monjalon.net; Aaron Conole <aconole@redhat.com>; dev
> <dev@dpdk.org>; Gavin Hu <Gavin.Hu@arm.com>; Michael Santana
> <maicolgabriel@hotmail.com>; nd <nd@arm.com>
> Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
> 
> Honnappa, Ruifeng,
> 
> On Thu, Apr 16, 2020 at 5:55 PM Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com> wrote:
> >
> > <snip>
> >
> > > Subject: Re: [PATCH] ci: remove aarch64 from Travis jobs
> > >
> > > 16/04/2020 15:45, Aaron Conole:
> > > > David Marchand <david.marchand@redhat.com> writes:
> > > > > On Thu, Apr 16, 2020 at 2:44 PM Aaron Conole
> > > > > <aconole@redhat.com>
> > > wrote:
> > > > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > > > >>
> > > > >> > Travis is not reliable for native Arm and PPC:
> > > > >> > https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> > Thanks David for creating the ticket. Will escalate this through our contacts
> at Travis CI, hopefully it can be resolved soon.
> 
> There were failures that were obviously because of Travis, like this quota
> exceeded error.
> But we have other failures on the unit tests that I reported earlier that are
> not clear: it might be because of Travis or running in containers.
> 
Yes. Unit test failures have been observed more frequently in the robot's results recently.

> Example on last master build:
> https://travis-ci.com/github/DPDK/dpdk/builds/160799081
> 
> - cycles_autotest failing:
> https://travis-ci.com/github/DPDK/dpdk/jobs/320630402#L3460
> - some random test ending up in timeout, this time table_autotest, I also saw
> eal_fs_autotest:
> https://travis-ci.com/github/DPDK/dpdk/jobs/320630406#L7140
> 
My ideas here:
1. Modify the test cases to relax criteria for AArch64.
2. Pick test cases to run for AArch64 on Travis.

Option 2 should be better. It only adapts to the CI platform and doesn't change the code.

/Ruifeng
> 
> --
> David Marchand
  
Thomas Monjalon March 25, 2021, 3:46 p.m. UTC | #13
16/04/2020 13:00, Thomas Monjalon:
> Travis is not reliable for native Arm and PPC:
> https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> 
> In order to get reliable Travis reports,
> the use of Arm machines is removed until Travis fixes it.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>

We managed without applying this patch.

After one year passed, what is the situation today regarding Travis?
Can we rely on Travis service?
For which workload? Which architecture?

Aaron, what do you recommend?
  
Aaron Conole March 25, 2021, 4:40 p.m. UTC | #14
Thomas Monjalon <thomas@monjalon.net> writes:

> 16/04/2020 13:00, Thomas Monjalon:
>> Travis is not reliable for native Arm and PPC:
>> https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
>> 
>> In order to get reliable Travis reports,
>> the use of Arm machines is removed until Travis fixes it.
>> 
>> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>
> We managed without applying this patch.
>
> After one year passed, what is the situation today regarding Travis?
> Can we rely on Travis service?

So far, yes.

> For which workload? Which architecture?

I think for all of them.  Looking at even the failures which pop up for
the latest patches, they seem like real failures.

ex:
  https://travis-ci.com/github/ovsrobot/dpdk/jobs/493722400
  https://travis-ci.com/github/ovsrobot/dpdk/jobs/493688879
  https://travis-ci.com/github/ovsrobot/dpdk/jobs/493624012
  https://travis-ci.com/github/ovsrobot/dpdk/jobs/493611597

These are ABI, and doc failures - different arches, etc.

Seems like it's quite usable.

> Aaron, what do you recommend?

I think we should drop this patch - Travis continues to be useful even
for individual developers checking their own results.  It seems the
service works quite a bit better now for the project as well, thanks to
Honnappa and other ARM folks for working with them.
  
Thomas Monjalon March 25, 2021, 5:11 p.m. UTC | #15
25/03/2021 17:40, Aaron Conole:
> Thomas Monjalon <thomas@monjalon.net> writes:
> >> Travis is not reliable for native Arm and PPC:
> >> https://travis-ci.community/t/disk-quota-exceeded-on-arm64/7619/6
> >> 
> >> In order to get reliable Travis reports,
> >> the use of Arm machines is removed until Travis fixes it.
> >> 
> >> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> >
> > We managed without applying this patch.
> >
> > After one year passed, what is the situation today regarding Travis?
> > Can we rely on Travis service?
> 
> So far, yes.
> 
> > For which workload? Which architecture?
> 
> I think for all of them.  Looking at even the failures which pop up for
> the latest patches, they seem like real failures.
> 
> ex:
>   https://travis-ci.com/github/ovsrobot/dpdk/jobs/493722400
>   https://travis-ci.com/github/ovsrobot/dpdk/jobs/493688879
>   https://travis-ci.com/github/ovsrobot/dpdk/jobs/493624012
>   https://travis-ci.com/github/ovsrobot/dpdk/jobs/493611597
> 
> These are ABI, and doc failures - different arches, etc.
> 
> Seems like it's quite usable.
> 
> > Aaron, what do you recommend?
> 
> I think we should drop this patch - Travis continues to be useful even
> for individual developers checking their own results.  It seems the
> service works quite a bit better now for the project as well, thanks to
> Honnappa and other ARM folks for working with them.

Thanks all, patch classified as "Rejected".
  

Patch

diff --git a/.travis.yml b/.travis.yml
index 2d2292ff64..b681aaccc4 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -101,33 +101,3 @@  jobs:
       apt:
         packages:
           - *aarch64_packages
-  # aarch64 gcc jobs
-  - env: DEF_LIB="static"
-    arch: arm64
-    compiler: gcc
-  - env: DEF_LIB="shared" RUN_TESTS=1
-    arch: arm64
-    compiler: gcc
-  - env: DEF_LIB="shared" BUILD_DOCS=1
-    arch: arm64
-    compiler: gcc
-    addons:
-      apt:
-        packages:
-          - *required_packages
-          - *doc_packages
-  - env: DEF_LIB="shared" ABI_CHECKS=1
-    arch: arm64
-    compiler: gcc
-    addons:
-      apt:
-        packages:
-          - *required_packages
-          - *libabigail_build_packages
-  # aarch64 clang jobs
-  - env: DEF_LIB="static"
-    arch: arm64
-    compiler: clang
-  - env: DEF_LIB="shared" RUN_TESTS=1
-    arch: arm64
-    compiler: clang