[RFC,0/3] ci: enable unit tests for non-aarch64 platforms

Message ID 20190329172241.11916-1-aconole@redhat.com (mailing list archive)

Message

Aaron Conole March 29, 2019, 5:22 p.m. UTC
  This series is submitted as an RFC because a number of the unit tests are
not successful in the travis environment.  If all of them were passing,
this would be submitted as PATCH instead.  It could be accepted as-is but I
would prefer to see all the tests passing first.

The first patch fixes up the tests to auto-detect the number of cores on
a machine.  This helps on lower-end systems (such as i3 laptops) where
someone wants to verify the functionality.  The number of cores to use is
picked from the parameters of the running system.
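
As a rough illustration of the idea (not taken from the patch itself), core
auto-detection can be sketched in Python as below.  The helper names and the
default of four wanted cores are assumptions made for this example; the
resulting string has the shape of an EAL core list.

```python
import os


def available_cores():
    """Return how many CPUs this process is allowed to run on."""
    try:
        # Linux only: honours the process affinity mask (e.g. set by taskset).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total CPU count.
        return os.cpu_count() or 1


def core_list_arg(num_cores, wanted=4):
    """Build a core list such as "0-3", capped at what the machine has.

    `wanted` is a hypothetical upper bound on the cores a test asks for.
    This sketch assumes cores are numbered from 0 without gaps.
    """
    usable = max(1, min(num_cores, wanted))
    return "0" if usable == 1 else f"0-{usable - 1}"
```

On a two-core laptop, core_list_arg(available_cores()) would give "0-1"
instead of a hard-coded range the machine cannot satisfy.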

The second patch moves some tests out - these tests don't produce output or
complete in any reasonable amount of time (10m+ for a single unit test is
a little strange - they should be investigated to see if the run time can
be reduced).  I prefer to see these separated out since travis will completely
bail if the test takes longer than 10m to produce output.
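
A minimal sketch of that separation, assuming per-test run times are already
known.  The test names and durations used with it would be placeholders, not
measurements, and the 600 second limit simply mirrors the 10m figure above.

```python
def split_tests(durations, limit_s=600):
    """Partition tests into (fast, slow) by run time in seconds.

    `durations` maps a test name to its run time.  Anything at or above
    `limit_s` goes to the slow list so it can be run outside the CI leg.
    """
    fast = sorted(name for name, secs in durations.items() if secs < limit_s)
    slow = sorted(name for name, secs in durations.items() if secs >= limit_s)
    return fast, slow
```

For example, split_tests({"example_quick": 3, "example_long": 900}) puts
"example_long" in the slow list and leaves "example_quick" in the fast one.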

The third patch actually enables the testing, and runs each test leg
independently.  This version populates the hugepages mapping.  It might be
useful to have the option of running without hugepages enabled (and I have a
separate series that can do this); however, the --no-huge flag seems to cause
most of the unit tests to break, since they either spawn a new instance of
the EAL without passing the hugepage flags, or check against the hugepage API
and use that to determine whether memory can be allocated.
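
To illustrate the first failure mode: a test that re-launches the binary would
need to carry the parent's memory flags over to the child.  The sketch below
is only an assumption about what such forwarding could look like; the helper
name and the set of inherited flags are hypothetical, and the real tests are
written in C.

```python
# Flags a child EAL instance would need to inherit from its parent
# (hypothetical list for this sketch).
INHERITED_FLAGS = ("--no-huge",)


def child_argv(binary, parent_argv, child_args):
    """Build the argv for a child process, forwarding inherited flags."""
    inherited = [arg for arg in parent_argv if arg in INHERITED_FLAGS]
    return [binary, *inherited, *child_args]
```

With a parent started as "dpdk-test --no-huge -l 0-1", the child argv would
then also carry --no-huge rather than silently expecting hugepages.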

Aaron Conole (3):
  test/meson: auto detect number of cores
  meson-tests: separate slower tests
  ci: enable tests on non-arm platforms

 .ci/linux-build.sh   |  7 +++++++
 .ci/linux-setup.sh   |  6 +++++-
 app/test/meson.build | 43 +++++++++++++++++++++++++++++++++----------
 3 files changed, 45 insertions(+), 11 deletions(-)
  

Comments

David Marchand April 1, 2019, 7:15 p.m. UTC | #1
I tried using meson/ninja for the tests; something that bothered me is that
I can't interrupt them.
I had to manually kill meson and ninja, and I had some leftover dpdk-test
processes (maybe due to some ^Z I hit...).
Is this expected?

This is quite frustrating when testing "before" and "after" each patch.
  
Aaron Conole April 1, 2019, 7:28 p.m. UTC | #2
David Marchand <david.marchand@redhat.com> writes:

> I tried using meson/ninja for the tests, something that bothered me is that I can't interrupt the tests.
> I had to kill manually, meson, ninja and I had some leftover dpdk-test processes (maybe due to some ^Z I
> hit...).
> Is this expected ?

Certainly not by me.  I usually let everything complete, though (which
takes a looong time if I run the full suite).

> This is quite frustrating when testing "before" and "after" each patch.

Agreed.  :-/

I'll have to try it out to see what's happening.  Does it only happen
with this series?  I'd be surprised, but possibly I introduced some error.
  
David Marchand April 1, 2019, 7:29 p.m. UTC | #3
On Mon, Apr 1, 2019 at 9:28 PM Aaron Conole <aconole@redhat.com> wrote:

> David Marchand <david.marchand@redhat.com> writes:
> > I tried using meson/ninja for the tests, something that bothered me is
> > that I can't interrupt the tests.
> > I had to kill manually, meson, ninja and I had some leftover dpdk-test
> > processes (maybe due to some ^Z I hit...).
> > Is this expected ?
>
> Certainly not by me.  I usually let everything complete, though (which
> takes a looong time if I run the full suite).
>
> > This is quite frustrating when testing "before" and "after" each patch.
>
> Agreed.  :-/
>
> I'll have to try it out to see what's happening.  Does it only happen
> with this series?  I'd be surprised, but possibly I introduced some error.
>

Nope, I got this even before your first patch.
  
Bruce Richardson April 2, 2019, 9:37 a.m. UTC | #4
On Mon, Apr 01, 2019 at 09:29:51PM +0200, David Marchand wrote:
> On Mon, Apr 1, 2019 at 9:28 PM Aaron Conole <aconole@redhat.com> wrote:
> 
> > David Marchand <david.marchand@redhat.com> writes:
> > > I tried using meson/ninja for the tests, something that bothered me is
> > > that I can't interrupt the tests.
> > > I had to kill manually, meson, ninja and I had some leftover dpdk-test
> > > processes (maybe due to some ^Z I hit...).
> > > Is this expected ?
> >
> > Certainly not by me.  I usually let everything complete, though (which
> > takes a looong time if I run the full suite).
> >
> > > This is quite frustrating when testing "before" and "after" each patch.
> >
> > Agreed.  :-/
> >
> > I'll have to try it out to see what's happening.  Does it only happen
> > with this series?  I'd be surprised, but possibly I introduced some error.
> >
> 
> Nop, I got this even before your first patch.
> 

Is this meson-related, or related to the auto test binary in DPDK?  I know
traditionally I've found the test binary rather difficult to kill, but I'd
like to be sure that the meson infrastructure itself isn't making it worse.

/Bruce
  
David Marchand April 2, 2019, 10:09 a.m. UTC | #5
On Tue, Apr 2, 2019 at 11:37 AM Bruce Richardson <bruce.richardson@intel.com>
wrote:

> On Mon, Apr 01, 2019 at 09:29:51PM +0200, David Marchand wrote:
> > On Mon, Apr 1, 2019 at 9:28 PM Aaron Conole <aconole@redhat.com> wrote:
> >
> > > David Marchand <david.marchand@redhat.com> writes:
> > > > I tried using meson/ninja for the tests, something that bothered me
> > > > is that I can't interrupt the tests.
> > > > I had to kill manually, meson, ninja and I had some leftover
> > > > dpdk-test processes (maybe due to some ^Z I hit...).
> > > > Is this expected ?
> > >
> > > Certainly not by me.  I usually let everything complete, though (which
> > > takes a looong time if I run the full suite).
> > >
> > > > This is quite frustrating when testing "before" and "after" each
> > > > patch.
> > >
> > > Agreed.  :-/
> > >
> > > I'll have to try it out to see what's happening.  Does it only happen
> > > with this series?  I'd be surprised, but possibly I introduced some
> > > error.
> > >
> >
> > Nop, I got this even before your first patch.
> >
>
> Is this meson related or related to the auto test binary in DPDK. I know
> traditionally I've found the test binary rather difficult to kill, but I'd
> like to be sure that the meson infrastructure itself isn't making it worse.
>

Hard to tell, I would have to retest and investigate, unless Aaron went
further than me.
  
Aaron Conole April 2, 2019, 12:49 p.m. UTC | #6
David Marchand <david.marchand@redhat.com> writes:

> On Tue, Apr 2, 2019 at 11:37 AM Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>  On Mon, Apr 01, 2019 at 09:29:51PM +0200, David Marchand wrote:
>  > On Mon, Apr 1, 2019 at 9:28 PM Aaron Conole <aconole@redhat.com> wrote:
>  > 
>  > > David Marchand <david.marchand@redhat.com> writes:
>  > > > I tried using meson/ninja for the tests, something that bothered me is
>  > > > that I can't interrupt the tests.
>  > > > I had to kill manually, meson, ninja and I had some leftover dpdk-test
>  > > > processes (maybe due to some ^Z I hit...).
>  > > > Is this expected ?
>  > >
>  > > Certainly not by me.  I usually let everything complete, though (which
>  > > takes a looong time if I run the full suite).
>  > >
>  > > > This is quite frustrating when testing "before" and "after" each patch.
>  > >
>  > > Agreed.  :-/
>  > >
>  > > I'll have to try it out to see what's happening.  Does it only happen
>  > > with this series?  I'd be surprised, but possibly I introduced some error.
>  > >
>  > 
>  > Nop, I got this even before your first patch.
>  > 
>
>  Is this meson related or related to the auto test binary in DPDK. I know
>  traditionally I've found the test binary rather difficult to kill, but I'd
>  like to be sure that the meson infrastructure itself isn't making it worse.
>
> Hard to tell, I would have to retest and investigate, unless Aaron went further than me.

I did some investigation into this.  At least I don't see any lingering
process from meson, but it does take time for the tests to die when I
hit CTRL-C, and I get a warning.  I found this related bug:
   https://github.com/mesonbuild/meson/issues/2281

I guess it could be down to the way meson kills tests?  Looking at the
commit that eventually 'closes' the bug, they merely set a flag rather
than pass a kill signal down, so I guess it's probably never going to be
immediate.
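
A minimal sketch of why a flag-based stop is never immediate, assuming a
runner that only checks the flag between tests.  This is an illustration of
the behaviour described above, not meson's actual code, and the class name is
made up for the example.

```python
import threading


class FlagRunner:
    """Run tests in order, stopping only when a flag is seen between tests."""

    def __init__(self, tests):
        self.tests = tests              # list of (name, callable) pairs
        self.stop = threading.Event()   # set on interrupt, never kills anything
        self.completed = []

    def request_stop(self, signum=None, frame=None):
        # Has the signature of a signal handler, so it could be registered
        # with signal.signal(signal.SIGINT, runner.request_stop).
        self.stop.set()

    def run(self):
        for name, test in self.tests:
            if self.stop.is_set():
                break                   # the flag is only checked here
            test()                      # a running test is never interrupted
            self.completed.append(name)
        return self.completed
```

If the stop request arrives while a test is running, that test still runs to
completion and only the following ones are skipped, which would match the
delay seen after CTRL-C.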