autotest: disable lcores_autotest on ppc

Message ID 20210420114508.397249-1-luca.boccassi@gmail.com
State Superseded, archived
Delegated to: Thomas Monjalon
Series autotest: disable lcores_autotest on ppc

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/travis-robot              success  travis build: passed
ci/github-robot              success  github build: passed
ci/iol-testing               success  Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS

Commit Message

Luca Boccassi April 20, 2021, 11:45 a.m. UTC
  From: Luca Boccassi <luca.boccassi@microsoft.com>

This test consistently times out on ppc64 builds. Disable it.

Cc: stable@dpdk.org

Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
---
It just times out, with nothing useful in the verbose logs. E.g.:

https://ci.debian.net/data/autopkgtest/unstable/ppc64el/d/dpdk/11755163/log.gz

 app/test/meson.build | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
  

Comments

Thomas Monjalon April 21, 2021, 3:06 p.m. UTC | #1
20/04/2021 13:45, luca.boccassi@gmail.com:
> This test consistently times out on ppc64 builds. Disable it.

It looks like hiding an issue.
Is there any specific reason for this timeout?
  
Luca Boccassi April 22, 2021, 10:26 a.m. UTC | #2
On Wed, 2021-04-21 at 17:06 +0200, Thomas Monjalon wrote:
> It looks like hiding an issue.
> Is there any specific reason for this timeout?

As mentioned in the message, there's nothing useful in the logs, it
just times out.
  
Thomas Monjalon April 22, 2021, 11:01 a.m. UTC | #3
22/04/2021 12:26, Luca Boccassi:
> As mentioned in the message, there's nothing useful in the logs, it
> just times out.

OK but it could be fixed probably instead of disabling.
  
Luca Boccassi April 22, 2021, 12:36 p.m. UTC | #4
On Thu, 2021-04-22 at 13:01 +0200, Thomas Monjalon wrote:
> OK but it could be fixed probably instead of disabling.

I'm sure it could, and it would be great if somebody would do so - but
I do not have either the time or the hardware to take care of
PPC-specific problems, apart from the bare minimum to remove blockers
to get things done for Debian 11.
  
Thomas Monjalon April 22, 2021, 12:43 p.m. UTC | #5
22/04/2021 14:36, Luca Boccassi:
> I'm sure it could, and it would be great if somebody would do so - but
> I do not have either the time or the hardware to take care of
> PPC-specific problems, apart from the bare minimum to remove blockers
> to get things done for Debian 11.

First things first, let's Cc the PPC maintainer:
David Christensen <drc@linux.vnet.ibm.com>
  
Luca Boccassi April 22, 2021, 1:45 p.m. UTC | #6
On Thu, 2021-04-22 at 14:43 +0200, Thomas Monjalon wrote:
> First things first, let's Cc the PPC maintainer:
> David Christensen <drc@linux.vnet.ibm.com>

https://bugs.dpdk.org/show_bug.cgi?id=684
  
David Christensen April 27, 2021, 5:38 p.m. UTC | #7
>> First things first, let's Cc the PPC maintainer:
>> David Christensen <drc@linux.vnet.ibm.com>
> 
> https://bugs.dpdk.org/show_bug.cgi?id=684

I tried GCC 10 on RHEL 8.3, and running lcores_autotest individually
does not produce any errors.  I can't see in the log file how the test
is called when it generates an error.  Can anyone point out the
parameters required to run the test in that situation?

Dave
  
David Christensen April 27, 2021, 6:14 p.m. UTC | #8
On 4/20/21 4:45 AM, luca.boccassi@gmail.com wrote:
> From: Luca Boccassi <luca.boccassi@microsoft.com>
> 
> This test consistently times out on ppc64 builds. Disable it.
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>

Is there something new about how/when this test is run during continuous 
integration?  The test time is dependent on the value of RTE_MAX_LCORE 
which is currently 1536.  When I run the test locally it takes around 50 
seconds to complete and completes without errors.  If I reduce the value 
down to 128 the test completes in around 5 seconds.  The current value 
has been in the code for nearly 2 years so I'm curious why the sudden 
change in CI results.

We could reduce the default value for RTE_MAX_LCORE (I don't often
have access to a system that actually has this many cores), or we could
skip the test in this situation as well.
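
As a rough cross-check of the two timings above, assume (this is an
assumption, not something measured in this thread) that the test's
runtime grows roughly linearly with RTE_MAX_LCORE. Scaling the
1536-lcore run down to 128 lcores then lands close to the ~5 seconds
reported:

```latex
t_{128} \approx t_{1536} \cdot \frac{128}{1536}
        = 50\,\mathrm{s} \cdot \frac{1}{12}
        \approx 4.2\,\mathrm{s}
```

The small gap to the observed ~5 seconds would fit a fixed per-run
overhead on top of a per-lcore cost, but that split is not established
by the numbers here.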

Dave
  
David Christensen April 28, 2021, 7:36 p.m. UTC | #9
On 4/20/21 4:45 AM, luca.boccassi@gmail.com wrote:
> From: Luca Boccassi <luca.boccassi@microsoft.com>
> 
> This test consistently times out on ppc64 builds. Disable it.
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
> ---

NAK.  Will resolve with a different patch to reduce the max_lcore value 
used for PPC builds.  Both x86 and PPC require > 30 seconds if the 
max_lcore value is set to 1536.

Dave
  

Patch

diff --git a/app/test/meson.build b/app/test/meson.build
index bd50818f82..803a87dd4e 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -215,7 +215,6 @@  fast_tests = [
         ['hash_autotest', true],
         ['interrupt_autotest', true],
         ['ipfrag_autotest', false],
-        ['lcores_autotest', true],
         ['logs_autotest', true],
         ['lpm_autotest', true],
         ['lpm6_autotest', true],
@@ -422,6 +421,11 @@  if dpdk_conf.has('RTE_CRYPTO_SCHEDULER')
 	test_deps += 'crypto_scheduler'
 endif
 
+# This test consistently times out on ppc64
+if arch_subdir != 'ppc'
+	fast_tests += [['lcores_autotest', true]]
+endif
+
 foreach d:test_deps
 	def_lib = get_option('default_library')
 	test_dep_objs += get_variable(def_lib + '_rte_' + d)