test: move debug_autotest out of fast suite

Message ID 20210225194857.1991315-1-luca.boccassi@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series test: move debug_autotest out of fast suite

Checks

Context                       Check    Description
ci/checkpatch                 warning  coding style issues
ci/Intel-compilation          success  Compilation OK
ci/intel-Testing              success  Testing PASS
ci/travis-robot               fail     travis build: failed
ci/github-robot               success  github build: passed
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-mellanox-Performance   success  Performance Testing PASS
ci/iol-mellanox-Functional    success  Functional Testing PASS
ci/iol-testing                fail     Testing issues

Commit Message

Luca Boccassi Feb. 25, 2021, 7:48 p.m. UTC
  From: Luca Boccassi <luca.boccassi@microsoft.com>

It consistently fails when run on a build machine within the
reproducible build environment, so move it out of there, so
that we can run the fast suite at build time when packaging.

Repro test environment:

https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
https://salsa.debian.org/salsa-ci-team/pipeline/

Cc: stable@dpdk.org

Signed-off-by: Luca Boccassi <luca.boccassi@microsoft.com>
---
 app/test/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

David Marchand March 1, 2021, 8:03 a.m. UTC | #1
On Thu, Feb 25, 2021 at 8:49 PM <luca.boccassi@gmail.com> wrote:
>
> From: Luca Boccassi <luca.boccassi@microsoft.com>
>
> It consistently fails when run on a build machine within the
> reproducible build environment, so move it out of there, so
> that we can run the fast suite at build time when packaging.

This test is quite simple.
Surprising that it consistently fails.


>
> Repro test environment:
>
> https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
> https://salsa.debian.org/salsa-ci-team/pipeline/

I am not familiar with this CI.
Where can we find the logs of such a failure?
  
Luca Boccassi March 2, 2021, 2:13 p.m. UTC | #2
On Mon, 2021-03-01 at 09:03 +0100, David Marchand wrote:
> On Thu, Feb 25, 2021 at 8:49 PM <luca.boccassi@gmail.com> wrote:
> > From: Luca Boccassi <luca.boccassi@microsoft.com>
> > 
> > It consistently fails when run on a build machine within the
> > reproducible build environment, so move it out of there, so
> > that we can run the fast suite at build time when packaging.
> 
> This test is quite simple.
> Surprising that it consistently fails.

It is very weird. I think it times out after the fork().

> > Repro test environment:
> > 
> > https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
> > https://salsa.debian.org/salsa-ci-team/pipeline/
> 
> I am not familiar with this CI.
> Where can we find the logs of such a failure?

It varies the build environment in several ways to ensure the produced
binaries are bit-by-bit identical despite changes such as the number of
CPUs, time of day, timezone, filesystem, locale, etc.

There is not much useful info in the logs unfortunately, despite the
verbose run.

https://salsa.debian.org/bluca/dpdk/-/jobs/1453534

RTE>>EAL: Detected 2 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/debug_autotest/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: Probing VFIO support...
APP: HPET is not enabled, using TSC as default timer
RTE>>debug_autotest
10: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x61c6a) [0x5562b61f7c6a]]
9: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f9793188d0a]]
8: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x386ae) [0x5562b61ce6ae]]
7: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(cmdline_in+0x71) [0x7f9793925841]]
6: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(rdline_char_in+0x36b) [0x7f9793928abb]]
5: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(+0x3770) [0x7f9793925770]]
4: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(cmdline_parse+0x1af) [0x7f979392664f]]
3: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x61d5b) [0x5562b61f7d5b]]
2: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0xc2c13) [0x5562b6258c13]]
1: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_eal.so.21(rte_dump_stack+0x2e) [0x7f9793a931be]]
PANIC in test_panic():
Test Debug
11: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x61c6a) [0x5562b61f7c6a]]
10: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f9793188d0a]]
9: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x386ae) [0x5562b61ce6ae]]
8: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(cmdline_in+0x71) [0x7f9793925841]]
7: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(rdline_char_in+0x36b) [0x7f9793928abb]]
6: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(+0x3770) [0x7f9793925770]]
5: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_cmdline.so.21(cmdline_parse+0x1af) [0x7f979392664f]]
4: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x61d5b) [0x5562b61f7d5b]]
3: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/dpdk-test(+0x3d671) [0x5562b61d3671]]
2: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_eal.so.21(__rte_panic+0xc1) [0x7f9793a7187f]]
1: [/tmp/reprotest.YBma9j/build-experiment-1/build-experiment-1/obj-x86_64-linux-gnu/app/test/../../lib/librte_eal.so.21(rte_dump_stack+0x2e) [0x7f9793a931be]]
<...>
debug_autotest time out (After 30.0 seconds)
  
David Marchand March 2, 2021, 2:19 p.m. UTC | #3
On Tue, Mar 2, 2021 at 3:13 PM Luca Boccassi <bluca@debian.org> wrote:
> <...>
> debug_autotest time out (After 30.0 seconds)

Ah, a timeout.
Could it be that coredumps (of the child processes here) are being captured?

Can you test with this patch:
https://patchwork.dpdk.org/project/dpdk/patch/20210125150539.27537-1-david.marchand@redhat.com/
?
It disables coredump generation in the autotests when failure is expected.
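
As an illustration of the general technique mentioned above (not the linked patch itself, which is not reproduced here), a forked child that is expected to crash can set its core file size limit to zero before crashing, so the parent is not left waiting while a core file is written or captured. This is a minimal sketch assuming a POSIX system; the helper names disable_core_dumps(), child_aborts(), expected_crash() and no_crash() are hypothetical and do not come from DPDK.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forbid core files for the calling process. */
static int disable_core_dumps(void)
{
	struct rlimit lim = { .rlim_cur = 0, .rlim_max = 0 };

	return setrlimit(RLIMIT_CORE, &lim);
}

/*
 * Hypothetical helper: run fn() in a forked child that is expected to abort.
 * Returns 1 if the child died from SIGABRT, 0 if it ended any other way,
 * and -1 if fork() or waitpid() failed.
 */
static int child_aborts(void (*fn)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: the crash is intentional, so do not write a core file. */
		if (disable_core_dumps() != 0)
			_exit(2);
		fn();
		_exit(0); /* reached only if fn() returned instead of aborting */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}

/* Example workloads: one that crashes on purpose, one that returns. */
static void expected_crash(void) { abort(); }
static void no_crash(void) { }
```

Calling child_aborts(expected_crash) should report that the child was killed by SIGABRT, while child_aborts(no_crash) should report a normal exit. Because the limit is set only in the child, core dumps stay enabled for the parent and for unexpected crashes elsewhere.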
  
Luca Boccassi March 2, 2021, 4:02 p.m. UTC | #4
On Tue, 2021-03-02 at 15:19 +0100, David Marchand wrote:
> On Tue, Mar 2, 2021 at 3:13 PM Luca Boccassi <bluca@debian.org> wrote:
> > <...>
> > debug_autotest time out (After 30.0 seconds)
> 
> Ah, a timeout.
> Could it be that coredumps (of the child processes here) are being captured?
> 
> Can you test with this patch:
> https://patchwork.dpdk.org/project/dpdk/patch/20210125150539.27537-1-david.marchand@redhat.com/
> ?
> It disables coredump generation in the autotests when failure is expected.

That works, thank you!

Please mark it for backporting, so I can include it in 20.11.
  

Patch

diff --git a/app/test/meson.build b/app/test/meson.build
index 561e493a29..6230d33944 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -190,7 +190,6 @@  fast_tests = [
         ['common_autotest', true],
         ['cpuflags_autotest', true],
         ['cycles_autotest', true],
-        ['debug_autotest', true],
         ['eal_flags_c_opt_autotest', false],
         ['eal_flags_main_opt_autotest', false],
         ['eal_flags_n_opt_autotest', false],
@@ -305,6 +304,7 @@  perf_test_names = [
         'hash_readwrite_lf_perf_autotest',
         'trace_perf_autotest',
 	'ipsec_perf_autotest',
+        'debug_autotest',
 ]
 
 driver_test_names = [