[v3,0/6] support oops handling

Message ID 20210906041732.1019743-1-jerinj@marvell.com (mailing list archive)


Jerin Jacob Kollanukkaran Sept. 6, 2021, 4:17 a.m. UTC
  From: Jerin Jacob <jerinj@marvell.com>


- Updated the release notes
- Introduce the "--no-oops" EAL option to disable the default EAL oops handler.
  The default EAL oops handler stores the existing handler and invokes it
  after decoding, so there may be no explicit use case for this option. It is
  added anyway to give control to the application. This takes an approach
  similar to telemetry, which is enabled by default to avoid updating all the
  existing applications.
- Change oops_print to fprintf as rte_log is not safe from a fault handler (Stephen)
- Remove "sig" from signal_db as it is a duplicate (Stephen)
- Add const to mem32_dump (Stephen)
- Add const to oops_signals[] (Stephen)
- Fix powerpc build (David Christensen)

It is handy to get detailed OOPS information, like the Linux kernel
provides, when a DPDK application crashes, without losing any of the
features provided by the coredump infrastructure of the OS.

This patch series introduces the APIs to handle OOPS in DPDK.

The following section details the implementation and the API exposed to the application.

When rte_eal_init() is invoked and --no-oops is not provided on the EAL
command line, the EAL library installs the oops handler for the
essential signals.
The rte_oops_signals_enabled() API provides the list of signals
for which the EAL installed the handler.

The default EAL oops handler decodes the oops message using rte_oops_decode()
and then calls the signal handler that the application installed
before invoking rte_eal_init(). If the application has not installed any
signal handler of its own, this scheme also keeps the default coredump
handling (for gdb etc.) provided by the OS available.
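
The save-and-chain scheme described here can be sketched with plain POSIX
sigaction(). This is only an illustration, not the code from the series:
decode_placeholder() is a hypothetical stand-in for the decode step (the
rte_oops_decode() signature is not shown in this cover letter), and
MAX_SIG, install_chained_handler() and example_app_handler() are names
invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_SIG 64 /* hypothetical upper bound used only by this sketch */

/* Disposition that was active before the chained handler was installed. */
static struct sigaction saved_action[MAX_SIG + 1];
static volatile sig_atomic_t decoded_count;
static volatile sig_atomic_t app_handler_seen;

/* Hypothetical stand-in for the decode step; it only writes a marker. */
static void decode_placeholder(int sig, siginfo_t *info, void *uc)
{
	static const char msg[] = "oops: fault decoded\n";
	ssize_t rc;

	(void)sig; (void)info; (void)uc;
	rc = write(STDERR_FILENO, msg, sizeof(msg) - 1); /* async-signal-safe */
	(void)rc;
	decoded_count++;
}

/* Example handler an application might have installed earlier. */
static void example_app_handler(int sig)
{
	app_handler_seen = sig;
}

/* Decode first, then hand over to whatever was installed before. */
static void chained_handler(int sig, siginfo_t *info, void *uc)
{
	const struct sigaction *old = &saved_action[sig];

	decode_placeholder(sig, info, uc);

	if (old->sa_flags & SA_SIGINFO) {
		if (old->sa_sigaction != NULL)
			old->sa_sigaction(sig, info, uc);
	} else if (old->sa_handler == SIG_DFL) {
		/* Restore the default action and re-raise; the default
		 * (for example a core dump) takes effect once this
		 * handler returns. */
		sigaction(sig, old, NULL);
		raise(sig);
	} else if (old->sa_handler != SIG_IGN) {
		old->sa_handler(sig);
	}
}

/* Install the chained handler for one signal, saving the old disposition. */
static int install_chained_handler(int sig)
{
	struct sigaction sa;

	if (sig <= 0 || sig > MAX_SIG)
		return -1;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = chained_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, &saved_action[sig]);
}
```

An application-side check could register example_app_handler() for a
harmless signal such as SIGUSR1, call install_chained_handler(SIGUSR1),
and raise the signal; both the decode step and the earlier handler should
then have run.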

In the second case, where the application installs its signal handler after
the rte_eal_init() invocation, rte_oops_decode() provides the means of
decoding the oops message from the application's fault handler.
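
For this second case, an application-side handler might look like the
sketch below. It is an assumption-laden illustration: app_decode_placeholder()
marks the point where the application would call rte_oops_decode() (whose
signature is not given in this cover letter), and app_fault_handler() and
app_install_fault_handler() are hypothetical names. The sketch uses the
POSIX SA_RESETHAND flag so that the default action is restored as soon as
the handler is entered.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t app_decoded;

/* Hypothetical stand-in for the call to the library's decode API. */
static void app_decode_placeholder(int sig, siginfo_t *info, void *uc)
{
	static const char msg[] = "app: decoding fault\n";
	ssize_t rc;

	(void)sig; (void)info; (void)uc;
	rc = write(STDERR_FILENO, msg, sizeof(msg) - 1); /* async-signal-safe */
	(void)rc;
	app_decoded++;
}

/* Application fault handler: decode, then simply return. */
static void app_fault_handler(int sig, siginfo_t *info, void *uc)
{
	app_decode_placeholder(sig, info, uc);
	/* SA_RESETHAND already restored the default disposition on entry.
	 * For a hardware fault, returning re-executes the faulting
	 * instruction, which faults again and triggers the default action. */
}

/* Install the application's handler for one signal. */
static int app_install_fault_handler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = app_fault_handler;
	sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
	sigemptyset(&sa.sa_mask);
	return sigaction(sig, &sa, NULL);
}
```

With this choice the handler runs at most once per installation, and the
OS default action (for example a core dump for SIGSEGV) remains in place
for the repeated fault.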

Patch split:

Patch 1/6: Defines the API and a stub implementation for Unix systems
Patch 2/6: The API implementation
Patch 3/6: Adds an optional libunwind dependency to DPDK for a better backtrace in oops
Patch 4/6: x86-specific arch info, such as the x86 register dump on oops
Patch 5/6: arm64-specific arch info, such as the arm64 register dump on oops
Patch 6/6: Unit tests for the new APIs

Example commands to build and run, followed by the output log from an x86-64 Linux machine.

meson --buildtype debug build
ninja -C build

echo "oops_autotest" | ./build/app/test/dpdk-test --no-huge  -c 0x2

Signal info:
PID:           2439496
Signal number: 11
Fault address: 0x5

[  0x55e8b56d5cee]: test_oops_generate()+0x75
[  0x55e8b5459843]: unit_test_suite_runner()+0x1aa
[  0x55e8b56d605c]: test_oops()+0x13
[  0x55e8b544bdfc]: cmd_autotest_parsed()+0x55
[  0x55e8b6063a0d]: cmdline_parse()+0x319
[  0x55e8b6061dea]: cmdline_valid_buffer()+0x35
[  0x55e8b6066bd8]: rdline_char_in()+0xc48
[  0x55e8b606221c]: cmdline_in()+0x62
[  0x55e8b6062495]: cmdline_interact()+0x56
[  0x55e8b5459314]: main()+0x65e
[  0x7f54b25d2b25]: __libc_start_main()+0xd5
[  0x55e8b544bc9e]: _start()+0x2e

Arch info:
R8 : 0x0000000000000000  R9 : 0x0000000000000000
R10: 0x00007f54b25b8b48  R11: 0x00007f54b25e7930
R12: 0x00007fffc695e610  R13: 0x0000000000000000
R14: 0x0000000000000000  R15: 0x0000000000000000
RAX: 0x0000000000000005  RBX: 0x0000000000000001
RCX: 0x00007f54b278a943  RDX: 0x3769043bf13a2594
RBP: 0x00007fffc6958340  RSP: 0x00007fffc6958330
RSI: 0x0000000000000000  RDI: 0x000055e8c4c1e380
RIP: 0x000055e8b56d5cee  EFL: 0x0000000000010246

Stack dump:
0x7fffc6958330: 0x6000000
0x7fffc6958334: 0x0
0x7fffc6958338: 0x30cfeac5
0x7fffc695833c: 0x0
0x7fffc6958340: 0xe08395c6
0x7fffc6958344: 0xff7f0000
0x7fffc6958348: 0x439845b5
0x7fffc695834c: 0xe8550000
0x7fffc6958350: 0x0
0x7fffc6958354: 0xb000000
0x7fffc6958358: 0x20445bb9
0x7fffc695835c: 0xe8550000
0x7fffc6958360: 0x925506b6
0x7fffc6958364: 0x0
0x7fffc6958368: 0x0
0x7fffc695836c: 0x0

Code dump:
0x55e8b56d5cee: 0xc7000000
0x55e8b56d5cf2: 0xeb12
0x55e8b56d5cf6: 0xfb6054b
0x55e8b56d5cfa: 0x87540f84
0x55e8b56d5cfe: 0xc07407b8
0x55e8b56d5d02: 0x0
0x55e8b56d5d06: 0xeb05b8ff
0x55e8b56d5d0a: 0xffffffc9
0x55e8b56d5d0e: 0xc3554889
0x55e8b56d5d12: 0xe54881ec
0x55e8b56d5d16: 0xc0000000
0x55e8b56d5d1a: 0x89bd4cff
0x55e8b56d5d1e: 0xffff4889
0x55e8b56d5d22: 0xb540ffff
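
The stack and code dumps above appear to print each 32-bit word with its
lowest-addressed byte first: the word at RIP reads 0xc7000000, and on
x86-64 the bytes c7 00 start a 32-bit store through RAX, which matches the
fault address 0x5 held in RAX. This is an inference from the log, not
something the cover letter states. Under that assumption, a dump in the
same layout can be sketched as follows; word_in_memory_order() and
dump_words() are hypothetical names, not the series' mem32_dump().

```c
#include <stdint.h>
#include <stdio.h>

/* Combine four consecutive bytes so that the first byte in memory
 * becomes the most significant byte of the printed value. */
static uint32_t word_in_memory_order(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Print nwords 32-bit words starting at start, one "address: value"
 * pair per line. */
static void dump_words(FILE *f, const void *start, size_t nwords)
{
	const uint8_t *p = start;
	size_t i;

	for (i = 0; i < nwords; i++, p += 4)
		fprintf(f, "%p: 0x%x\n", (const void *)p,
			(unsigned int)word_in_memory_order(p));
}
```

Reading the dump this way means each value can be compared byte for byte
with a disassembly or a hex dump of the same addresses, independent of the
host byte order.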

Jerin Jacob (6):
  eal: introduce oops handling API
  eal: oops handling API implementation
  eal: support libunwind based backtrace
  eal/x86: support register dump for oops
  eal/arm64: support register dump for oops
  test/oops: support unit test case for oops handling APIs

 .github/workflows/build.yml               |   2 +-
 .travis.yml                               |   2 +-
 app/test/meson.build                      |   2 +
 app/test/test_oops.c                      | 122 +++++++++
 config/meson.build                        |   8 +
 doc/api/doxy-api-index.md                 |   3 +-
 doc/guides/linux_gsg/eal_args.include.rst |   4 +
 doc/guides/rel_notes/release_21_11.rst    |  10 +
 lib/eal/common/eal_common_options.c       |   5 +
 lib/eal/common/eal_internal_cfg.h         |   1 +
 lib/eal/common/eal_options.h              |   2 +
 lib/eal/common/eal_private.h              |   3 +
 lib/eal/freebsd/eal.c                     |   8 +
 lib/eal/include/meson.build               |   1 +
 lib/eal/include/rte_oops.h                | 101 ++++++++
 lib/eal/linux/eal.c                       |   7 +
 lib/eal/unix/eal_oops.c                   | 293 ++++++++++++++++++++++
 lib/eal/unix/meson.build                  |   1 +
 lib/eal/version.map                       |   4 +
 19 files changed, 576 insertions(+), 3 deletions(-)
 create mode 100644 app/test/test_oops.c
 create mode 100644 lib/eal/include/rte_oops.h
 create mode 100644 lib/eal/unix/eal_oops.c