[v2] eal/linux: enable the hugepage mem dump

Message ID 20220401091004.3227117-1-fengli@smartx.com (mailing list archive)
State New
Delegated to: David Marchand
Series [v2] eal/linux: enable the hugepage mem dump

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build success github build: passed
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Li Feng April 1, 2022, 9:10 a.m. UTC
These hugepages include important structures. We should dump these
hugepages into a coredump file for debugging when generating a coredump.

Signed-off-by: Li Feng <fengli@smartx.com>
---
 lib/eal/linux/eal_memalloc.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Stephen Hemminger April 5, 2022, 10:46 p.m. UTC | #1
On Fri,  1 Apr 2022 17:10:04 +0800
Li Feng <fengli@smartx.com> wrote:

> These hugepages include important structures. we should dump these
> hugepages into a coredump file for debugging when generating a coredump.
> 
> Signed-off-by: Li Feng <fengli@smartx.com>
> ---
>  lib/eal/linux/eal_memalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
> index f8b1588cae..93c4f396cf 100644
> --- a/lib/eal/linux/eal_memalloc.c
> +++ b/lib/eal/linux/eal_memalloc.c
> @@ -677,6 +677,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
>  				__func__);
>  #endif
>  
> +	eal_mem_set_dump(addr, alloc_sz, true);
> +
>  	huge_recover_sigbus();
>  
>  	ms->addr = addr;


Don't merge this patch as is please; it would cause a lot of pain
in a cloud environment.

In our environment core dumps are collected (via systemd) and uploaded
to a central server. With this kind of change the processing would get
overloaded with multi-gigabyte core dumps. We probably couldn't even
save a core dump on these kinds of smart NICs.


This needs to be optional (from command line) and default to the current
behavior (not dumping huge pages).
  
Dmitry Kozlyuk April 5, 2022, 11:14 p.m. UTC | #2
2022-04-05 15:46 (UTC-0700), Stephen Hemminger:
> On Fri,  1 Apr 2022 17:10:04 +0800
> Li Feng <fengli@smartx.com> wrote:
> 
> > These hugepages include important structures. we should dump these
> > hugepages into a coredump file for debugging when generating a coredump.
> > 
> > Signed-off-by: Li Feng <fengli@smartx.com>
> > ---
> >  lib/eal/linux/eal_memalloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
> > index f8b1588cae..93c4f396cf 100644
> > --- a/lib/eal/linux/eal_memalloc.c
> > +++ b/lib/eal/linux/eal_memalloc.c
> > @@ -677,6 +677,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> >  				__func__);
> >  #endif
> >  
> > +	eal_mem_set_dump(addr, alloc_sz, true);
> > +
> >  	huge_recover_sigbus();
> >  
> >  	ms->addr = addr;  
> 
> 
> Don't merge this patch as is please; it would cause a lot of pain
> in a cloud environment.
> 
> In our environment core dumps are collected (via systemd) and uploaded
> to a central server. With this kind of change the processing would get
> overloaded with multi-gigabyte core dump size. Probably couldn't even
> save a core dump on these kind of smart nics.
> 
> 
> This needs to be optional (from command line) and default to the current
> behavior (not dumping huge pages).

Maybe expose eal_mem_set_dump() as rte_mem_set_dump()?
This would make the feature easy to implement using memory callbacks.
Better still, one could enable dumping of hugepages selectively:
for example, dump some interesting hash tables but skip rings and mempools.
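
For illustration, a minimal sketch of the per-region toggle discussed here,
assuming Linux and plain madvise(2) with MADV_DODUMP / MADV_DONTDUMP. The
names sketch_mem_set_dump() and sketch_map_region() are hypothetical and are
not DPDK API; the region is an ordinary anonymous mapping standing in for a
hugepage segment.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: mark [addr, addr + len) as included in or
 * excluded from core dumps. Returns 0 on success, -1 on failure
 * (errno is set by madvise).
 */
int sketch_mem_set_dump(void *addr, size_t len, bool dump)
{
	return madvise(addr, len, dump ? MADV_DODUMP : MADV_DONTDUMP);
}

/*
 * Hypothetical helper: map an anonymous region and apply the dump
 * policy to it. Returns NULL on failure.
 */
void *sketch_map_region(size_t len, bool dump)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	if (sketch_mem_set_dump(p, len, dump) != 0) {
		munmap(p, len);
		return NULL;
	}
	return p;
}
```

An application could call such a helper with dump=true only for the regions
it wants to see in a core file and leave everything else excluded. In a DPDK
application the call would presumably be made from a memory event callback,
but that wiring is an assumption and is not shown here.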
  
Li Feng April 6, 2022, 2:11 a.m. UTC | #3
On Wed, Apr 6, 2022 at 6:46 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Fri,  1 Apr 2022 17:10:04 +0800
> Li Feng <fengli@smartx.com> wrote:
>
> > These hugepages include important structures. we should dump these
> > hugepages into a coredump file for debugging when generating a coredump.
> >
> > Signed-off-by: Li Feng <fengli@smartx.com>
> > ---
> >  lib/eal/linux/eal_memalloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
> > index f8b1588cae..93c4f396cf 100644
> > --- a/lib/eal/linux/eal_memalloc.c
> > +++ b/lib/eal/linux/eal_memalloc.c
> > @@ -677,6 +677,8 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> >                               __func__);
> >  #endif
> >
> > +     eal_mem_set_dump(addr, alloc_sz, true);
> > +
> >       huge_recover_sigbus();
> >
> >       ms->addr = addr;
>
>
> Don't merge this patch as is please; it would cause a lot of pain
> in a cloud environment.
>
> In our environment core dumps are collected (via systemd) and uploaded
> to a central server. With this kind of change the processing would get
> overloaded with multi-gigabyte core dump size. Probably couldn't even
> save a core dump on these kind of smart nics.
>
>
> This needs to be optional (from command line) and default to the current
> behavior (not dumping huge pages).

On Linux, this patch alone does not cause these hugepages to be included
in the coredump, because they are shared mappings. To dump them, one must
also write 0x73 to /proc/self/coredump_filter.
This is the coredump_filter explanation from the core(5) man page:
       Since kernel 2.6.23, the Linux-specific /proc/[pid]/coredump_filter
       file can be used to control which memory segments are written to the
       core dump file in the event that a core dump is performed for the
       process with the corresponding process ID.

       The value in the file is a bit mask of memory mapping types (see
       mmap(2)).  If a bit is set in the mask, then memory mappings of the
       corresponding type are dumped; otherwise they are not dumped.  The
       bits in this file have the following meanings:

           bit 0  Dump anonymous private mappings.
           bit 1  Dump anonymous shared mappings.
           bit 2  Dump file-backed private mappings.
           bit 3  Dump file-backed shared mappings.
           bit 4 (since Linux 2.6.24)
                  Dump ELF headers.
           bit 5 (since Linux 2.6.28)
                  Dump private huge pages.
           bit 6 (since Linux 2.6.28)
                  Dump shared huge pages.
           bit 7 (since Linux 4.4)
                  Dump private DAX pages.
           bit 8 (since Linux 4.4)
                  Dump shared DAX pages.

       By default, the following bits are set: 0, 1, 4 (if the
       CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS kernel configuration option is
       enabled), and 5.  This default can be modified at boot time using the
       coredump_filter boot option.
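
As a hedged illustration of the bit mask described above: the default bits
0, 1, 4 and 5 give 0x33, and adding bit 6 (shared huge pages) gives 0x73.
The sketch below assumes Linux and only standard C I/O; the enum and function
names are made up for this example and are not part of any existing API.

```c
#include <stdio.h>

/* Bit values for /proc/[pid]/coredump_filter, as listed in core(5). */
enum {
	CD_ANON_PRIVATE = 1u << 0,
	CD_ANON_SHARED  = 1u << 1,
	CD_FILE_PRIVATE = 1u << 2,
	CD_FILE_SHARED  = 1u << 3,
	CD_ELF_HEADERS  = 1u << 4,
	CD_HUGE_PRIVATE = 1u << 5,
	CD_HUGE_SHARED  = 1u << 6,
};

/* Nonzero if shared hugepages would be dumped under this mask. */
int dumps_shared_hugepages(unsigned int mask)
{
	return (mask & CD_HUGE_SHARED) != 0;
}

/* Read the calling process's filter (the file holds a hex value). */
int read_coredump_filter(unsigned int *mask)
{
	FILE *f = fopen("/proc/self/coredump_filter", "r");
	int ok;

	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%x", mask) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Set bit 6 while keeping whatever bits are already enabled. */
int enable_shared_hugepage_dump(void)
{
	unsigned int mask;
	FILE *f;
	int ok;

	if (read_coredump_filter(&mask) != 0)
		return -1;
	mask |= CD_HUGE_SHARED;

	f = fopen("/proc/self/coredump_filter", "w");
	if (f == NULL)
		return -1;
	ok = (fprintf(f, "0x%x\n", mask) > 0);
	return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Calling enable_shared_hugepage_dump() early in a process is one way to opt in
without touching the system-wide default; child processes inherit the filter
across fork().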
  
Stephen Hemminger July 5, 2023, 11:19 p.m. UTC | #4
On Wed, 6 Apr 2022 02:14:46 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> > 
> > Don't merge this patch as is please; it would cause a lot of pain
> > in a cloud environment.
> > 
> > In our environment core dumps are collected (via systemd) and uploaded
> > to a central server. With this kind of change the processing would get
> > overloaded with multi-gigabyte core dump size. Probably couldn't even
> > save a core dump on these kind of smart nics.
> > 
> > 
> > This needs to be optional (from command line) and default to the current
> > behavior (not dumping huge pages).  
> 
> Maybe expose eal_mem_set_dump() as rte_mem_set_dump()?
> This would allow to implement the feature easily using memory callbacks.
> Better, one can enable hugepages to dump selectively:
> for example, dump some interesting hash tables but skip rings and mempools.

As was mentioned in the thread, coredump_filter also controls these,
so it won't impact users who do not enable it.
Since the granularity is at the page level, it doesn't make sense
to try to be selective for hash tables, rings, mempools, etc.

Looks good as is, though it might need a rebase.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
  

Patch

diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..93c4f396cf 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -677,6 +677,8 @@  alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
 				__func__);
 #endif
 
+	eal_mem_set_dump(addr, alloc_sz, true);
+
 	huge_recover_sigbus();
 
 	ms->addr = addr;