eal: fix mem alloc from control thread if socket 0 is unused

Message ID 20211029094929.29864-1-olivier.matz@6wind.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series eal: fix mem alloc from control thread if socket 0 is unused

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/github-robot: build success github build: passed
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-intel-Performance fail Performance Testing issues
ci/iol-intel-Functional success Functional Testing PASS

Commit Message

Olivier Matz Oct. 29, 2021, 9:49 a.m. UTC
  From: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>

When using rte_malloc() from a control thread, the used heap is the one
from numa socket 0, which may not have available memory.

Fix this by selecting the first socket which has available memory.

Note: malloc_get_numa_socket() is only used from one .c file, so move
it there, and remove the inline keyword.

Fixes: b94580d6887e ("malloc: avoid unknown socket id")
Cc: stable@dpdk.org

Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
---
 lib/eal/common/malloc_heap.c | 20 ++++++++++++++++++++
 lib/eal/common/malloc_heap.h | 11 -----------
 2 files changed, 20 insertions(+), 11 deletions(-)
  

Comments

David Marchand Nov. 3, 2021, 8:26 p.m. UTC | #1
On Fri, Oct 29, 2021 at 11:49 AM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> From: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
>
> When using rte_malloc() from a control thread, the used heap is the one
> from numa socket 0, which may not have available memory.
>
> Fix this by selecting the first socket which has available memory.
>
> Note: malloc_get_numa_socket() is only used from one .c file, so move
> it there, and remove the inline keyword.
>
> Fixes: b94580d6887e ("malloc: avoid unknown socket id")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>  lib/eal/common/malloc_heap.c | 20 ++++++++++++++++++++
>  lib/eal/common/malloc_heap.h | 11 -----------
>  2 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
> index ee400f38ec..6eff9a2284 100644
> --- a/lib/eal/common/malloc_heap.c
> +++ b/lib/eal/common/malloc_heap.c
> @@ -694,6 +694,26 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
>         return ret;
>  }
>
> +static unsigned int
> +malloc_get_numa_socket(void)
> +{
> +       const struct internal_config *conf = eal_get_internal_configuration();
> +       unsigned int socket_id = rte_socket_id();
> +       unsigned int idx;
> +
> +       if (socket_id != (unsigned int)SOCKET_ID_ANY)
> +               return socket_id;
> +

Strictly speaking, this affects more than control threads.

socket_id == SOCKET_ID_ANY is possible if current thread is started on
cores from 2 different sockets.
See thread_update_affinity(), which calls eal_cpuset_socket_id().

As a consequence, this change affects lcores too, for example when an lcore
is pinned on cores from two sockets, as in --lcores 0@(1,2) with core 1 on
numa1 and core 2 on numa2 (giving this odd example on purpose).
Previously, all allocations from this lcore would end up on numa0,
regardless of memory availability.
So this change fixes allocations for this odd setup too.
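
To make the before/after behaviour concrete, here is a minimal standalone sketch of the two selection rules. It does not use DPDK: MODEL_SOCKET_ID_ANY, model_get_numa_socket() and model_get_numa_socket_old() are hypothetical names, and the arrays stand in for what rte_socket_id(), rte_socket_id_by_idx() and conf->socket_mem[] would provide in the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, redefined so the sketch builds alone. */
#define MODEL_SOCKET_ID_ANY (-1)

/*
 * Hypothetical model of the patched heuristic.
 * thread_socket: what rte_socket_id() would return for the caller.
 * socket_ids[]:  what rte_socket_id_by_idx(idx) would return.
 * socket_mem[]:  configured memory per socket, indexed by socket id.
 */
static unsigned int
model_get_numa_socket(int thread_socket, const int *socket_ids,
		size_t n_sockets, const uint64_t *socket_mem)
{
	size_t idx;

	assert(n_sockets > 0);

	/* thread bound to a single socket: use it */
	if (thread_socket != MODEL_SOCKET_ID_ANY)
		return (unsigned int)thread_socket;

	/* unknown socket: first socket where memory is available */
	for (idx = 0; idx < n_sockets; idx++) {
		if (socket_mem[socket_ids[idx]] != 0)
			return (unsigned int)socket_ids[idx];
	}

	/* nothing configured anywhere: fall back to the first socket */
	return (unsigned int)socket_ids[0];
}

/* Hypothetical model of the previous rule: unknown socket always maps to 0. */
static unsigned int
model_get_numa_socket_old(int thread_socket)
{
	if (thread_socket == MODEL_SOCKET_ID_ANY)
		return 0;
	return (unsigned int)thread_socket;
}
```

With memory configured only on socket 1, a thread with an unknown socket id is sent to socket 0 by the old rule and to socket 1 by the new one. Threads bound to a single socket are unaffected.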


> +       /* for control threads, return first socket where memory is available */
> +       for (idx = 0; idx < rte_socket_count(); idx++) {
> +               socket_id = rte_socket_id_by_idx(idx);
> +               if (conf->socket_mem[socket_id] != 0)
> +                       return socket_id;
> +       }

We could look at current thread cpu affinity to check to which sockets
it is bound to (like what is done in eal_cpuset_socket_id()).
But that would make the code rather complex and the setups in which it
helps are surely even more odd than what I mentioned above.

Your proposed heuristic looks fine to me, let's go with it if nobody objects.
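
For reference, a rough sketch of what the affinity-based alternative could look like. This is only an illustration under assumptions, not proposed code: model_socket_from_affinity() is a hypothetical name, and plain arrays replace the thread cpuset and the cpu-to-socket lookup that EAL would supply.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: among the sockets covered by the thread's cpu
 * affinity, return the first one that has memory configured.
 * cpu_in_affinity[cpu]: true if the thread may run on this cpu.
 * cpu_socket[cpu]:      NUMA socket the cpu belongs to.
 * socket_mem[]:         configured memory, indexed by socket id.
 * Returns -1 if none of the covered sockets has memory.
 */
static int
model_socket_from_affinity(const bool *cpu_in_affinity, const int *cpu_socket,
		size_t n_cpus, const uint64_t *socket_mem)
{
	size_t cpu;

	for (cpu = 0; cpu < n_cpus; cpu++) {
		if (!cpu_in_affinity[cpu])
			continue;
		if (socket_mem[cpu_socket[cpu]] != 0)
			return cpu_socket[cpu];
	}

	return -1;
}
```

With the --lcores 0@(1,2) example and memory on numa0 and numa2 only, this variant would pick numa2, while the simpler heuristic picks numa0. A caller would still need the simpler heuristic as a fallback when -1 is returned.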


> +
> +       return rte_socket_id_by_idx(0);
> +}
> +
>  void *
>  malloc_heap_alloc(const char *type, size_t size, int socket_arg,
>                 unsigned int flags, size_t align, size_t bound, bool contig)
> diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
> index 3a6ec6ecf0..3a29d024b4 100644
> --- a/lib/eal/common/malloc_heap.h
> +++ b/lib/eal/common/malloc_heap.h
> @@ -33,17 +33,6 @@ struct malloc_heap {
>         char name[RTE_HEAP_NAME_MAX_LEN];
>  } __rte_cache_aligned;
>
> -static inline unsigned
> -malloc_get_numa_socket(void)
> -{
> -       unsigned socket_id = rte_socket_id();
> -
> -       if (socket_id == (unsigned)SOCKET_ID_ANY)
> -               return 0;
> -
> -       return socket_id;
> -}
> -
>  void *
>  malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int flags,
>                 size_t align, size_t bound, bool contig);
> --
> 2.30.2
>
  
Olivier Matz Nov. 4, 2021, 8:54 a.m. UTC | #2
On Wed, Nov 03, 2021 at 09:26:02PM +0100, David Marchand wrote:
> On Fri, Oct 29, 2021 at 11:49 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> >
> > From: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
> >
> > When using rte_malloc() from a control thread, the used heap is the one
> > from numa socket 0, which may not have available memory.
> >
> > Fix this by selecting the first socket which has available memory.
> >
> > Note: malloc_get_numa_socket() is only used from one .c file, so move
> > it there, and remove the inline keyword.
> >
> > Fixes: b94580d6887e ("malloc: avoid unknown socket id")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >  lib/eal/common/malloc_heap.c | 20 ++++++++++++++++++++
> >  lib/eal/common/malloc_heap.h | 11 -----------
> >  2 files changed, 20 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
> > index ee400f38ec..6eff9a2284 100644
> > --- a/lib/eal/common/malloc_heap.c
> > +++ b/lib/eal/common/malloc_heap.c
> > @@ -694,6 +694,26 @@ malloc_heap_alloc_on_heap_id(const char *type, size_t size,
> >         return ret;
> >  }
> >
> > +static unsigned int
> > +malloc_get_numa_socket(void)
> > +{
> > +       const struct internal_config *conf = eal_get_internal_configuration();
> > +       unsigned int socket_id = rte_socket_id();
> > +       unsigned int idx;
> > +
> > +       if (socket_id != (unsigned int)SOCKET_ID_ANY)
> > +               return socket_id;
> > +
> 
> Strictly speaking, this affects more than control threads.
> 
> socket_id == SOCKET_ID_ANY is possible if current thread is started on
> cores from 2 different sockets.
> See thread_update_affinity(), which calls eal_cpuset_socket_id().
> 
> As a consequence, this change affects lcores too, for example when an lcore
> is pinned on cores from two sockets, as in --lcores 0@(1,2) with core 1 on
> numa1 and core 2 on numa2 (giving this odd example on purpose).
> Previously, all allocations from this lcore would end up on numa0,
> regardless of memory availability.
> So this change fixes allocations for this odd setup too.

I didn't know this was possible (and still wonder in which case it can
be useful). But yes, I can send a new version with an updated title and
commit log. What about this one below?

  eal: fix mem alloc from thread having unknown socket id

  When using rte_malloc() from a thread which is not bound to a numa
  socket (the typical case is a control thread, but it can also happen
  on a dataplane thread if its cpu affinity is on cores attached to
  several sockets), the used heap is the one from numa socket 0, which
  may not have available memory.

  Fix this by selecting the first socket which has available memory.

  Note: malloc_get_numa_socket() is only used from one .c file, so move
  it there, and remove the inline keyword.


> 
> > +       /* for control threads, return first socket where memory is available */
> > +       for (idx = 0; idx < rte_socket_count(); idx++) {
> > +               socket_id = rte_socket_id_by_idx(idx);
> > +               if (conf->socket_mem[socket_id] != 0)
> > +                       return socket_id;
> > +       }
> 
> We could look at current thread cpu affinity to check to which sockets
> it is bound to (like what is done in eal_cpuset_socket_id()).
> But that would make the code rather complex and the setups in which it
> helps are surely even more odd than what I mentioned above.
> 
> Your proposed heuristic looks fine to me, let's go with it if nobody objects.

I also think the current heuristic covers the real-life use cases.

Thanks for the review

Olivier


> 
> 
> > +
> > +       return rte_socket_id_by_idx(0);
> > +}
> > +
> >  void *
> >  malloc_heap_alloc(const char *type, size_t size, int socket_arg,
> >                 unsigned int flags, size_t align, size_t bound, bool contig)
> > diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
> > index 3a6ec6ecf0..3a29d024b4 100644
> > --- a/lib/eal/common/malloc_heap.h
> > +++ b/lib/eal/common/malloc_heap.h
> > @@ -33,17 +33,6 @@ struct malloc_heap {
> >         char name[RTE_HEAP_NAME_MAX_LEN];
> >  } __rte_cache_aligned;
> >
> > -static inline unsigned
> > -malloc_get_numa_socket(void)
> > -{
> > -       unsigned socket_id = rte_socket_id();
> > -
> > -       if (socket_id == (unsigned)SOCKET_ID_ANY)
> > -               return 0;
> > -
> > -       return socket_id;
> > -}
> > -
> >  void *
> >  malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int flags,
> >                 size_t align, size_t bound, bool contig);
> > --
> > 2.30.2
> >
> 
> 
> -- 
> David Marchand
>
  
David Marchand Nov. 5, 2021, 2:26 p.m. UTC | #3
On Thu, Nov 4, 2021 at 9:54 AM Olivier Matz <olivier.matz@6wind.com> wrote:
> > > From: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
> > >
> > > When using rte_malloc() from a control thread, the used heap is the one
> > > from numa socket 0, which may not have available memory.
> > >
> > > Fix this by selecting the first socket which has available memory.
> > >
> > > Note: malloc_get_numa_socket() is only used from one .c file, so move
> > > it there, and remove the inline keyword.
> > >
> > > Fixes: b94580d6887e ("malloc: avoid unknown socket id")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ilyes Ben Hamouda <ilyes.ben_hamouda@6wind.com>
> > > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>

Acked-by: David Marchand <david.marchand@redhat.com>



> I didn't know this was possible (and still wonder in which case it can
> be useful). But yes, I can send a new version with an updated title and
> commit log. What about this one below?

No need for a v2, I took your suggestion.
Applied, thanks.

>
>   eal: fix mem alloc from thread having unknown socket id
>
>   When using rte_malloc() from a thread which is not bound to a numa
>   socket (the typical case is a control thread, but it can also happen
>   on a dataplane thread if its cpu affinity is on cores attached to
>   several sockets), the used heap is the one from numa socket 0, which
>   may not have available memory.
>
>   Fix this by selecting the first socket which has available memory.
>
>   Note: malloc_get_numa_socket() is only used from one .c file, so move
>   it there, and remove the inline keyword.
>
  

Patch

diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index ee400f38ec..6eff9a2284 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -694,6 +694,26 @@  malloc_heap_alloc_on_heap_id(const char *type, size_t size,
 	return ret;
 }
 
+static unsigned int
+malloc_get_numa_socket(void)
+{
+	const struct internal_config *conf = eal_get_internal_configuration();
+	unsigned int socket_id = rte_socket_id();
+	unsigned int idx;
+
+	if (socket_id != (unsigned int)SOCKET_ID_ANY)
+		return socket_id;
+
+	/* for control threads, return first socket where memory is available */
+	for (idx = 0; idx < rte_socket_count(); idx++) {
+		socket_id = rte_socket_id_by_idx(idx);
+		if (conf->socket_mem[socket_id] != 0)
+			return socket_id;
+	}
+
+	return rte_socket_id_by_idx(0);
+}
+
 void *
 malloc_heap_alloc(const char *type, size_t size, int socket_arg,
 		unsigned int flags, size_t align, size_t bound, bool contig)
diff --git a/lib/eal/common/malloc_heap.h b/lib/eal/common/malloc_heap.h
index 3a6ec6ecf0..3a29d024b4 100644
--- a/lib/eal/common/malloc_heap.h
+++ b/lib/eal/common/malloc_heap.h
@@ -33,17 +33,6 @@  struct malloc_heap {
 	char name[RTE_HEAP_NAME_MAX_LEN];
 } __rte_cache_aligned;
 
-static inline unsigned
-malloc_get_numa_socket(void)
-{
-	unsigned socket_id = rte_socket_id();
-
-	if (socket_id == (unsigned)SOCKET_ID_ANY)
-		return 0;
-
-	return socket_id;
-}
-
 void *
 malloc_heap_alloc(const char *type, size_t size, int socket, unsigned int flags,
 		size_t align, size_t bound, bool contig);