[v2] eal: fix undetected NUMA nodes

Message ID 20250305162458.1059282-1-bruce.richardson@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Headers
Series [v2] eal: fix undetected NUMA nodes |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/intel-Functional success Functional PASS
ci/github-robot: build success github build: passed
ci/iol-unit-amd64-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-arm64-testing success Testing PASS
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-marvell-Functional success Functional Testing PASS RETEST #1
ci/iol-mellanox-Performance success Performance Testing PASS RETEST #1
ci/iol-intel-Functional success Functional Testing PASS RETEST #1
ci/iol-intel-Performance success Performance Testing PASS RETEST #1
ci/iol-sample-apps-testing success Testing PASS RETEST #1
ci/iol-broadcom-Performance success Performance Testing PASS RETEST #1

Commit Message

Bruce Richardson March 5, 2025, 4:24 p.m. UTC
In cases where the number of cores on a given socket is greater than
RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
on a system. Fix this limitation by having the EAL probe the NUMA node
for cores it isn't going to use, and recording that for completeness.

This is necessary as memory is tracked per node, and with the --lcores
parameters our app lcores may be on different sockets than the lcore ids
may imply. For example, lcore 0 is on socket zero, but if app is run
with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
needs to be aware of that socket.

Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

---
v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
---
 lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
  

Comments

Patrick Robb March 6, 2025, 3 a.m. UTC | #1
Recheck-request: iol-intel-Performance

There was an infra failure with the Arm Grace server about 12 hours ago
which caused this failure - sending a retest request.
  
Bruce Richardson March 18, 2025, 5:42 p.m. UTC | #2
On Wed, Mar 05, 2025 at 04:24:58PM +0000, Bruce Richardson wrote:
> In cases where the number of cores on a given socket is greater than
> RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> on a system. Fix this limitation by having the EAL probe the NUMA node
> for cores it isn't going to use, and recording that for completeness.
> 
> This is necessary as memory is tracked per node, and with the --lcores
> parameters our app lcores may be on different sockets than the lcore ids
> may imply. For example, lcore 0 is on socket zero, but if app is run
> with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> needs to be aware of that socket.
> 
> Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> ---
> v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> ---
>  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
Ping for review.

For anyone wanting to test:
	 To reproduce the issue, do a build of DPDK with max_lcores option
set to less than the number of physical cores you have on a socket.
Then when running DPDK, the number of NUMA nodes printed at startup
will be incorrect. Applying this patch will then correct that.

Thanks,
/Bruce
  
David Marchand March 19, 2025, 3:42 p.m. UTC | #3
On Tue, Mar 18, 2025 at 6:42 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Wed, Mar 05, 2025 at 04:24:58PM +0000, Bruce Richardson wrote:
> > In cases where the number of cores on a given socket is greater than
> > RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> > on a system. Fix this limitation by having the EAL probe the NUMA node
> > for cores it isn't going to use, and recording that for completeness.
> >
> > This is necessary as memory is tracked per node, and with the --lcores
> > parameters our app lcores may be on different sockets than the lcore ids
> > may imply. For example, lcore 0 is on socket zero, but if app is run
> > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > needs to be aware of that socket.
> >
> > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >
> > ---
> > v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> > ---
> >  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> Ping for review.
>
> For anyone wanting to test:
>          To reproduce the issue, do a build of DPDK with max_lcores option
> set to less than the number of physical cores you have on a socket.

You also need to have those cores contiguous for a numa node which is
quite frequent on the systems in my lab.
But anyway, the bug is not hard to understand, I'll reply on the patch itself.
  
David Marchand March 19, 2025, 4:31 p.m. UTC | #4
On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> In cases where the number of cores on a given socket is greater than
> RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> on a system. Fix this limitation by having the EAL probe the NUMA node
> for cores it isn't going to use, and recording that for completeness.
>
> This is necessary as memory is tracked per node, and with the --lcores
> parameters our app lcores may be on different sockets than the lcore ids
> may imply. For example, lcore 0 is on socket zero, but if app is run
> with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> needs to be aware of that socket.
>
> Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> Cc: stable@dpdk.org
>
> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>

On the principle, the fix lgtm.

I have one comment.

>
> ---
> v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> ---
>  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> index 2ff9252c52..820a6534b1 100644
> --- a/lib/eal/common/eal_common_lcore.c
> +++ b/lib/eal/common/eal_common_lcore.c
> @@ -144,7 +144,11 @@ rte_eal_cpu_init(void)
>         unsigned lcore_id;
>         unsigned count = 0;
>         unsigned int socket_id, prev_socket_id;
> -       int lcore_to_socket_id[RTE_MAX_LCORE];
> +#if CPU_SETSIZE > RTE_MAX_LCORE
> +       int lcore_to_socket_id[CPU_SETSIZE] = {0};
> +#else
> +       int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
> +#endif

This initialisation was unneeded so far because, in the next loop (on
each possible lcore), eal_cpu_socket_id() (returning 0 even for
errors) was called regardless of eal_cpu_detected().
Moving this call after eal_cpu_detected() would be consistent with the
rest of this patch.


It is unrelated to this patch itself, but I also have some doubt about
the socket_id value stored per lcore, as no check against
RTE_MAX_NUMA_NODES is done afterwards.
(it is probably never hit since the default value for RTE_MAX_NUMA_NODES is 32).
  
Bruce Richardson March 19, 2025, 4:54 p.m. UTC | #5
On Wed, Mar 19, 2025 at 05:31:45PM +0100, David Marchand wrote:
> On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > In cases where the number of cores on a given socket is greater than
> > RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> > on a system. Fix this limitation by having the EAL probe the NUMA node
> > for cores it isn't going to use, and recording that for completeness.
> >
> > This is necessary as memory is tracked per node, and with the --lcores
> > parameters our app lcores may be on different sockets than the lcore ids
> > may imply. For example, lcore 0 is on socket zero, but if app is run
> > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > needs to be aware of that socket.
> >
> > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> On the principle, the fix lgtm.
> 
> I have one comment.
> 
> >
> > ---
> > v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> > ---
> >  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> > index 2ff9252c52..820a6534b1 100644
> > --- a/lib/eal/common/eal_common_lcore.c
> > +++ b/lib/eal/common/eal_common_lcore.c
> > @@ -144,7 +144,11 @@ rte_eal_cpu_init(void)
> >         unsigned lcore_id;
> >         unsigned count = 0;
> >         unsigned int socket_id, prev_socket_id;
> > -       int lcore_to_socket_id[RTE_MAX_LCORE];
> > +#if CPU_SETSIZE > RTE_MAX_LCORE
> > +       int lcore_to_socket_id[CPU_SETSIZE] = {0};
> > +#else
> > +       int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
> > +#endif
> 
> This initialisation was unneeded so far because, in the next loop (on
> each possible lcore), eal_cpu_socket_id() (returning 0 even for
> errors) was called regardless of eal_cpu_detected().
> Moving this call after eal_cpu_detected() would be consistent with the
> rest of this patch.
> 

So keep the zero-init, and move the function call to set the initial values
in the array then?

> 
> It is unrelated to this patch itself, but I also have some doubt about
> the socket_id value stored per lcore, as no check against
> RTE_MAX_NUMA_NODES is done afterwards.
> (it is probably never hit since the default value for RTE_MAX_NUMA_NODES is 32).
> 

Well, it's an open question whether RTE_MAX_NUMA_NODES is the max value for a
node id, or the maximum number of ids which can be handled. I imagine most
of the code assumes both - that we have sequential numa nodes with value <
MAX.

/Bruce
  
David Marchand March 19, 2025, 5:28 p.m. UTC | #6
On Wed, Mar 19, 2025 at 5:55 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Wed, Mar 19, 2025 at 05:31:45PM +0100, David Marchand wrote:
> > On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > In cases where the number of cores on a given socket is greater than
> > > RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> > > on a system. Fix this limitation by having the EAL probe the NUMA node
> > > for cores it isn't going to use, and recording that for completeness.
> > >
> > > This is necessary as memory is tracked per node, and with the --lcores
> > > parameters our app lcores may be on different sockets than the lcore ids
> > > may imply. For example, lcore 0 is on socket zero, but if app is run
> > > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > > needs to be aware of that socket.
> > >
> > > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >
> > On the principle, the fix lgtm.
> >
> > I have one comment.
> >
> > >
> > > ---
> > > v2: handle case where RTE_MAX_LCORE > CPU_SETSIZE (i.e. >1024)
> > > ---
> > >  lib/eal/common/eal_common_lcore.c | 17 ++++++++++++-----
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> > > index 2ff9252c52..820a6534b1 100644
> > > --- a/lib/eal/common/eal_common_lcore.c
> > > +++ b/lib/eal/common/eal_common_lcore.c
> > > @@ -144,7 +144,11 @@ rte_eal_cpu_init(void)
> > >         unsigned lcore_id;
> > >         unsigned count = 0;
> > >         unsigned int socket_id, prev_socket_id;
> > > -       int lcore_to_socket_id[RTE_MAX_LCORE];
> > > +#if CPU_SETSIZE > RTE_MAX_LCORE
> > > +       int lcore_to_socket_id[CPU_SETSIZE] = {0};
> > > +#else
> > > +       int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
> > > +#endif
> >
> > This initialisation was unneeded so far because, in the next loop (on
> > each possible lcore), eal_cpu_socket_id() (returning 0 even for
> > errors) was called regardless of eal_cpu_detected().
> > Moving this call after eal_cpu_detected() would be consistent with the
> > rest of this patch.
> >
>
> So keep the zero-init, and move the function call to set the initial values
> in the array then?

I see no elegant way with current code.
I would completely separate this socket discovery from the rest...

Anyway, this is not the subject of this fix, so I'll withdraw this comment.

>
> >
> > It is unrelated to this patch itself, but I also have some doubt about
> > the socket_id value stored per lcore, as no check against
> > RTE_MAX_NUMA_NODES is done afterwards.
> > (it is probably never hit since the default value for RTE_MAX_NUMA_NODES is 32).
> >
>
> Well, it's an open question whether RTE_MAX_NUMA_NODES is the max value for a
> node id, or the maximum number of ids which can be handled. I imagine most
> of the code assumes both - that we have sequential numa nodes with value <
> MAX.

Regardless of the meaning, we can end up in a situation where a lcore
has a socket_id set in lcore_config[] / rte_lcore_XX API, that is
outside the list of numa nodes stored in config->numa_nodes[] /
rte_socket_XX API, which is used for memory init for example.
  
Thomas Monjalon March 23, 2025, 9:34 a.m. UTC | #7
19/03/2025 17:31, David Marchand:
> On Wed, Mar 5, 2025 at 5:25 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > In cases where the number of cores on a given socket is greater than
> > RTE_MAX_LCORES, then EAL will be unaware of all the sockets/numa nodes
> > on a system. Fix this limitation by having the EAL probe the NUMA node
> > for cores it isn't going to use, and recording that for completeness.
> >
> > This is necessary as memory is tracked per node, and with the --lcores
> > parameters our app lcores may be on different sockets than the lcore ids
> > may imply. For example, lcore 0 is on socket zero, but if app is run
> > with --lcores=0@64, then DPDK lcore 0 may be on socket one, so DPDK
> > needs to be aware of that socket.
> >
> > Fixes: 952b20777255 ("eal: provide API for querying valid socket ids")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> 
> On the principle, the fix lgtm.

It is very late in the release cycle, but we have discussed it
and decided to merge this fix in 25.03.

Applied, thanks.
  

Patch

diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
index 2ff9252c52..820a6534b1 100644
--- a/lib/eal/common/eal_common_lcore.c
+++ b/lib/eal/common/eal_common_lcore.c
@@ -144,7 +144,11 @@  rte_eal_cpu_init(void)
 	unsigned lcore_id;
 	unsigned count = 0;
 	unsigned int socket_id, prev_socket_id;
-	int lcore_to_socket_id[RTE_MAX_LCORE];
+#if CPU_SETSIZE > RTE_MAX_LCORE
+	int lcore_to_socket_id[CPU_SETSIZE] = {0};
+#else
+	int lcore_to_socket_id[RTE_MAX_LCORE] = {0};
+#endif
 
 	/*
 	 * Parse the maximum set of logical cores, detect the subset of running
@@ -183,9 +187,11 @@  rte_eal_cpu_init(void)
 	for (; lcore_id < CPU_SETSIZE; lcore_id++) {
 		if (eal_cpu_detected(lcore_id) == 0)
 			continue;
+		socket_id = eal_cpu_socket_id(lcore_id);
+		lcore_to_socket_id[lcore_id] = socket_id;
 		EAL_LOG(DEBUG, "Skipped lcore %u as core %u on socket %u",
 			lcore_id, eal_cpu_core_id(lcore_id),
-			eal_cpu_socket_id(lcore_id));
+			socket_id);
 	}
 
 	/* Set the count of enabled logical cores of the EAL configuration */
@@ -201,12 +207,13 @@  rte_eal_cpu_init(void)
 
 	prev_socket_id = -1;
 	config->numa_node_count = 0;
-	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+	for (lcore_id = 0; lcore_id < RTE_DIM(lcore_to_socket_id); lcore_id++) {
 		socket_id = lcore_to_socket_id[lcore_id];
 		if (socket_id != prev_socket_id)
-			config->numa_nodes[config->numa_node_count++] =
-					socket_id;
+			config->numa_nodes[config->numa_node_count++] =	socket_id;
 		prev_socket_id = socket_id;
+		if (config->numa_node_count >= RTE_MAX_NUMA_NODES)
+			break;
 	}
 	EAL_LOG(INFO, "Detected NUMA nodes: %u", config->numa_node_count);