config: increase default maximum number of NUMA nodes

Message ID 20210203211818.3047146-1-thomas@monjalon.net (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series: config: increase default maximum number of NUMA nodes

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS

Commit Message

Thomas Monjalon Feb. 3, 2021, 9:18 p.m. UTC
  AMD CPU can present a high number of NUMA nodes.
The default should be 32 for better compatibility.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 meson_options.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Jerin Jacob Feb. 4, 2021, 6:19 a.m. UTC | #1
On Thu, Feb 4, 2021 at 2:49 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> AMD CPU can present a high number of NUMA nodes.
> The default should be 32 for better compatibility.

The typical configuration for AMD is 4 nodes [1]. Just wondering, is this
an exceptional case? If so, do we need to consume more memory for the
normal cases?

[1]
https://developer.amd.com/wp-content/resources/56308-NUMA%20Topology%20for%20AMD%20EPYC%E2%84%A2%20Naples%20Family%20Processors.PDF

>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
>  meson_options.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/meson_options.txt b/meson_options.txt
> index 5c382487da..6eff62e47d 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -26,7 +26,7 @@ option('max_ethports', type: 'integer', value: 32,
>         description: 'maximum number of Ethernet devices')
>  option('max_lcores', type: 'integer', value: 128,
>         description: 'maximum number of cores/threads supported by EAL')
> -option('max_numa_nodes', type: 'integer', value: 4,
> +option('max_numa_nodes', type: 'integer', value: 32,
>         description: 'maximum number of NUMA nodes supported by EAL')
>  option('enable_trace_fp', type: 'boolean', value: false,
>         description: 'enable fast path trace points.')
> --
> 2.30.0
>
  
Bruce Richardson Feb. 4, 2021, 9:56 a.m. UTC | #2
On Wed, Feb 03, 2021 at 10:18:18PM +0100, Thomas Monjalon wrote:
> AMD CPU can present a high number of NUMA nodes.
> The default should be 32 for better compatibility.
> 
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---
Seems reasonable.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
  
Thomas Monjalon Feb. 4, 2021, 10:28 a.m. UTC | #3
04/02/2021 07:19, Jerin Jacob:
> On Thu, Feb 4, 2021 at 2:49 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > AMD CPU can present a high number of NUMA nodes.
> > The default should be 32 for better compatibility.
> 
> The typical configuration for AMD is 4 nodes [1]. Just wondering, is this
> an exceptional case? If so, do we need to consume more memory for the
> normal cases?
> 
> [1]
> https://developer.amd.com/wp-content/resources/56308-NUMA%20Topology%20for%20AMD%20EPYC%E2%84%A2%20Naples%20Family%20Processors.PDF

As you can read in
https://www.dell.com/support/kbdoc/fr-fr/000137696/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance
there is an option "CCX as NUMA Domain.
This option exposes each CCX as a NUMA node.
On a system with dual-socket CPUs with 16 CCXs per CPU,
this setting will expose 32 NUMA domains."
and
"Enabling this option is expected to help virtualized environments."

I would not say it is exceptional.
And in my understanding, the memory cost is not so high for DPDK.
Do you see some large arrays depending on RTE_MAX_NUMA_NODES?
  
Asaf Penso Feb. 4, 2021, 10:56 a.m. UTC | #4
>-----Original Message-----
>From: dev <dev-bounces@dpdk.org> On Behalf Of Thomas Monjalon
>Sent: Wednesday, February 3, 2021 11:18 PM
>To: dev@dpdk.org
>Cc: Bruce Richardson <bruce.richardson@intel.com>
>Subject: [dpdk-dev] [PATCH] config: increase default maximum number of
>NUMA nodes
>
>AMD CPU can present a high number of NUMA nodes.
>The default should be 32 for better compatibility.
>
>Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
>---
> meson_options.txt | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/meson_options.txt b/meson_options.txt index
>5c382487da..6eff62e47d 100644
>--- a/meson_options.txt
>+++ b/meson_options.txt
>@@ -26,7 +26,7 @@ option('max_ethports', type: 'integer', value: 32,
> 	description: 'maximum number of Ethernet devices')
> option('max_lcores', type: 'integer', value: 128,
> 	description: 'maximum number of cores/threads supported by EAL')
>-option('max_numa_nodes', type: 'integer', value: 4,
>+option('max_numa_nodes', type: 'integer', value: 32,
> 	description: 'maximum number of NUMA nodes supported by EAL')
> option('enable_trace_fp', type: 'boolean', value: false,
> 	description: 'enable fast path trace points.')
>--
>2.30.0

Reviewed-by: Asaf Penso <asafp@nvidia.com>
  
Jerin Jacob Feb. 4, 2021, 11:39 a.m. UTC | #5
On Thu, Feb 4, 2021 at 3:58 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 04/02/2021 07:19, Jerin Jacob:
> > On Thu, Feb 4, 2021 at 2:49 AM Thomas Monjalon <thomas@monjalon.net> wrote:
> > >
> > > AMD CPU can present a high number of NUMA nodes.
> > > The default should be 32 for better compatibility.
> >
> > The typical configuration for AMD is 4 nodes [1]. Just wondering, is this
> > an exceptional case? If so, do we need to consume more memory for the
> > normal cases?
> >
> > [1]
> > https://developer.amd.com/wp-content/resources/56308-NUMA%20Topology%20for%20AMD%20EPYC%E2%84%A2%20Naples%20Family%20Processors.PDF
>
> As you can read in
> https://www.dell.com/support/kbdoc/fr-fr/000137696/amd-rome-is-it-for-real-architecture-and-initial-hpc-performance
> there is an option "CCX as NUMA Domain.
> This option exposes each CCX as a NUMA node.
> On a system with dual-socket CPUs with 16 CCXs per CPU,
> this setting will expose 32 NUMA domains."
> and
> "Enabling this option is expected to help virtualized environments."

I see.

>
> I would not say it is exceptional.
> And in my understanding, the memory cost is not so high for DPDK.
> Do you see some large arrays depending on RTE_MAX_NUMA_NODES?

Not that many:

lib/librte_efd/rte_efd.c:       struct efd_online_chunk *chunks[RTE_MAX_NUMA_NODES];
lib/librte_eal/linux/eal_memory.c:      uint64_t memory[RTE_MAX_NUMA_NODES];
lib/librte_eal/linux/eal.c:     char * arg[RTE_MAX_NUMA_NODES];
lib/librte_eal/common/eal_common_dynmem.c:      uint64_t memory[RTE_MAX_NUMA_NODES];
lib/librte_eal/common/eal_common_dynmem.c:      int cpu_per_socket[RTE_MAX_NUMA_NODES];
lib/librte_eal/common/eal_private.h:    uint32_t numa_nodes[RTE_MAX_NUMA_NODES]; /**< List of detected NUMA nodes. */
lib/librte_eal/common/eal_internal_cfg.h:       uint32_t num_pages[RTE_MAX_NUMA_NODES];
lib/librte_eal/common/eal_internal_cfg.h:       volatile uint64_t socket_mem[RTE_MAX_NUMA_NODES]; /**< amount of memory per socket */
lib/librte_eal/common/eal_internal_cfg.h:       volatile uint64_t socket_limit[RTE_MAX_NUMA_NODES]; /**< limit amount of memory per socket */
lib/librte_eal/windows/eal_lcore.c:     struct socket_map sockets[RTE_MAX_NUMA_NODES];
lib/librte_node/ip4_lookup.c:   struct rte_lpm *lpm_tbl[RTE_MAX_NUMA_NODES];
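[Editorial note, not from the thread: the static cost of the change can be sketched quickly. Assuming 8-byte entries (an upper bound, since some of the arrays above hold 4-byte `uint32_t`/`int` values), each array grows by a couple of hundred bytes when the default rises from 4 to 32:]

```shell
# Illustrative upper-bound estimate of the added static footprint.
old=4
new=32
per_array=$(( (new - old) * 8 ))   # bytes added per array, assuming 8-byte entries
total=$(( per_array * 11 ))        # 11 arrays listed in the grep output above
echo "per-array: ${per_array} bytes, total: ~${total} bytes"
```

A few kilobytes in total, which matches the conclusion in the thread that the memory cost is negligible for DPDK.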


Acked-by: Jerin Jacob <jerinj@marvell.com>


  
Thomas Monjalon Feb. 5, 2021, 4:36 p.m. UTC | #6
04/02/2021 10:56, Bruce Richardson:
> On Wed, Feb 03, 2021 at 10:18:18PM +0100, Thomas Monjalon wrote:
> > AMD CPU can present a high number of NUMA nodes.
> > The default should be 32 for better compatibility.
> > 
> > Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> > ---
> Seems reasonable.
> 
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>

Applied with additional explanation:

On a dual-socket with 16 CCXs per CPU,
the option "CCX (or LLC) as NUMA domain" will expose 32 NUMA nodes.
  

Patch

diff --git a/meson_options.txt b/meson_options.txt
index 5c382487da..6eff62e47d 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -26,7 +26,7 @@  option('max_ethports', type: 'integer', value: 32,
 	description: 'maximum number of Ethernet devices')
 option('max_lcores', type: 'integer', value: 128,
 	description: 'maximum number of cores/threads supported by EAL')
-option('max_numa_nodes', type: 'integer', value: 4,
+option('max_numa_nodes', type: 'integer', value: 32,
 	description: 'maximum number of NUMA nodes supported by EAL')
 option('enable_trace_fp', type: 'boolean', value: false,
 	description: 'enable fast path trace points.')
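[Editorial note: `max_numa_nodes` remains a build-time meson option, so users who want a smaller static footprint, or an even larger node count, can still override the new default at configure time. A typical invocation from a DPDK source tree (paths assumed) looks like:]

```shell
# Configure a DPDK build with a non-default NUMA node limit (illustrative).
meson setup build -Dmax_numa_nodes=8     # shrink the limit for small systems
# ...or adjust an existing build directory:
#   meson configure build -Dmax_numa_nodes=64
ninja -C build
```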