[RFC] lib: set/get max memzone segments

Message ID 20230419083634.2027689-1-ophirmu@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context                     Check    Description
ci/checkpatch               success  coding style OK
ci/loongarch-compilation    success  Compilation OK
ci/loongarch-unit-testing   success  Unit Testing PASS
ci/Intel-compilation        success  Compilation OK
ci/intel-Testing            success  Testing PASS
ci/intel-Functional         success  Functional PASS

Commit Message

Ophir Munk April 19, 2023, 8:36 a.m. UTC
In current DPDK, RTE_MAX_MEMZONE is unconditionally hard coded as 2560.
For applications that require a different value, it is more convenient
to set the maximum through an rte API than to change the DPDK source
code per application.  In many organizations it is not possible at all
to compile a private DPDK library for a particular application.  With
this option there is no need to recompile DPDK, and an in-box packaged
DPDK can be used.
An example use case is an application that uses the DPDK mempool
library, which is based on the DPDK memzone library.  The application
may need to create a number of steering tables, each of which requires
its own mempool allocation.
This commit is not about optimizing the application's use of mempool,
nor about improving the mempool implementation based on memzone.  It is
about making the max memzone definition customizable at run time.
This commit adds an API which must be called before rte_eal_init():
rte_memzone_max_set(uint32_t max).  If it is not called, the default max
memzone (2560, the former RTE_MAX_MEMZONE value) is used.  There is also
an API to query the effective max memzone: rte_memzone_max_get().

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
---
 app/test/test_func_reentrancy.c     |  2 +-
 app/test/test_malloc_perf.c         |  2 +-
 app/test/test_memzone.c             |  2 +-
 config/rte_config.h                 |  1 -
 drivers/net/qede/base/bcm_osal.c    | 26 +++++++++++++++++++++-----
 drivers/net/qede/base/bcm_osal.h    |  3 +++
 drivers/net/qede/qede_main.c        |  7 +++++++
 lib/eal/common/eal_common_memzone.c | 28 +++++++++++++++++++++++++---
 lib/eal/include/rte_memzone.h       | 20 ++++++++++++++++++++
 lib/eal/version.map                 |  4 ++++
 10 files changed, 83 insertions(+), 12 deletions(-)
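
The contract described in the commit message (set only before EAL initialization, get at any time, built-in default if set is never called) can be modeled in a few lines. This is a standalone sketch, not DPDK code: `demo_memzone_max_set()`, `demo_memzone_max_get()`, `demo_eal_init()` and `DEMO_DEFAULT_MAX_MEMZONE` are hypothetical stand-ins for the functions and default in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the built-in default (2560 in the patch). */
#define DEMO_DEFAULT_MAX_MEMZONE 2560u

static uint32_t demo_memzone_max = DEMO_DEFAULT_MAX_MEMZONE;
static bool demo_init_complete;

/* Returns 0 on success, -1 if called after initialization. */
int demo_memzone_max_set(uint32_t max)
{
	if (demo_init_complete)
		return -1;
	demo_memzone_max = max;
	return 0;
}

/* May be called at any time; reports the default if set was never called. */
uint32_t demo_memzone_max_get(void)
{
	return demo_memzone_max;
}

/* Stand-in for rte_eal_init(): from here on the limit is frozen. */
void demo_eal_init(void)
{
	demo_init_complete = true;
}
```

In this model a caller that sets the limit too late gets -1 back and the earlier value stays in effect, which is the failure case discussed later in the thread.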
  

Comments

Ophir Munk April 19, 2023, 8:48 a.m. UTC | #1
Devendra Singh Rawat, Alok Prasad - can you please give your feedback on the qede driver updates?

  
Devendra Singh Rawat April 19, 2023, 1:42 p.m. UTC | #2
>diff --git a/drivers/net/qede/base/bcm_osal.c b/drivers/net/qede/base/bcm_osal.c
>index 2c59397..f195f2c 100644
>--- a/drivers/net/qede/base/bcm_osal.c
>+++ b/drivers/net/qede/base/bcm_osal.c
>@@ -47,10 +47,26 @@ void osal_poll_mode_dpc(osal_int_ptr_t hwfn_cookie)
> }
>
> /* Array of memzone pointers */
>-static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
>+static const struct rte_memzone **ecore_mz_mapping;
> /* Counter to track current memzone allocated */
> static uint16_t ecore_mz_count;
>
>+int ecore_mz_mapping_alloc(void)
>+{
>+	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
>+		rte_memzone_max_get() * sizeof(struct rte_memzone *));

The second parameter of rte_malloc() should be the size, and the third parameter should be the alignment (0 in this case).

Check 
https://doc.dpdk.org/api/rte__malloc_8h.html#a247c99e8d36300c52729c9ee58c2b489

>diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
>index 67e7f75..97e261d 100644
>--- a/drivers/net/qede/base/bcm_osal.h
>+++ b/drivers/net/qede/base/bcm_osal.h
>@@ -477,4 +477,7 @@ enum dbg_status	qed_dbg_alloc_user_data(struct ecore_hwfn *p_hwfn,
> 	qed_dbg_alloc_user_data(p_hwfn, user_data_ptr)
> #define OSAL_DB_REC_OCCURRED(p_hwfn) nothing
>
>+int ecore_mz_mapping_alloc(void);
>+void ecore_mz_mapping_free(void);
>+
> #endif /* __BCM_OSAL_H */
>diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
>index 0303903..f116e86 100644
>--- a/drivers/net/qede/qede_main.c
>+++ b/drivers/net/qede/qede_main.c
>@@ -78,6 +78,12 @@ qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
> 		return rc;
> 	}
>
>+	rc = ecore_mz_mapping_alloc();

ecore_mz_mapping_alloc() should be called prior to calling ecore_hw_prepare().

>+	if (rc) {
>+		DP_ERR(edev, "mem zones array allocation failed\n");
>+		return rc;
>+	}
>+
> 	return rc;
> }
>
>@@ -721,6 +727,7 @@ static void qed_remove(struct ecore_dev *edev)
> 	if (!edev)
> 		return;
>
>+	ecore_mz_mapping_free();
> 	ecore_hw_remove(edev);
> }

ecore_mz_mapping_free() should be called after ecore_hw_remove().
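
The review does not spell out why the ordering matters. One plausible reading, given that the driver's DMA helpers index `ecore_mz_mapping`, is that hardware setup records allocations in the array and teardown still consults it. The standalone sketch below models that ordering only; every `ord_*` name is a hypothetical stand-in, not qede code.

```c
#include <stddef.h>
#include <stdlib.h>

static const void **ord_mz_map;  /* stand-in for ecore_mz_mapping */
static size_t ord_mz_max;
static size_t ord_mz_count;      /* stand-in for ecore_mz_count */

static int ord_map_alloc(size_t max)
{
	ord_mz_map = calloc(max, sizeof(*ord_mz_map));
	if (ord_mz_map == NULL)
		return -1;
	ord_mz_max = max;
	return 0;
}

static void ord_map_free(void)
{
	free(ord_mz_map);
	ord_mz_map = NULL;
	ord_mz_max = 0;
	ord_mz_count = 0;
}

/* Stand-in for ecore_hw_prepare(): records one allocation in the map. */
static int ord_hw_prepare(void)
{
	static const int dummy_zone = 0;

	if (ord_mz_map == NULL || ord_mz_count >= ord_mz_max)
		return -1;
	ord_mz_map[ord_mz_count++] = &dummy_zone;
	return 0;
}

/* Stand-in for ecore_hw_remove(): releases what the map still tracks. */
static int ord_hw_remove(void)
{
	if (ord_mz_map == NULL)
		return -1;
	ord_mz_count = 0;
	return 0;
}

/* Probe: allocate the map first, then prepare the hardware. */
int ord_probe(size_t max)
{
	if (ord_map_alloc(max) != 0)
		return -1;
	if (ord_hw_prepare() != 0) {
		ord_map_free();
		return -1;
	}
	return 0;
}

/* Remove: tear down the hardware first, then free the map. */
int ord_remove(void)
{
	int rc = ord_hw_remove();

	ord_map_free();
	return rc;
}
```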
  
Stephen Hemminger April 19, 2023, 2:42 p.m. UTC | #3
On Wed, 19 Apr 2023 11:36:34 +0300
Ophir Munk <ophirmu@nvidia.com> wrote:

> +int ecore_mz_mapping_alloc(void)
> +{
> +	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
> +		rte_memzone_max_get() * sizeof(struct rte_memzone *));

Why not use rte_calloc()? Also, devices should be using NUMA-aware
allocation to put the memzone array on the same NUMA node as the PCI device.
  
Tyler Retzlaff April 19, 2023, 2:51 p.m. UTC | #4
On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> coded as 2560.  For applications requiring different values of this
> parameter – it is more convenient to set the max value via an rte API -
> rather than changing the dpdk source code per application.  In many
> organizations, the possibility to compile a private DPDK library for a
> particular application does not exist at all.  With this option there is
> no need to recompile DPDK and it allows using an in-box packaged DPDK.
> An example usage for updating the RTE_MAX_MEMZONE would be of an
> application that uses the DPDK mempool library which is based on DPDK
> memzone library.  The application may need to create a number of
> steering tables, each of which will require its own mempool allocation.
> This commit is not about how to optimize the application usage of
> mempool nor about how to improve the mempool implementation based on
> memzone.  It is about how to make the max memzone definition - run-time
> customized.
> This commit adds an API which must be called before rte_eal_init():
> rte_memzone_max_set(int max).  If not called, the default memzone
> (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> max memzone: rte_memzone_max_get().
> 
> Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> ---

the use case, where each application may want a different non-hard-coded
value, makes sense.

it's less clear to me that requiring it be called before eal init makes
sense over just providing it as configuration to eal init so that it is
composed.

can you elaborate further on why you need get if you have a one-shot
set? why would the application not know the value if you can only ever
call it once before init?

>  app/test/test_func_reentrancy.c     |  2 +-
>  app/test/test_malloc_perf.c         |  2 +-
>  app/test/test_memzone.c             |  2 +-
>  config/rte_config.h                 |  1 -
>  drivers/net/qede/base/bcm_osal.c    | 26 +++++++++++++++++++++-----
>  drivers/net/qede/base/bcm_osal.h    |  3 +++
>  drivers/net/qede/qede_main.c        |  7 +++++++
>  lib/eal/common/eal_common_memzone.c | 28 +++++++++++++++++++++++++---
>  lib/eal/include/rte_memzone.h       | 20 ++++++++++++++++++++
>  lib/eal/version.map                 |  4 ++++
>  10 files changed, 83 insertions(+), 12 deletions(-)
> 
> diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
> index d1ed5d4..ae9de6f 100644
> --- a/app/test/test_func_reentrancy.c
> +++ b/app/test/test_func_reentrancy.c
> @@ -51,7 +51,7 @@ typedef void (*case_clean_t)(unsigned lcore_id);
>  #define MEMPOOL_ELT_SIZE                    (sizeof(uint32_t))
>  #define MEMPOOL_SIZE                        (4)
>  
> -#define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
> +#define MAX_LCORES	(rte_memzone_max_get() / (MAX_ITER_MULTI * 4U))
>  
>  static uint32_t obj_count;
>  static uint32_t synchro;
> diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
> index ccec43a..9bd1662 100644
> --- a/app/test/test_malloc_perf.c
> +++ b/app/test/test_malloc_perf.c
> @@ -165,7 +165,7 @@ test_malloc_perf(void)
>  		return -1;
>  
>  	if (test_alloc_perf("rte_memzone_reserve", memzone_alloc, memzone_free,
> -			NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
> +			NULL, memset_us_gb, rte_memzone_max_get() - 1) < 0)
>  		return -1;
>  
>  	return 0;
> diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
> index c9255e5..a315826 100644
> --- a/app/test/test_memzone.c
> +++ b/app/test/test_memzone.c
> @@ -871,7 +871,7 @@ test_memzone_bounded(void)
>  static int
>  test_memzone_free(void)
>  {
> -	const struct rte_memzone *mz[RTE_MAX_MEMZONE + 1];
> +	const struct rte_memzone *mz[rte_memzone_max_get() + 1];

please no more VLAs, even in tests.
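
One way to avoid the variable-length array is to size the table on the heap once the run-time limit is known. The helper below is a standalone sketch under that assumption: `struct vla_memzone` and `vla_mz_table_alloc()` are hypothetical names, and plain `calloc()` stands in for whichever allocator the test would actually use.

```c
#include <stdint.h>
#include <stdlib.h>

struct vla_memzone; /* opaque stand-in for struct rte_memzone */

/*
 * Allocate a table of max + 1 memzone pointers on the heap, as a
 * replacement for a VLA declared as mz[max + 1].
 * Returns NULL on failure; the caller releases the table with free().
 */
const struct vla_memzone **vla_mz_table_alloc(size_t max)
{
	/* Guard the "+ 1" against wrapping around to zero. */
	if (max == SIZE_MAX)
		return NULL;

	/* calloc() zero-fills the table (NULL slots on mainstream platforms). */
	return calloc(max + 1, sizeof(const struct vla_memzone *));
}
```

Unlike the VLA, an allocation failure is reported to the caller instead of overflowing the stack, at the cost of an explicit `free()` on every exit path.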

>  	int i;
>  	char name[20];
>  
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 7b8c85e..400e44e 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -34,7 +34,6 @@
>  #define RTE_MAX_MEM_MB_PER_LIST 32768
>  #define RTE_MAX_MEMSEG_PER_TYPE 32768
>  #define RTE_MAX_MEM_MB_PER_TYPE 65536
> -#define RTE_MAX_MEMZONE 2560
>  #define RTE_MAX_TAILQ 32
>  #define RTE_LOG_DP_LEVEL RTE_LOG_INFO
>  #define RTE_MAX_VFIO_CONTAINERS 64
> diff --git a/drivers/net/qede/base/bcm_osal.c b/drivers/net/qede/base/bcm_osal.c
> index 2c59397..f195f2c 100644
> --- a/drivers/net/qede/base/bcm_osal.c
> +++ b/drivers/net/qede/base/bcm_osal.c
> @@ -47,10 +47,26 @@ void osal_poll_mode_dpc(osal_int_ptr_t hwfn_cookie)
>  }
>  
>  /* Array of memzone pointers */
> -static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
> +static const struct rte_memzone **ecore_mz_mapping;
>  /* Counter to track current memzone allocated */
>  static uint16_t ecore_mz_count;
>  
> +int ecore_mz_mapping_alloc(void)
> +{
> +	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
> +		rte_memzone_max_get() * sizeof(struct rte_memzone *));
> +
> +	if (!ecore_mz_mapping)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void ecore_mz_mapping_free(void)
> +{
> +	rte_free(ecore_mz_mapping);
> +}
> +
>  unsigned long qede_log2_align(unsigned long n)
>  {
>  	unsigned long ret = n ? 1 : 0;
> @@ -132,9 +148,9 @@ void *osal_dma_alloc_coherent(struct ecore_dev *p_dev,
>  	uint32_t core_id = rte_lcore_id();
>  	unsigned int socket_id;
>  
> -	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
> +	if (ecore_mz_count >= rte_memzone_max_get()) {
>  		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
> -		       RTE_MAX_MEMZONE);
> +		       rte_memzone_max_get());
>  		*phys = 0;
>  		return OSAL_NULL;
>  	}
> @@ -171,9 +187,9 @@ void *osal_dma_alloc_coherent_aligned(struct ecore_dev *p_dev,
>  	uint32_t core_id = rte_lcore_id();
>  	unsigned int socket_id;
>  
> -	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
> +	if (ecore_mz_count >= rte_memzone_max_get()) {
>  		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
> -		       RTE_MAX_MEMZONE);
> +		       rte_memzone_max_get());
>  		*phys = 0;
>  		return OSAL_NULL;
>  	}
> diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
> index 67e7f75..97e261d 100644
> --- a/drivers/net/qede/base/bcm_osal.h
> +++ b/drivers/net/qede/base/bcm_osal.h
> @@ -477,4 +477,7 @@ enum dbg_status	qed_dbg_alloc_user_data(struct ecore_hwfn *p_hwfn,
>  	qed_dbg_alloc_user_data(p_hwfn, user_data_ptr)
>  #define OSAL_DB_REC_OCCURRED(p_hwfn) nothing
>  
> +int ecore_mz_mapping_alloc(void);
> +void ecore_mz_mapping_free(void);
> +
>  #endif /* __BCM_OSAL_H */
> diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
> index 0303903..f116e86 100644
> --- a/drivers/net/qede/qede_main.c
> +++ b/drivers/net/qede/qede_main.c
> @@ -78,6 +78,12 @@ qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
>  		return rc;
>  	}
>  
> +	rc = ecore_mz_mapping_alloc();
> +	if (rc) {
> +		DP_ERR(edev, "mem zones array allocation failed\n");
> +		return rc;
> +	}
> +
>  	return rc;
>  }
>  
> @@ -721,6 +727,7 @@ static void qed_remove(struct ecore_dev *edev)
>  	if (!edev)
>  		return;
>  
> +	ecore_mz_mapping_free();
>  	ecore_hw_remove(edev);
>  }
>  
> diff --git a/lib/eal/common/eal_common_memzone.c b/lib/eal/common/eal_common_memzone.c
> index a9cd91f..6c43b7f 100644
> --- a/lib/eal/common/eal_common_memzone.c
> +++ b/lib/eal/common/eal_common_memzone.c
> @@ -22,6 +22,10 @@
>  #include "eal_private.h"
>  #include "eal_memcfg.h"
>  
> +#define RTE_DEFAULT_MAX_MEMZONE 2560
> +
> +static uint32_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;

should be size_t

> +
>  static inline const struct rte_memzone *
>  memzone_lookup_thread_unsafe(const char *name)
>  {
> @@ -81,8 +85,9 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
>  	/* no more room in config */
>  	if (arr->count >= arr->len) {
>  		RTE_LOG(ERR, EAL,
> -		"%s(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE\n",
> -			__func__);
> +		"%s(): Number of requested memzone segments exceeds max "
> +		"memzone segments (%d >= %d)\n",
> +			__func__, arr->count, arr->len);
>  		rte_errno = ENOSPC;
>  		return NULL;
>  	}
> @@ -396,7 +401,7 @@ rte_eal_memzone_init(void)
>  
>  	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
>  			rte_fbarray_init(&mcfg->memzones, "memzone",
> -			RTE_MAX_MEMZONE, sizeof(struct rte_memzone))) {
> +			rte_memzone_max_get(), sizeof(struct rte_memzone))) {
>  		RTE_LOG(ERR, EAL, "Cannot allocate memzone list\n");
>  		ret = -1;
>  	} else if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> @@ -430,3 +435,20 @@ void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
>  	}
>  	rte_rwlock_read_unlock(&mcfg->mlock);
>  }
> +
> +int
> +rte_memzone_max_set(uint32_t max)

max should be size_t

> +{
> +	/* Setting max memzone must occur before calling rte_eal_init() */
> +	if (eal_get_internal_configuration()->init_complete > 0)
> +		return -1;
> +
> +	memzone_max = max;
> +	return 0;
> +}
> +
> +uint32_t
> +rte_memzone_max_get(void)

should return size_t

> +{
> +	return memzone_max;
> +}


> diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
> index 5302caa..ca60409 100644
> --- a/lib/eal/include/rte_memzone.h
> +++ b/lib/eal/include/rte_memzone.h
> @@ -305,6 +305,26 @@ void rte_memzone_dump(FILE *f);
>  void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg),
>  		      void *arg);
>  
> +/**
> + * Set max memzone value
> + *
> + * @param max
> + *   Value of max memzone allocations
> + * @return
> + *  0 on success, -1 otherwise
> + */
> +__rte_experimental
> +int rte_memzone_max_set(uint32_t max);
> +
> +/**
> + * Get max memzone value
> + *
> + * @return
> + *   Value of max memzone allocations
> + */
> +__rte_experimental
> +uint32_t rte_memzone_max_get(void);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/eal/version.map b/lib/eal/version.map
> index 6d6978f..717c5b2 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -430,6 +430,10 @@ EXPERIMENTAL {
>  	rte_thread_create_control;
>  	rte_thread_set_name;
>  	__rte_eal_trace_generic_blob;
> +
> +	# added in 23.07
> +	rte_memzone_max_set;
> +	rte_memzone_max_get;
>  };
>  
>  INTERNAL {
> -- 
> 2.8.4
  
Thomas Monjalon April 20, 2023, 7:43 a.m. UTC | #5
19/04/2023 16:51, Tyler Retzlaff:
> On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > coded as 2560.  For applications requiring different values of this
> > parameter – it is more convenient to set the max value via an rte API -
> > rather than changing the dpdk source code per application.  In many
> > organizations, the possibility to compile a private DPDK library for a
> > particular application does not exist at all.  With this option there is
> > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > application that uses the DPDK mempool library which is based on DPDK
> > memzone library.  The application may need to create a number of
> > steering tables, each of which will require its own mempool allocation.
> > This commit is not about how to optimize the application usage of
> > mempool nor about how to improve the mempool implementation based on
> > memzone.  It is about how to make the max memzone definition - run-time
> > customized.
> > This commit adds an API which must be called before rte_eal_init():
> > rte_memzone_max_set(int max).  If not called, the default memzone
> > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > max memzone: rte_memzone_max_get().
> > 
> > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > ---
> 
> the use case of each application may want a different non-hard coded
> value makes sense.
> 
> it's less clear to me that requiring it be called before eal init makes
> sense over just providing it as configuration to eal init so that it is
> composed.

Why do you think it would be better as EAL init option?
From an API perspective, I think it is simpler to call a dedicated function.
And I don't think a user wants to deal with it when starting the application.

> can you elaborate further on why you need get if you have a one-shot
> set? why would the application not know the value if you can only ever
> call it once before init?

The "get" function is used in this patch by test and qede driver.
The application could use it as well, especially to query the default value.
  
Tyler Retzlaff April 20, 2023, 6:20 p.m. UTC | #6
On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> 19/04/2023 16:51, Tyler Retzlaff:
> > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > coded as 2560.  For applications requiring different values of this
> > > parameter – it is more convenient to set the max value via an rte API -
> > > rather than changing the dpdk source code per application.  In many
> > > organizations, the possibility to compile a private DPDK library for a
> > > particular application does not exist at all.  With this option there is
> > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > application that uses the DPDK mempool library which is based on DPDK
> > > memzone library.  The application may need to create a number of
> > > steering tables, each of which will require its own mempool allocation.
> > > This commit is not about how to optimize the application usage of
> > > mempool nor about how to improve the mempool implementation based on
> > > memzone.  It is about how to make the max memzone definition - run-time
> > > customized.
> > > This commit adds an API which must be called before rte_eal_init():
> > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > max memzone: rte_memzone_max_get().
> > > 
> > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > ---
> > 
> > the use case of each application may want a different non-hard coded
> > value makes sense.
> > 
> > it's less clear to me that requiring it be called before eal init makes
> > sense over just providing it as configuration to eal init so that it is
> > composed.
> 
> Why do you think it would be better as EAL init option?
> From an API perspective, I think it is simpler to call a dedicated function.
> And I don't think a user wants to deal with it when starting the application.

because a dedicated function that can be called detached from the eal
state enables an opportunity for accidental and confusing use outside
the correct context.

i know the above prescribes not to do this but.

now you can call set after eal init, but we protect against calling it
after init by failing. what do we do sensibly with the failure?

> 
> > can you elaborate further on why you need get if you have a one-shot
> > set? why would the application not know the value if you can only ever
> > call it once before init?
> 
> The "get" function is used in this patch by test and qede driver.
> The application could use it as well, especially to query the default value.

this seems incoherent to me, why does the application not know if it has
called set or not? if it called set it knows what the value is, if it didn't
call set it knows what the default is.

anyway, the use case is valid and i would like to see the ability to
change it dynamically. i'd prefer not to see an api like this introduced
as proposed, but that's for you folks to decide.

i own a lot of apis that operate just like the proposed one, and
they're a great source of support overhead. i prefer not to rely on
documenting a contract when i can enforce the contract and implicit state
machine mechanically with the api instead.

fwiw a nicer pattern for this kind of framework-influencing config
might look something like this.

struct eal_config config;

eal_config_init(&config); // defaults are set entire state made valid
eal_config_set_max_memzone(&config, 1024); // default is overridden

rte_eal_init(&config);

ty
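
The configuration-object pattern sketched in the message above can be made concrete as a small standalone model. None of these names exist in DPDK: `struct sketch_eal_config` and the `sketch_*` functions are hypothetical, and the real `rte_eal_init()` takes `argc`/`argv` rather than a config struct.

```c
#include <stddef.h>

/* Hypothetical config object; its only field is the memzone limit. */
struct sketch_eal_config {
	size_t max_memzone;
};

/* Limit in effect after init; 0 means "not initialized yet". */
static size_t sketch_effective_max_memzone;

/* Put the whole config into a valid state with defaults. */
void sketch_eal_config_init(struct sketch_eal_config *cfg)
{
	cfg->max_memzone = 2560; /* default value from the patch */
}

/* Override the default; only touches the config, never global state. */
void sketch_eal_config_set_max_memzone(struct sketch_eal_config *cfg,
				       size_t max)
{
	cfg->max_memzone = max;
}

/* Init consumes the config, so there is no "set after init" case. */
int sketch_eal_init(const struct sketch_eal_config *cfg)
{
	if (cfg == NULL || cfg->max_memzone == 0)
		return -1;
	sketch_effective_max_memzone = cfg->max_memzone;
	return 0;
}

size_t sketch_eal_max_memzone(void)
{
	return sketch_effective_max_memzone;
}
```

Because the setter only modifies the caller's config object, calling it after init has no effect on the running system, so the ordering contract is enforced by the shape of the API rather than by a run-time check.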
  
Thomas Monjalon April 21, 2023, 8:34 a.m. UTC | #7
20/04/2023 20:20, Tyler Retzlaff:
> On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> > 19/04/2023 16:51, Tyler Retzlaff:
> > > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > > coded as 2560.  For applications requiring different values of this
> > > > parameter – it is more convenient to set the max value via an rte API -
> > > > rather than changing the dpdk source code per application.  In many
> > > > organizations, the possibility to compile a private DPDK library for a
> > > > particular application does not exist at all.  With this option there is
> > > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > > application that uses the DPDK mempool library which is based on DPDK
> > > > memzone library.  The application may need to create a number of
> > > > steering tables, each of which will require its own mempool allocation.
> > > > This commit is not about how to optimize the application usage of
> > > > mempool nor about how to improve the mempool implementation based on
> > > > memzone.  It is about how to make the max memzone definition - run-time
> > > > customized.
> > > > This commit adds an API which must be called before rte_eal_init():
> > > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > > max memzone: rte_memzone_max_get().
> > > > 
> > > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > > ---
> > > 
> > > the use case of each application may want a different non-hard coded
> > > value makes sense.
> > > 
> > > it's less clear to me that requiring it be called before eal init makes
> > > sense over just providing it as configuration to eal init so that it is
> > > composed.
> > 
> > Why do you think it would be better as EAL init option?
> > From an API perspective, I think it is simpler to call a dedicated function.
> > And I don't think a user wants to deal with it when starting the application.
> 
> because a dedicated function that can be called detached from the eal
> state enables an opportunity for accidental and confusing use outside
> the correct context.
> 
> i know the above prescribes not to do this but.
> 
> now you can call set after eal init, but we protect about calling it
> after init by failing. what do we do sensibly with the failure?

It would be a developer mistake which could be fixed very easily during
the development stage. I don't see a problem here.

> > > can you elaborate further on why you need get if you have a one-shot
> > > set? why would the application not know the value if you can only ever
> > > call it once before init?
> > 
> > The "get" function is used in this patch by test and qede driver.
> > The application could use it as well, especially to query the default value.
> 
> this seems incoherent to me, why does the application not know if it has
> called set or not? if it called set it knows what the value is, if it didn't
> call set it knows what the default is.

No the application doesn't know the default, it is an internal value.

> anyway, the use case is valid and i would like to see the ability to
> change it dynamically i'd prefer not to see an api like this be introduced
> as prescribed but that's for you folks to decide.
> 
> anyway, i own a lot of apis that operate just like the proposed and
> they're great source of support overhead. i prefer not to rely on
> documenting a contract when i can enforce the contract and implicit state
> machine mechanically with the api instead.
> 
> fwiw a nicer pattern for doing this one of framework influencing config
> might look something like this.
> 
> struct eal_config config;
> 
> eal_config_init(&config); // defaults are set entire state made valid
> eal_config_set_max_memzone(&config, 1024); // default is overridden
> 
> rte_eal_init(&config);

In general, we discovered that functions doing too much are bad
for usability and for ABI stability.
With the eal_config_init() function that you propose,
any change to struct eal_config would be an ABI breakage.
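
One way to read the ABI concern: when the application allocates the config struct itself, its size is baked into the application binary. The standalone sketch below uses two hypothetical layouts (`abi_cfg_v1`, `abi_cfg_v2`, with an invented second field) to show how far a newer library's init routine would write past an object sized for the older layout.

```c
#include <stddef.h>

/* Layout an application was compiled against (hypothetical "v1"). */
struct abi_cfg_v1 {
	size_t max_memzone;
};

/* Layout in a newer library that added a field (hypothetical "v2"). */
struct abi_cfg_v2 {
	size_t max_memzone;
	size_t max_tailq; /* invented field, for illustration only */
};

/*
 * Bytes a v2 init routine would write beyond an object that the
 * application allocated using the v1 layout.
 */
size_t abi_cfg_overrun_bytes(void)
{
	return sizeof(struct abi_cfg_v2) - sizeof(struct abi_cfg_v1);
}
```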
  
Morten Brørup April 21, 2023, 11:08 a.m. UTC | #8
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Friday, 21 April 2023 10.35
> 
> 20/04/2023 20:20, Tyler Retzlaff:
> > On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> > > 19/04/2023 16:51, Tyler Retzlaff:
> > > > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > > > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > > > > coded as 2560.  For applications requiring different values of this
> > > > > > parameter – it is more convenient to set the max value via an rte API -
> > > > > > rather than changing the dpdk source code per application.  In many
> > > > > > organizations, the possibility to compile a private DPDK library for a
> > > > > > particular application does not exist at all.  With this option there is
> > > > > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > > > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > > > > application that uses the DPDK mempool library which is based on DPDK
> > > > > > memzone library.  The application may need to create a number of
> > > > > > steering tables, each of which will require its own mempool allocation.
> > > > > > This commit is not about how to optimize the application usage of
> > > > > > mempool nor about how to improve the mempool implementation based on
> > > > > > memzone.  It is about how to make the max memzone definition - run-time
> > > > > > customized.
> > > > > > This commit adds an API which must be called before rte_eal_init():
> > > > > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > > > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > > > > max memzone: rte_memzone_max_get().
> > > > >
> > > > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > > > ---
> > > >
> > > > the use case of each application may want a different non-hard coded
> > > > value makes sense.
> > > >
> > > > it's less clear to me that requiring it be called before eal init makes
> > > > sense over just providing it as configuration to eal init so that it is
> > > > composed.
> > >
> > > Why do you think it would be better as EAL init option?
> > > > From an API perspective, I think it is simpler to call a dedicated function.
> > > > And I don't think a user wants to deal with it when starting the application.
> >
> > because a dedicated function that can be called detached from the eal
> > state enables an opportunity for accidental and confusing use outside
> > the correct context.
> >
> > i know the above prescribes not to do this but.
> >
> > now you can call set after eal init, but we protect about calling it
> > after init by failing. what do we do sensibly with the failure?
> 
> It would be a developer mistake which could be fix during development stage
> very easily. I don't see a problem here.

Why is this not just a command line parameter, like other EAL configuration options?

Do any other pre-init APIs exist, or are you introducing a new design pattern for configuring EAL?

Any application can simply modify the command line parameters before calling EAL init. It doesn't need to pass the command line parameters as-is to EAL init.

In other words: There is an existing design pattern for configuring EAL, why introduce a new design pattern?

If we want to expose APIs for configuring EAL instead of passing command line parameters, such APIs should be added for all EAL configuration parameters. That would be nice, but I dislike that some EAL configuration parameters must be passed using one method and some other passed using another method.
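
The point that an application can edit its own argument vector before handing it to EAL init can be sketched as follows. This is a standalone illustration: `argv_with_option()` is a hypothetical helper, and the option name used in the usage note does not exist in EAL. The sketch also assumes the incoming `argv` holds only EAL options, since anything after a `--` separator would need to stay last.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a new argument vector holding the original entries followed by
 * an option name and its value, terminated by NULL. The strings are
 * shared with the caller, not copied; release the array with free().
 * Returns NULL on invalid input or allocation failure.
 */
char **argv_with_option(int argc, char **argv, char *opt, char *value,
			int *new_argc)
{
	char **out;

	if (argc < 0 || argv == NULL || new_argc == NULL)
		return NULL;

	/* Two extra entries plus the terminating NULL. */
	out = calloc((size_t)argc + 3, sizeof(*out));
	if (out == NULL)
		return NULL;

	memcpy(out, argv, (size_t)argc * sizeof(*out));
	out[argc] = opt;
	out[argc + 1] = value;
	out[argc + 2] = NULL;
	*new_argc = argc + 2;
	return out;
}
```

An application would call this with something like a made-up `--demo-max-memzone` option and the value `4096`, then pass the returned vector and count to its init call instead of the original `argc`/`argv`.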
  
Thomas Monjalon April 21, 2023, 2:57 p.m. UTC | #9
21/04/2023 13:08, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Friday, 21 April 2023 10.35
> > 20/04/2023 20:20, Tyler Retzlaff:
> > > On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> > > > 19/04/2023 16:51, Tyler Retzlaff:
> > > > > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > > > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > > > > coded as 2560.  For applications requiring different values of this
> > > > > > parameter – it is more convenient to set the max value via an rte API
> > -
> > > > > > rather than changing the dpdk source code per application.  In many
> > > > > > organizations, the possibility to compile a private DPDK library for a
> > > > > > particular application does not exist at all.  With this option there is
> > > > > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > > > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > > > > application that uses the DPDK mempool library which is based on DPDK
> > > > > > memzone library.  The application may need to create a number of
> > > > > > steering tables, each of which will require its own mempool allocation.
> > > > > > This commit is not about how to optimize the application usage of
> > > > > > mempool nor about how to improve the mempool implementation based on
> > > > > > memzone.  It is about how to make the max memzone definition - run-time
> > > > > > customized.
> > > > > > This commit adds an API which must be called before rte_eal_init():
> > > > > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > > > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > > > > max memzone: rte_memzone_max_get().
> > > > > >
> > > > > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > > > > ---
> > > > >
> > > > > the use case of each application may want a different non-hard coded
> > > > > value makes sense.
> > > > >
> > > > > it's less clear to me that requiring it be called before eal init makes
> > > > > sense over just providing it as configuration to eal init so that it is
> > > > > composed.
> > > >
> > > > Why do you think it would be better as EAL init option?
> > > > From an API perspective, I think it is simpler to call a dedicated function.
> > > > And I don't think a user wants to deal with it when starting the application.
> > >
> > > because a dedicated function that can be called detached from the eal
> > > state enables an opportunity for accidental and confusing use outside
> > > the correct context.
> > >
> > > i know the above prescribes not to do this but.
> > >
> > > now you can call set after eal init, but we protect about calling it
> > > after init by failing. what do we do sensibly with the failure?
> > 
> > It would be a developer mistake which could be fixed during development stage
> > very easily. I don't see a problem here.
> 
> Why is this not just a command line parameter, like other EAL configuration options?
> 
> Do any other pre-init APIs exist, or are you introducing a new design pattern for configuring EAL?

Let's say it is a "new" design pattern, as discussed multiple times in previous years.
But this one is only for the application,
it is not a user configuration as in rte_eal_init(int argc, char **argv).

> Any application can simply modify the command line parameters before calling EAL init. It doesn't need to pass the command line parameters as-is to EAL init.

It is not very easy to use.

> In other words: There is an existing design pattern for configuring EAL, why introduce a new design pattern?

Because argc/argv is a bad pattern.
We had multiple requests to avoid it.
So when introducing a new option, it is better to avoid it.

> If we want to expose APIs for configuring EAL instead of passing command line parameters, such APIs should be added for all EAL configuration parameters.

The memzone parameter is not supposed to be configured by the user,
so it does not make sense to expose it via argc/argv.

> That would be nice, but I dislike that some EAL configuration parameters must be passed using one method and some other passed using another method.

We asked multiple times for such rework.
And the patches from Bruce to split some EAL parts are in this direction.
If you want to propose some new functions to configure EAL, you are welcome.
  
Morten Brørup April 21, 2023, 3:19 p.m. UTC | #10
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Friday, 21 April 2023 16.57
> 
> 21/04/2023 13:08, Morten Brørup:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Friday, 21 April 2023 10.35
> > > 20/04/2023 20:20, Tyler Retzlaff:
> > > > On Thu, Apr 20, 2023 at 09:43:28AM +0200, Thomas Monjalon wrote:
> > > > > 19/04/2023 16:51, Tyler Retzlaff:
> > > > > > On Wed, Apr 19, 2023 at 11:36:34AM +0300, Ophir Munk wrote:
> > > > > > > In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> > > > > > > coded as 2560.  For applications requiring different values of this
> > > > > > > parameter – it is more convenient to set the max value via an rte API -
> > > > > > > rather than changing the dpdk source code per application.  In many
> > > > > > > organizations, the possibility to compile a private DPDK library for a
> > > > > > > particular application does not exist at all.  With this option there is
> > > > > > > no need to recompile DPDK and it allows using an in-box packaged DPDK.
> > > > > > > An example usage for updating the RTE_MAX_MEMZONE would be of an
> > > > > > > application that uses the DPDK mempool library which is based on DPDK
> > > > > > > memzone library.  The application may need to create a number of
> > > > > > > steering tables, each of which will require its own mempool allocation.
> > > > > > > This commit is not about how to optimize the application usage of
> > > > > > > mempool nor about how to improve the mempool implementation based on
> > > > > > > memzone.  It is about how to make the max memzone definition - run-time
> > > > > > > customized.
> > > > > > > This commit adds an API which must be called before rte_eal_init():
> > > > > > > rte_memzone_max_set(int max).  If not called, the default memzone
> > > > > > > (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> > > > > > > max memzone: rte_memzone_max_get().
> > > > > > >
> > > > > > > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> > > > > > > ---
> > > > > >
> > > > > > the use case of each application may want a different non-hard coded
> > > > > > value makes sense.
> > > > > >
> > > > > > it's less clear to me that requiring it be called before eal init makes
> > > > > > sense over just providing it as configuration to eal init so that it is
> > > > > > composed.
> > > > >
> > > > > Why do you think it would be better as EAL init option?
> > > > > From an API perspective, I think it is simpler to call a dedicated function.
> > > > > And I don't think a user wants to deal with it when starting the application.
> > > >
> > > > because a dedicated function that can be called detached from the eal
> > > > state enables an opportunity for accidental and confusing use outside
> > > > the correct context.
> > > >
> > > > i know the above prescribes not to do this but.
> > > >
> > > > now you can call set after eal init, but we protect about calling it
> > > > after init by failing. what do we do sensibly with the failure?
> > >
> > > It would be a developer mistake which could be fixed during development stage
> > > very easily. I don't see a problem here.
> >
> > Why is this not just a command line parameter, like other EAL configuration options?
> >
> > Do any other pre-init APIs exist, or are you introducing a new design pattern for configuring EAL?
> 
> Let's say it is a "new" design pattern, as discussed multiple times in
> previous years.
> But this one is only for the application,
> it is not a user configuration as in rte_eal_init(int argc, char **argv).
> 
> > Any application can simply modify the command line parameters before calling EAL init. It doesn't need to pass the command line parameters as-is to EAL init.
> 
> It is not very easy to use.
> 
> > In other words: There is an existing design pattern for configuring EAL, why introduce a new design pattern?
> 
> Because argc/argv is a bad pattern.
> We had multiple requests to avoid it.
> So when introducing a new option, it is better to avoid it.
> 
> > If we want to expose APIs for configuring EAL instead of passing command line parameters, such APIs should be added for all EAL configuration parameters.
> 
> The memzone parameter is not supposed to be configured by the user,
> so it does not make sense to expose it via argc/argv.

Good point! I didn't think about that; in hardware appliances, the user has no access to provide EAL command line parameters.

> 
> > That would be nice, but I dislike that some EAL configuration parameters must be passed using one method and some other passed using another method.
> 
> We asked multiple times for such rework.

High level directions/goals for DPDK, such as replacing EAL command line parameters with APIs, should be noted on the Roadmap web page.

> And the patches from Bruce to split some EAL parts are in this direction.
> If you want to propose some new functions to configure EAL, you are welcome.

OK. I retract my objection. :-)
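
For readers following the thread, the pre-init setter pattern discussed above can be sketched in plain C. This is only an illustration under assumptions: every sketch_* name and the sketch_init_complete flag are hypothetical stand-ins, not DPDK symbols, and sketch_eal_init() merely mimics rte_eal_init() freezing the configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EAL state; none of these are DPDK symbols. */
#define SKETCH_DEFAULT_MAX_MEMZONE 2560

static size_t sketch_memzone_max = SKETCH_DEFAULT_MAX_MEMZONE;
static int sketch_init_complete;

/* Accept a new limit only before init; return 0 on success, -1 otherwise. */
static int sketch_memzone_max_set(size_t max)
{
	if (sketch_init_complete)
		return -1;
	sketch_memzone_max = max;
	return 0;
}

static size_t sketch_memzone_max_get(void)
{
	return sketch_memzone_max;
}

/* Mimics rte_eal_init(): after this call the limit is frozen. */
static int sketch_eal_init(void)
{
	sketch_init_complete = 1;
	return 0;
}

/*
 * Application-side flow: configure first, then initialize.
 * Returns the effective limit, or 0 if any step misbehaves.
 */
static size_t sketch_app_start(size_t wanted)
{
	if (sketch_memzone_max_set(wanted) != 0)
		return 0;
	if (sketch_eal_init() != 0)
		return 0;
	assert(sketch_init_complete == 1);

	/* A late call is a developer mistake and must be rejected. */
	if (sketch_memzone_max_set(wanted * 2) != -1)
		return 0;

	return sketch_memzone_max_get();
}
```

Calling sketch_app_start(4096) from an application's main() would leave the limit at 4096; any later sketch_memzone_max_set() call returns -1 and leaves the value untouched, which is the failure mode Tyler asked about.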
  
Ophir Munk April 24, 2023, 9:07 p.m. UTC | #11
Thank you Devendra Singh Rawat for your valuable comments.

> >+int ecore_mz_mapping_alloc(void)
> >+{
> >+	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
> >+		rte_memzone_max_get() * sizeof(struct rte_memzone *));
> 
> The second parameter of rte_malloc() should be the size and the third
> parameter should be the alignment (0 in this case).
> 
> Check
> https://doc.dpdk.org/api/rte__malloc_8h.html#a247c99e8d36300c52729c9ee58c2b489

Ack

> >--- a/drivers/net/qede/qede_main.c
> >+++ b/drivers/net/qede/qede_main.c
> >@@ -78,6 +78,12 @@ qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
> > 		return rc;
> > 	}
> >
> >+	rc = ecore_mz_mapping_alloc();
> 
> ecore_mz_mapping_alloc() should be called prior to calling
> ecore_hw_prepare().
> 

Ack

> >
> >@@ -721,6 +727,7 @@ static void qed_remove(struct ecore_dev *edev)
> > 	if (!edev)
> > 		return;
> >
> >+	ecore_mz_mapping_free();
> > 	ecore_hw_remove(edev);
> > }
> 
> ecore_mz_mapping_free() should be called after ecore_hw_remove();

Ack
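
The review points acknowledged above (argument order, allocating before hardware prepare, freeing after hardware remove) can be sketched as follows. rte_malloc() takes (type, size, align); to keep the sketch self-contained, sketch_malloc() is a hypothetical stand-in with the same parameter order built on malloc(). The sketch_hw_* functions and the fixed limit of 2560 are placeholders as well, so this is not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_memzone;	/* opaque stand-in for struct rte_memzone */

/* Hypothetical stand-in with rte_malloc()'s (type, size, align) order. */
static void *sketch_malloc(const char *type, size_t size, unsigned int align)
{
	(void)type;
	(void)align;	/* 0 means default alignment; ignored in this sketch */
	return malloc(size);
}

/* Placeholder for rte_memzone_max_get(); returns the RFC's default. */
static size_t sketch_memzone_max_get(void)
{
	return 2560;
}

static const struct sketch_memzone **sketch_mz_mapping;
static int sketch_hw_ready;

static int sketch_mz_mapping_alloc(void)
{
	assert(sketch_mz_mapping == NULL);
	/* Size is the second argument, alignment (0) the third. */
	sketch_mz_mapping = sketch_malloc("ecore_mz_map",
		sketch_memzone_max_get() * sizeof(*sketch_mz_mapping), 0);
	return sketch_mz_mapping ? 0 : -ENOMEM;
}

static void sketch_mz_mapping_free(void)
{
	free(sketch_mz_mapping);
	sketch_mz_mapping = NULL;
}

/* Placeholders for ecore_hw_prepare() and ecore_hw_remove(). */
static int sketch_hw_prepare(void) { sketch_hw_ready = 1; return 0; }
static void sketch_hw_remove(void) { sketch_hw_ready = 0; }

static int sketch_probe(void)
{
	/* The mapping array must exist before hardware prepare runs. */
	int rc = sketch_mz_mapping_alloc();

	if (rc)
		return rc;
	rc = sketch_hw_prepare();
	if (rc)
		sketch_mz_mapping_free();
	return rc;
}

static void sketch_remove(void)
{
	/* Tear down the hardware first, then release the mapping array. */
	sketch_hw_remove();
	sketch_mz_mapping_free();
}
```

The ordering matters because the hardware prepare path may already record memzones in the mapping array, and the remove path may still walk it.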
  
Ophir Munk April 24, 2023, 9:43 p.m. UTC | #12
Thank you Stephen Hemminger for your comment.

> Subject: Re: [RFC] lib: set/get max memzone segments
> 
> On Wed, 19 Apr 2023 11:36:34 +0300
> Ophir Munk <ophirmu@nvidia.com> wrote:
> 
> > +int ecore_mz_mapping_alloc(void)
> > +{
> > +	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
> > +		rte_memzone_max_get() * sizeof(struct rte_memzone *));
> 
> Why not use rte_calloc(), 

rte_malloc() replaced with rte_zmalloc().

> and devices should be using NUMA aware
> allocation to put the memzone on same NUMA node as the PCI device.

I leave this optimization to driver developers. I don't think it should be part of this RFC.
  
Ophir Munk April 25, 2023, 1:46 p.m. UTC | #13
Thank you, Tyler Retzlaff, for your comments.

> > --- a/app/test/test_memzone.c
> > +++ b/app/test/test_memzone.c
> > @@ -871,7 +871,7 @@ test_memzone_bounded(void)
> >  static int
> >  test_memzone_free(void)
> >  {
> > -	const struct rte_memzone *mz[RTE_MAX_MEMZONE + 1];
> > +	const struct rte_memzone *mz[rte_memzone_max_get() + 1];
> 
> please no more VLAs even if in tests.
> 

VLA replaced with dynamic allocation. 

> > --- a/lib/eal/common/eal_common_memzone.c
> > +++ b/lib/eal/common/eal_common_memzone.c
> > @@ -22,6 +22,10 @@
> >  #include "eal_private.h"
> >  #include "eal_memcfg.h"
> >
> > +#define RTE_DEFAULT_MAX_MEMZONE 2560
> > +
> > +static uint32_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
> 
> should be size_t
> 

Ack

> > +int
> > +rte_memzone_max_set(uint32_t max)
> 
> max should be size_t
> 
> > +{

Ack

> > +uint32_t
> > +rte_memzone_max_get(void)
> 
> should return size_t
> 

Ack
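
The VLA replacement and the size_t change acknowledged above might look like the sketch below. It is not the V2 code: sketch_memzone_max_get() is a hypothetical placeholder for rte_memzone_max_get() returning size_t, and calloc() stands in for whichever allocator the test ends up using.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_memzone;	/* opaque stand-in for struct rte_memzone */

/* Placeholder for rte_memzone_max_get() with the size_t return type. */
static size_t sketch_memzone_max_get(void)
{
	return 2560;
}

/* Returns 0 on success, -1 if the pointer array cannot be allocated. */
static int sketch_test_memzone_free(void)
{
	/* One extra slot, as in the original mz[RTE_MAX_MEMZONE + 1]. */
	size_t n = sketch_memzone_max_get() + 1;
	const struct sketch_memzone **mz;
	size_t i;

	assert(n > 0);
	mz = calloc(n, sizeof(*mz));	/* heap array instead of a VLA */
	if (mz == NULL)
		return -1;

	/* The real test would reserve and free memzones here. */
	for (i = 0; i < n; i++)
		mz[i] = NULL;

	free(mz);
	return 0;
}
```

Unlike the VLA, a failed heap allocation is reported to the caller instead of overflowing the stack when the configured limit is large.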
  
Ophir Munk April 25, 2023, 4:38 p.m. UTC | #14
Thank you Morten Brørup and Thomas Monjalon for the fruitful discussion.
I am sending a V2 of the RFC that reflects the understandings reached so far.
If it is confirmed, I will send PATCH V1.
  

Patch

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index d1ed5d4..ae9de6f 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -51,7 +51,7 @@  typedef void (*case_clean_t)(unsigned lcore_id);
 #define MEMPOOL_ELT_SIZE                    (sizeof(uint32_t))
 #define MEMPOOL_SIZE                        (4)
 
-#define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
+#define MAX_LCORES	(rte_memzone_max_get() / (MAX_ITER_MULTI * 4U))
 
 static uint32_t obj_count;
 static uint32_t synchro;
diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
index ccec43a..9bd1662 100644
--- a/app/test/test_malloc_perf.c
+++ b/app/test/test_malloc_perf.c
@@ -165,7 +165,7 @@  test_malloc_perf(void)
 		return -1;
 
 	if (test_alloc_perf("rte_memzone_reserve", memzone_alloc, memzone_free,
-			NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
+			NULL, memset_us_gb, rte_memzone_max_get() - 1) < 0)
 		return -1;
 
 	return 0;
diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
index c9255e5..a315826 100644
--- a/app/test/test_memzone.c
+++ b/app/test/test_memzone.c
@@ -871,7 +871,7 @@  test_memzone_bounded(void)
 static int
 test_memzone_free(void)
 {
-	const struct rte_memzone *mz[RTE_MAX_MEMZONE + 1];
+	const struct rte_memzone *mz[rte_memzone_max_get() + 1];
 	int i;
 	char name[20];
 
diff --git a/config/rte_config.h b/config/rte_config.h
index 7b8c85e..400e44e 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -34,7 +34,6 @@ 
 #define RTE_MAX_MEM_MB_PER_LIST 32768
 #define RTE_MAX_MEMSEG_PER_TYPE 32768
 #define RTE_MAX_MEM_MB_PER_TYPE 65536
-#define RTE_MAX_MEMZONE 2560
 #define RTE_MAX_TAILQ 32
 #define RTE_LOG_DP_LEVEL RTE_LOG_INFO
 #define RTE_MAX_VFIO_CONTAINERS 64
diff --git a/drivers/net/qede/base/bcm_osal.c b/drivers/net/qede/base/bcm_osal.c
index 2c59397..f195f2c 100644
--- a/drivers/net/qede/base/bcm_osal.c
+++ b/drivers/net/qede/base/bcm_osal.c
@@ -47,10 +47,26 @@  void osal_poll_mode_dpc(osal_int_ptr_t hwfn_cookie)
 }
 
 /* Array of memzone pointers */
-static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
+static const struct rte_memzone **ecore_mz_mapping;
 /* Counter to track current memzone allocated */
 static uint16_t ecore_mz_count;
 
+int ecore_mz_mapping_alloc(void)
+{
+	ecore_mz_mapping = rte_malloc("ecore_mz_map", 0,
+		rte_memzone_max_get() * sizeof(struct rte_memzone *));
+
+	if (!ecore_mz_mapping)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void ecore_mz_mapping_free(void)
+{
+	rte_free(ecore_mz_mapping);
+}
+
 unsigned long qede_log2_align(unsigned long n)
 {
 	unsigned long ret = n ? 1 : 0;
@@ -132,9 +148,9 @@  void *osal_dma_alloc_coherent(struct ecore_dev *p_dev,
 	uint32_t core_id = rte_lcore_id();
 	unsigned int socket_id;
 
-	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
+	if (ecore_mz_count >= rte_memzone_max_get()) {
 		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
-		       RTE_MAX_MEMZONE);
+		       rte_memzone_max_get());
 		*phys = 0;
 		return OSAL_NULL;
 	}
@@ -171,9 +187,9 @@  void *osal_dma_alloc_coherent_aligned(struct ecore_dev *p_dev,
 	uint32_t core_id = rte_lcore_id();
 	unsigned int socket_id;
 
-	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
+	if (ecore_mz_count >= rte_memzone_max_get()) {
 		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
-		       RTE_MAX_MEMZONE);
+		       rte_memzone_max_get());
 		*phys = 0;
 		return OSAL_NULL;
 	}
diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
index 67e7f75..97e261d 100644
--- a/drivers/net/qede/base/bcm_osal.h
+++ b/drivers/net/qede/base/bcm_osal.h
@@ -477,4 +477,7 @@  enum dbg_status	qed_dbg_alloc_user_data(struct ecore_hwfn *p_hwfn,
 	qed_dbg_alloc_user_data(p_hwfn, user_data_ptr)
 #define OSAL_DB_REC_OCCURRED(p_hwfn) nothing
 
+int ecore_mz_mapping_alloc(void);
+void ecore_mz_mapping_free(void);
+
 #endif /* __BCM_OSAL_H */
diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
index 0303903..f116e86 100644
--- a/drivers/net/qede/qede_main.c
+++ b/drivers/net/qede/qede_main.c
@@ -78,6 +78,12 @@  qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
 		return rc;
 	}
 
+	rc = ecore_mz_mapping_alloc();
+	if (rc) {
+		DP_ERR(edev, "mem zones array allocation failed\n");
+		return rc;
+	}
+
 	return rc;
 }
 
@@ -721,6 +727,7 @@  static void qed_remove(struct ecore_dev *edev)
 	if (!edev)
 		return;
 
+	ecore_mz_mapping_free();
 	ecore_hw_remove(edev);
 }
 
diff --git a/lib/eal/common/eal_common_memzone.c b/lib/eal/common/eal_common_memzone.c
index a9cd91f..6c43b7f 100644
--- a/lib/eal/common/eal_common_memzone.c
+++ b/lib/eal/common/eal_common_memzone.c
@@ -22,6 +22,10 @@ 
 #include "eal_private.h"
 #include "eal_memcfg.h"
 
+#define RTE_DEFAULT_MAX_MEMZONE 2560
+
+static uint32_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
+
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
@@ -81,8 +85,9 @@  memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* no more room in config */
 	if (arr->count >= arr->len) {
 		RTE_LOG(ERR, EAL,
-		"%s(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE\n",
-			__func__);
+		"%s(): Number of requested memzone segments exceeds max "
+		"memzone segments (%u >= %u)\n",
+			__func__, arr->count, arr->len);
 		rte_errno = ENOSPC;
 		return NULL;
 	}
@@ -396,7 +401,7 @@  rte_eal_memzone_init(void)
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			rte_fbarray_init(&mcfg->memzones, "memzone",
-			RTE_MAX_MEMZONE, sizeof(struct rte_memzone))) {
+			rte_memzone_max_get(), sizeof(struct rte_memzone))) {
 		RTE_LOG(ERR, EAL, "Cannot allocate memzone list\n");
 		ret = -1;
 	} else if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
@@ -430,3 +435,20 @@  void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
 	}
 	rte_rwlock_read_unlock(&mcfg->mlock);
 }
+
+int
+rte_memzone_max_set(uint32_t max)
+{
+	/* Setting max memzone must occur before calling rte_eal_init() */
+	if (eal_get_internal_configuration()->init_complete > 0)
+		return -1;
+
+	memzone_max = max;
+	return 0;
+}
+
+uint32_t
+rte_memzone_max_get(void)
+{
+	return memzone_max;
+}
diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
index 5302caa..ca60409 100644
--- a/lib/eal/include/rte_memzone.h
+++ b/lib/eal/include/rte_memzone.h
@@ -305,6 +305,26 @@  void rte_memzone_dump(FILE *f);
 void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg),
 		      void *arg);
 
+/**
+ * Set max memzone value
+ *
+ * @param max
+ *   Value of max memzone allocations
+ * @return
+ *  0 on success, -1 otherwise
+ */
+__rte_experimental
+int rte_memzone_max_set(uint32_t max);
+
+/**
+ * Get max memzone value
+ *
+ * @return
+ *   Value of max memzone allocations
+ */
+__rte_experimental
+uint32_t rte_memzone_max_get(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 6d6978f..717c5b2 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -430,6 +430,10 @@  EXPERIMENTAL {
 	rte_thread_create_control;
 	rte_thread_set_name;
 	__rte_eal_trace_generic_blob;
+
+	# added in 23.07
+	rte_memzone_max_set;
+	rte_memzone_max_get;
 };
 
 INTERNAL {