[V3] lib: set/get max memzone segments

Message ID 20230503072641.474600-1-ophirmu@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [V3] lib: set/get max memzone segments |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/intel-Functional success Functional PASS

Commit Message

Ophir Munk May 3, 2023, 7:26 a.m. UTC
  In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
coded as 2560.  For applications requiring different values of this
parameter – it is more convenient to set the max value via an rte API -
rather than changing the dpdk source code per application.  In many
organizations, the possibility to compile a private DPDK library for a
particular application does not exist at all.  With this option there is
no need to recompile DPDK and it allows using an in-box packaged DPDK.
An example usage for updating the RTE_MAX_MEMZONE would be of an
application that uses the DPDK mempool library which is based on DPDK
memzone library.  The application may need to create a number of
steering tables, each of which will require its own mempool allocation.
This commit is not about how to optimize the application usage of
mempool nor about how to improve the mempool implementation based on
memzone.  It is about how to make the max memzone definition - run-time
customized.
This commit adds an API which must be called before rte_eal_init():
rte_memzone_max_set(int max).  If not called, the default memzone
(RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
max memzone: rte_memzone_max_get().

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
---
 app/test/test_func_reentrancy.c     |  2 +-
 app/test/test_malloc_perf.c         |  2 +-
 app/test/test_memzone.c             | 43 ++++++++++++++++++++++++-------------
 config/rte_config.h                 |  1 -
 drivers/net/qede/base/bcm_osal.c    | 30 ++++++++++++++++++++------
 drivers/net/qede/base/bcm_osal.h    |  3 +++
 drivers/net/qede/qede_main.c        |  7 ++++++
 lib/eal/common/eal_common_memzone.c | 28 +++++++++++++++++++++---
 lib/eal/include/rte_memzone.h       | 20 +++++++++++++++++
 lib/eal/version.map                 |  4 ++++
 10 files changed, 112 insertions(+), 28 deletions(-)
  

Comments

Morten Brørup May 3, 2023, 9:41 p.m. UTC | #1
> From: Ophir Munk [mailto:ophirmu@nvidia.com]
> Sent: Wednesday, 3 May 2023 09.27
> 
> In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> coded as 2560.  For applications requiring different values of this
> parameter – it is more convenient to set the max value via an rte API -
> rather than changing the dpdk source code per application.  In many
> organizations, the possibility to compile a private DPDK library for a
> particular application does not exist at all.  With this option there is
> no need to recompile DPDK and it allows using an in-box packaged DPDK.
> An example usage for updating the RTE_MAX_MEMZONE would be of an
> application that uses the DPDK mempool library which is based on DPDK
> memzone library.  The application may need to create a number of
> steering tables, each of which will require its own mempool allocation.
> This commit is not about how to optimize the application usage of
> mempool nor about how to improve the mempool implementation based on
> memzone.  It is about how to make the max memzone definition - run-time
> customized.
> This commit adds an API which must be called before rte_eal_init():
> rte_memzone_max_set(int max).  If not called, the default memzone
> (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> max memzone: rte_memzone_max_get().
> 
> Signed-off-by: Ophir Munk <ophirmu@nvidia.com>

I retracted my objection to the RFC, but should also have added:

Acked-by: Morten Brørup <mb@smartsharesystems.com>
  
David Marchand May 4, 2023, 7:27 a.m. UTC | #2
Hello Ophir,

Good thing someone is looking into this.
Thanks.

I have a few comments.


This commitlog is a bit compact.
Separating it with some empty lines would help digest it.


On Wed, May 3, 2023 at 9:27 AM Ophir Munk <ophirmu@nvidia.com> wrote:
>
> In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> coded as 2560.
> For applications requiring different values of this
> parameter – it is more convenient to set the max value via an rte API -
> rather than changing the dpdk source code per application.  In many
> organizations, the possibility to compile a private DPDK library for a
> particular application does not exist at all.  With this option there is
> no need to recompile DPDK and it allows using an in-box packaged DPDK.
> An example usage for updating the RTE_MAX_MEMZONE would be of an
> application that uses the DPDK mempool library which is based on DPDK
> memzone library.  The application may need to create a number of
> steering tables, each of which will require its own mempool allocation.
> This commit is not about how to optimize the application usage of
> mempool nor about how to improve the mempool implementation based on
> memzone.  It is about how to make the max memzone definition - run-time
> customized.

The code was using a rather short name "RTE_MAX_MEMZONE".
But I prefer we spell this as "max memzones count" (or a better
wording), in the descriptions/comments.


> This commit adds an API which must be called before rte_eal_init():
> rte_memzone_max_set(int max).  If not called, the default memzone

Afaics, this prototype got unaligned with the patch content, as a
size_t is now taken as input.
You can simply mention rte_memzone_max_set().


> (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective

After the patch RTE_MAX_MEMZONE does not exist anymore.


> max memzone: rte_memzone_max_get().
>
> Signed-off-by: Ophir Munk <ophirmu@nvidia.com>


A global comment on the patch:

rte_calloc provides what you want in all cases below: an array of objects.
I prefer rte_calloc(count, sizeof elem) to rte_zmalloc(count * sizeof elem).

This also avoids a temporary variable to compute the total size of
such an array.


> ---
>  app/test/test_func_reentrancy.c     |  2 +-
>  app/test/test_malloc_perf.c         |  2 +-
>  app/test/test_memzone.c             | 43 ++++++++++++++++++++++++-------------
>  config/rte_config.h                 |  1 -
>  drivers/net/qede/base/bcm_osal.c    | 30 ++++++++++++++++++++------
>  drivers/net/qede/base/bcm_osal.h    |  3 +++
>  drivers/net/qede/qede_main.c        |  7 ++++++
>  lib/eal/common/eal_common_memzone.c | 28 +++++++++++++++++++++---
>  lib/eal/include/rte_memzone.h       | 20 +++++++++++++++++
>  lib/eal/version.map                 |  4 ++++
>  10 files changed, 112 insertions(+), 28 deletions(-)
>
> diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
> index d1ed5d4..ae9de6f 100644
> --- a/app/test/test_func_reentrancy.c
> +++ b/app/test/test_func_reentrancy.c
> @@ -51,7 +51,7 @@ typedef void (*case_clean_t)(unsigned lcore_id);
>  #define MEMPOOL_ELT_SIZE                    (sizeof(uint32_t))
>  #define MEMPOOL_SIZE                        (4)
>
> -#define MAX_LCORES     (RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
> +#define MAX_LCORES     (rte_memzone_max_get() / (MAX_ITER_MULTI * 4U))
>
>  static uint32_t obj_count;
>  static uint32_t synchro;
> diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
> index ccec43a..9bd1662 100644
> --- a/app/test/test_malloc_perf.c
> +++ b/app/test/test_malloc_perf.c
> @@ -165,7 +165,7 @@ test_malloc_perf(void)
>                 return -1;
>
>         if (test_alloc_perf("rte_memzone_reserve", memzone_alloc, memzone_free,
> -                       NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
> +                       NULL, memset_us_gb, rte_memzone_max_get() - 1) < 0)
>                 return -1;
>
>         return 0;
> diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
> index c9255e5..36b9790 100644
> --- a/app/test/test_memzone.c
> +++ b/app/test/test_memzone.c
> @@ -871,9 +871,18 @@ test_memzone_bounded(void)
>  static int
>  test_memzone_free(void)
>  {
> -       const struct rte_memzone *mz[RTE_MAX_MEMZONE + 1];
> +       size_t mz_size;
> +       const struct rte_memzone **mz;
>         int i;
>         char name[20];
> +       int rc = -1;
> +
> +       mz_size = (rte_memzone_max_get() + 1) * sizeof(struct rte_memzone *);
> +       mz = rte_zmalloc("memzone_test", mz_size, 0);
> +       if (!mz) {
> +               printf("Fail allocating memzone test array\n");
> +               return rc;
> +       }
>
>         mz[0] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone0"), 2000,
>                         SOCKET_ID_ANY, 0);
> @@ -881,42 +890,42 @@ test_memzone_free(void)
>                         SOCKET_ID_ANY, 0);
>
>         if (mz[0] > mz[1])
> -               return -1;
> +               goto exit_test;
>         if (!rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone0")))
> -               return -1;
> +               goto exit_test;
>         if (!rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone1")))
> -               return -1;
> +               goto exit_test;
>
>         if (rte_memzone_free(mz[0])) {
>                 printf("Fail memzone free - tempzone0\n");
> -               return -1;
> +               goto exit_test;
>         }
>         if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone0"))) {
>                 printf("Found previously free memzone - tempzone0\n");
> -               return -1;
> +               goto exit_test;
>         }
>         mz[2] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone2"), 2000,
>                         SOCKET_ID_ANY, 0);
>
>         if (mz[2] > mz[1]) {
>                 printf("tempzone2 should have gotten the free entry from tempzone0\n");
> -               return -1;
> +               goto exit_test;
>         }
>         if (rte_memzone_free(mz[2])) {
>                 printf("Fail memzone free - tempzone2\n");
> -               return -1;
> +               goto exit_test;
>         }
>         if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone2"))) {
>                 printf("Found previously free memzone - tempzone2\n");
> -               return -1;
> +               goto exit_test;
>         }
>         if (rte_memzone_free(mz[1])) {
>                 printf("Fail memzone free - tempzone1\n");
> -               return -1;
> +               goto exit_test;
>         }
>         if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone1"))) {
>                 printf("Found previously free memzone - tempzone1\n");
> -               return -1;
> +               goto exit_test;
>         }
>
>         i = 0;
> @@ -928,7 +937,7 @@ test_memzone_free(void)
>
>         if (rte_memzone_free(mz[0])) {
>                 printf("Fail memzone free - tempzone0\n");
> -               return -1;
> +               goto exit_test;
>         }
>         mz[0] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone0new"), 0,
>                         SOCKET_ID_ANY, 0);
> @@ -936,17 +945,21 @@ test_memzone_free(void)
>         if (mz[0] == NULL) {
>                 printf("Fail to create memzone - tempzone0new - when MAX memzones were "
>                                 "created and one was free\n");
> -               return -1;
> +               goto exit_test;
>         }
>
>         for (i = i - 2; i >= 0; i--) {
>                 if (rte_memzone_free(mz[i])) {
>                         printf("Fail memzone free - tempzone%d\n", i);
> -                       return -1;
> +                       goto exit_test;
>                 }
>         }
>
> -       return 0;
> +       rc = 0;
> +
> +exit_test:
> +       rte_free(mz);
> +       return rc;
>  }
>
>  static int test_memzones_left;
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 7b8c85e..400e44e 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -34,7 +34,6 @@
>  #define RTE_MAX_MEM_MB_PER_LIST 32768
>  #define RTE_MAX_MEMSEG_PER_TYPE 32768
>  #define RTE_MAX_MEM_MB_PER_TYPE 65536
> -#define RTE_MAX_MEMZONE 2560
>  #define RTE_MAX_TAILQ 32
>  #define RTE_LOG_DP_LEVEL RTE_LOG_INFO
>  #define RTE_MAX_VFIO_CONTAINERS 64
> diff --git a/drivers/net/qede/base/bcm_osal.c b/drivers/net/qede/base/bcm_osal.c
> index 2c59397..0458768 100644
> --- a/drivers/net/qede/base/bcm_osal.c
> +++ b/drivers/net/qede/base/bcm_osal.c
> @@ -47,10 +47,26 @@ void osal_poll_mode_dpc(osal_int_ptr_t hwfn_cookie)
>  }
>
>  /* Array of memzone pointers */
> -static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
> +static const struct rte_memzone **ecore_mz_mapping;
>  /* Counter to track current memzone allocated */
>  static uint16_t ecore_mz_count;
>
> +int ecore_mz_mapping_alloc(void)
> +{
> +       ecore_mz_mapping = rte_zmalloc("ecore_mz_map",
> +               rte_memzone_max_get() * sizeof(struct rte_memzone *), 0);

I think we should allocate only for the first call and we are missing
some refcount.


> +
> +       if (!ecore_mz_mapping)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +void ecore_mz_mapping_free(void)
> +{
> +       rte_free(ecore_mz_mapping);

Won't we release this array while another qed device is still in use?


> +}
> +
>  unsigned long qede_log2_align(unsigned long n)
>  {
>         unsigned long ret = n ? 1 : 0;
> @@ -132,9 +148,9 @@ void *osal_dma_alloc_coherent(struct ecore_dev *p_dev,
>         uint32_t core_id = rte_lcore_id();
>         unsigned int socket_id;
>
> -       if (ecore_mz_count >= RTE_MAX_MEMZONE) {
> -               DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
> -                      RTE_MAX_MEMZONE);
> +       if (ecore_mz_count >= rte_memzone_max_get()) {
> +               DP_ERR(p_dev, "Memzone allocation count exceeds %zu\n",
> +                      rte_memzone_max_get());
>                 *phys = 0;
>                 return OSAL_NULL;
>         }
> @@ -171,9 +187,9 @@ void *osal_dma_alloc_coherent_aligned(struct ecore_dev *p_dev,
>         uint32_t core_id = rte_lcore_id();
>         unsigned int socket_id;
>
> -       if (ecore_mz_count >= RTE_MAX_MEMZONE) {
> -               DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
> -                      RTE_MAX_MEMZONE);
> +       if (ecore_mz_count >= rte_memzone_max_get()) {
> +               DP_ERR(p_dev, "Memzone allocation count exceeds %zu\n",
> +                      rte_memzone_max_get());
>                 *phys = 0;
>                 return OSAL_NULL;
>         }
> diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
> index 67e7f75..97e261d 100644
> --- a/drivers/net/qede/base/bcm_osal.h
> +++ b/drivers/net/qede/base/bcm_osal.h
> @@ -477,4 +477,7 @@ enum dbg_status     qed_dbg_alloc_user_data(struct ecore_hwfn *p_hwfn,
>         qed_dbg_alloc_user_data(p_hwfn, user_data_ptr)
>  #define OSAL_DB_REC_OCCURRED(p_hwfn) nothing
>
> +int ecore_mz_mapping_alloc(void);
> +void ecore_mz_mapping_free(void);
> +
>  #endif /* __BCM_OSAL_H */
> diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
> index 0303903..fd63262 100644
> --- a/drivers/net/qede/qede_main.c
> +++ b/drivers/net/qede/qede_main.c
> @@ -72,6 +72,12 @@ qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
>         hw_prepare_params.allow_mdump = false;
>         hw_prepare_params.b_en_pacing = false;
>         hw_prepare_params.epoch = OSAL_GET_EPOCH(ECORE_LEADING_HWFN(edev));
> +       rc = ecore_mz_mapping_alloc();
> +       if (rc) {
> +               DP_ERR(edev, "mem zones array allocation failed\n");
> +               return rc;
> +       }
> +
>         rc = ecore_hw_prepare(edev, &hw_prepare_params);
>         if (rc) {
>                 DP_ERR(edev, "hw prepare failed\n");
> @@ -722,6 +728,7 @@ static void qed_remove(struct ecore_dev *edev)
>                 return;
>
>         ecore_hw_remove(edev);
> +       ecore_mz_mapping_free();
>  }
>
>  static int qed_send_drv_state(struct ecore_dev *edev, bool active)
> diff --git a/lib/eal/common/eal_common_memzone.c b/lib/eal/common/eal_common_memzone.c
> index a9cd91f..f94330b 100644
> --- a/lib/eal/common/eal_common_memzone.c
> +++ b/lib/eal/common/eal_common_memzone.c
> @@ -22,6 +22,10 @@
>  #include "eal_private.h"
>  #include "eal_memcfg.h"
>
> +#define RTE_DEFAULT_MAX_MEMZONE 2560

No need for a RTE_ prefix for a local macro and ...


> +
> +static size_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;

... in the end, I see no need for the RTE_DEFAULT_MAX_MEMZONE macro at
all, it is only used as the default init value, here.


> +
>  static inline const struct rte_memzone *
>  memzone_lookup_thread_unsafe(const char *name)
>  {
> @@ -81,8 +85,9 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
>         /* no more room in config */
>         if (arr->count >= arr->len) {
>                 RTE_LOG(ERR, EAL,
> -               "%s(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE\n",
> -                       __func__);
> +               "%s(): Number of requested memzone segments exceeds max "
> +               "memzone segments (%d >= %d)\n",

Won't we always display this log for the case when arr->count == arr->len ?
It is then pointless to display both arr->count and arr->len (which
should be formatted as %u).

I would simply log:
RTE_LOG(ERR, EAL,
        "%s(): Number of requested memzone segments exceeds maximum %u\n",
        __func__, arr->len);


> +                       __func__, arr->count, arr->len);
>                 rte_errno = ENOSPC;
>                 return NULL;
>         }
> @@ -396,7 +401,7 @@ rte_eal_memzone_init(void)
>
>         if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
>                         rte_fbarray_init(&mcfg->memzones, "memzone",
> -                       RTE_MAX_MEMZONE, sizeof(struct rte_memzone))) {
> +                       rte_memzone_max_get(), sizeof(struct rte_memzone))) {
>                 RTE_LOG(ERR, EAL, "Cannot allocate memzone list\n");
>                 ret = -1;
>         } else if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> @@ -430,3 +435,20 @@ void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
>         }
>         rte_rwlock_read_unlock(&mcfg->mlock);
>  }
> +
> +int
> +rte_memzone_max_set(size_t max)
> +{
> +       /* Setting max memzone must occur befaore calling rte_eal_init() */

before*

This comment would be better placed in the API description than in the
implementation.


> +       if (eal_get_internal_configuration()->init_complete > 0)
> +               return -1;
> +
> +       memzone_max = max;
> +       return 0;
> +}
> +
> +size_t
> +rte_memzone_max_get(void)
> +{
> +       return memzone_max;
> +}
> diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
> index 5302caa..3ff7c73 100644
> --- a/lib/eal/include/rte_memzone.h
> +++ b/lib/eal/include/rte_memzone.h
> @@ -305,6 +305,26 @@ void rte_memzone_dump(FILE *f);
>  void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg),
>                       void *arg);
>
> +/**

 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice

> + * Set max memzone value
> + *
> + * @param max
> + *   Value of max memzone allocations

I'd rather describe as:
"Maximum number of memzones"

Please also mention that this function can only be called prior to
rte_eal_init().


> + * @return
> + *  0 on success, -1 otherwise
> + */
> +__rte_experimental
> +int rte_memzone_max_set(size_t max);
> +
> +/**

 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice

> + * Get max memzone value

Get the maximum number of memzones.

And we can remind the developer, here, that this value won't change
after rte_eal_init.


> + *
> + * @return
> + *   Value of max memzone allocations
> + */
> +__rte_experimental
> +size_t rte_memzone_max_get(void);
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/eal/version.map b/lib/eal/version.map
> index 51a820d..b52a83c 100644
> --- a/lib/eal/version.map
> +++ b/lib/eal/version.map
> @@ -430,6 +430,10 @@ EXPERIMENTAL {
>         rte_thread_create_control;
>         rte_thread_set_name;
>         __rte_eal_trace_generic_blob;
> +
> +       # added in 23.07
> +       rte_memzone_max_set;
> +       rte_memzone_max_get;
>  };
>
>  INTERNAL {
> --
> 2.8.4
>
  
Burakov, Anatoly May 18, 2023, 3:54 p.m. UTC | #3
Hi,

On 5/3/2023 8:26 AM, Ophir Munk wrote:
> In current DPDK the RTE_MAX_MEMZONE definition is unconditionally hard
> coded as 2560.  For applications requiring different values of this
> parameter – it is more convenient to set the max value via an rte API -
> rather than changing the dpdk source code per application.  In many
> organizations, the possibility to compile a private DPDK library for a
> particular application does not exist at all.  With this option there is
> no need to recompile DPDK and it allows using an in-box packaged DPDK.
> An example usage for updating the RTE_MAX_MEMZONE would be of an
> application that uses the DPDK mempool library which is based on DPDK
> memzone library.  The application may need to create a number of
> steering tables, each of which will require its own mempool allocation.
> This commit is not about how to optimize the application usage of
> mempool nor about how to improve the mempool implementation based on
> memzone.  It is about how to make the max memzone definition - run-time
> customized.
> This commit adds an API which must be called before rte_eal_init():
> rte_memzone_max_set(int max).  If not called, the default memzone
> (RTE_MAX_MEMZONE) is used.  There is also an API to query the effective
> max memzone: rte_memzone_max_get().

Commit message could use a little rewording and shortening. Suggested:

---

Currently, RTE_MAX_MEMZONE constant is used to decide how many memzones 
a DPDK application can have. This value could technically be changed by 
manually editing `rte_config.h` before compilation, but if DPDK is 
already compiled, that option is not useful. There are certain use cases 
that would benefit from making this value configurable.

This commit addresses the issue by adding a new API to set maximum 
number of memzones before EAL initialization (while using the old 
constant as default value), as well as an API to get current maximum 
number of memzones.

---

What do you think?


>   /* Array of memzone pointers */
> -static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
> +static const struct rte_memzone **ecore_mz_mapping;
>   /* Counter to track current memzone allocated */
>   static uint16_t ecore_mz_count;
>   
> +int ecore_mz_mapping_alloc(void)
> +{
> +	ecore_mz_mapping = rte_zmalloc("ecore_mz_map",
> +		rte_memzone_max_get() * sizeof(struct rte_memzone *), 0);

Doesn't this need to check if it's already allocated? Does it need any 
special secondary process handling?

> +
> +	if (!ecore_mz_mapping)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +void ecore_mz_mapping_free(void)
> +{
> +	rte_free(ecore_mz_mapping);

Shouldn't this at least set the pointer to NULL to avoid double-free?

> +#define RTE_DEFAULT_MAX_MEMZONE 2560
> +
> +static size_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
> +
>   static inline const struct rte_memzone *
>   memzone_lookup_thread_unsafe(const char *name)
>   {
> @@ -81,8 +85,9 @@ memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
>   	/* no more room in config */
>   	if (arr->count >= arr->len) {
>   		RTE_LOG(ERR, EAL,
> -		"%s(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE\n",
> -			__func__);
> +		"%s(): Number of requested memzone segments exceeds max "
> +		"memzone segments (%d >= %d)\n",

I think the "segments" terminology can be dropped, it is a holdover from 
the times when memzones were not allocated by malloc. The message can 
just say "Number of requested memzones exceeds maximum number of memzones".

> +			__func__, arr->count, arr->len);
>   		rte_errno = ENOSPC;
>   		return NULL;
>   	}
> @@ -396,7 +401,7 @@ rte_eal_memzone_init(void)
>   
>   	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
>   			rte_fbarray_init(&mcfg->memzones, "memzone",
> -			RTE_MAX_MEMZONE, sizeof(struct rte_memzone))) {
> +			rte_memzone_max_get(), sizeof(struct rte_memzone))) {
>   		RTE_LOG(ERR, EAL, "Cannot allocate memzone list\n");
>   		ret = -1;
>   	} else if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
> @@ -430,3 +435,20 @@ void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
>   	}
>   	rte_rwlock_read_unlock(&mcfg->mlock);
>   }
> +
> +int
> +rte_memzone_max_set(size_t max)
> +{
> +	/* Setting max memzone must occur befaore calling rte_eal_init() */
> +	if (eal_get_internal_configuration()->init_complete > 0)
> +		return -1;
> +
> +	memzone_max = max;
> +	return 0;
> +}
> +
> +size_t
> +rte_memzone_max_get(void)
> +{
> +	return memzone_max;
> +}

It seems that this is a local (static) value, which means it is not 
shared between processes, and thus could potentially mismatch between 
two different processes. While this _technically_ would not be a problem 
because secondary process init will not actually use this value, but the 
API will still return incorrect information.

I suggest updating/syncing this value somewhere in 
`eal_mcfg_update_internal()/eal_mcfg_update_from_internal()`, and adding 
this value to mem config. An alternative to that would be to just return 
`mem_config->memzones.count` (instead of the value itself), and return 
-1 (or zero?) if init hasn't yet completed.
  
Ophir Munk May 25, 2023, 6:35 a.m. UTC | #4
Hello David,
Thank you for your review. I noticed a review by Anatoly Burakov that addresses similar points to yours. In V4 I supply a reply to you both.

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Thursday, 4 May 2023 10:27
> To: Ophir Munk <ophirmu@nvidia.com>
> Cc: dev@dpdk.org; Bruce Richardson <bruce.richardson@intel.com>; Devendra
> Singh Rawat <dsinghrawat@marvell.com>; Alok Prasad <palok@marvell.com>;
> Ophir Munk <ophirmu@mellanox.com>; Matan Azrad <matan@nvidia.com>;
> NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>; Lior
> Margalit <lmargalit@nvidia.com>
> Subject: Re: [PATCH V3] lib: set/get max memzone segments
> 
> Hello Ophir,
> 
> Good thing someone is looking into this.
> Thanks.
> 
> I have a few comments.
> 
> 
> This commitlog is a bit compact.
> Separating it with some empty lines would help digest it.
> 
> 
> 
> The code was using a rather short name "RTE_MAX_MEMZONE".
> But I prefer we spell this as "max memzones count" (or a better wording), in
> the descriptions/comments.
> 
> 

Ack. Commit message updated.

> > This commit adds an API which must be called before rte_eal_init():
> > rte_memzone_max_set(int max).  If not called, the default memzone
> 
> Afaics, this prototype got unaligned with the patch content, as a size_t is now
> taken as input.
> You can simply mention rte_memzone_max_set().
> 
> 

Ack

> > (RTE_MAX_MEMZONE) is used.  There is also an API to query the
> > effective
> 
> After the patch RTE_MAX_MEMZONE does not exist anymore.
> 

Ack

> 
> > max memzone: rte_memzone_max_get().
> >
> > Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
> 
> 
> A global comment on the patch:
> 
> rte_calloc provides what you want in all cases below: an array of objects.
> I prefer rte_calloc(count, sizeof elem) to rte_zmalloc(count * sizeof elem).
> 
> This also avoids a temporary variable to compute the total size of such an
> array.
> 

Ack

> >
> >  /* Array of memzone pointers */
> > -static const struct rte_memzone
> *ecore_mz_mapping[RTE_MAX_MEMZONE];
> > +static const struct rte_memzone **ecore_mz_mapping;
> >  /* Counter to track current memzone allocated */  static uint16_t
> > ecore_mz_count;
> >
> > +int ecore_mz_mapping_alloc(void)
> > +{
> > +       ecore_mz_mapping = rte_zmalloc("ecore_mz_map",
> > +               rte_memzone_max_get() * sizeof(struct rte_memzone *),
> > +0);
> 
> I think we should allocate only for the first call and we are missing some
> refcount.
> 
> 

Ack. This allocation occurs as part of qed_probe. I added a ref count.

> > +
> > +       if (!ecore_mz_mapping)
> > +               return -ENOMEM;
> > +
> > +       return 0;
> > +}
> > +
> > +void ecore_mz_mapping_free(void)
> > +{
> > +       rte_free(ecore_mz_mapping);
> 
> Won't we release this array while another qed device is still in use?
> 
> 

Handled with the ref count.

> >
> > +#define RTE_DEFAULT_MAX_MEMZONE 2560
> 
> No need for a RTE_ prefix for a local macro and ...
> 

Ack

> 
> > +
> > +static size_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
> 
> ... in the end, I see no need for the RTE_DEFAULT_MAX_MEMZONE macro at
> all, it is only used as the default init value, here.
> 
> 

I prefer leaving the macro as it explains the value context.

> > +
> >  static inline const struct rte_memzone *
> > memzone_lookup_thread_unsafe(const char *name)  { @@ -81,8 +85,9 @@
> > memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
> >         /* no more room in config */
> >         if (arr->count >= arr->len) {
> >                 RTE_LOG(ERR, EAL,
> > -               "%s(): Number of requested memzone segments exceeds
> RTE_MAX_MEMZONE\n",
> > -                       __func__);
> > +               "%s(): Number of requested memzone segments exceeds max "
> > +               "memzone segments (%d >= %d)\n",
> 
> Won't we always display this log for the case when arr->count == arr->len ?
> It is then pointless to display both arr->count and arr->len (which should be
> formatted as %u).
> 
> I would simply log:
> RTE_LOG(ERR, EAL,
>         "%s(): Number of requested memzone segments exceeds maximum
> %u\n",
>         __func__, arr->len);
> 

Ack

> 
> > +{
> > +       /* Setting max memzone must occur befaore calling
> > +rte_eal_init() */
> 
> before*
> 

Ack. Thanks.

> This comment would be better placed in the API description than in the
> implementation.
> 

Ack

> 
> > +       if (eal_get_internal_configuration()->init_complete > 0)
> > +               return -1;
> > +
> > diff --git a/lib/eal/include/rte_memzone.h
> > b/lib/eal/include/rte_memzone.h index 5302caa..3ff7c73 100644
> > --- a/lib/eal/include/rte_memzone.h
> > +++ b/lib/eal/include/rte_memzone.h
> > @@ -305,6 +305,26 @@ void rte_memzone_dump(FILE *f);  void
> > rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg),
> >                       void *arg);
> >
> > +/**
> 
>  * @warning
>  * @b EXPERIMENTAL: this API may change without prior notice
> 

Ack

> > + * Set max memzone value
> > + *
> > + * @param max
> > + *   Value of max memzone allocations
> 
> I'd rather describe as:
> "Maximum number of memzones"
> 
> Please also mention that this function can only be called prior to rte_eal_init().
> 
> 

Ack

> > + * @return
> > + *  0 on success, -1 otherwise
> > + */
> > +__rte_experimental
> > +int rte_memzone_max_set(size_t max);
> > +
> > +/**
> 
>  * @warning
>  * @b EXPERIMENTAL: this API may change without prior notice
> 

Ack

> > + * Get max memzone value
> 
> Get the maximum number of memzones.
> 
> And we can remind the developer, here, that this value won't change after
> rte_eal_init.
> 
> 

Ack

> >  };
> >
> >  INTERNAL {
> > --
> > 2.8.4
> >
> 
> 
> --
> David Marchand
  
Ophir Munk May 25, 2023, 6:43 a.m. UTC | #5
Hi Anatoly,
Thank you for your review. I noticed a review by David Marchand that addresses similar points to yours. In V4 I supply a reply to you both.

> 
> Commit message could use a little rewording and shortening. Suggested:
> 
> ---
> 
> Currently, RTE_MAX_MEMZONE constant is used to decide how many
> memzones a DPDK application can have. This value could technically be
> changed by manually editing `rte_config.h` before compilation, but if DPDK is
> already compiled, that option is not useful. There are certain use cases that
> would benefit from making this value configurable.
> 
> This commit addresses the issue by adding a new API to set maximum number
> of memzones before EAL initialization (while using the old constant as default
> value), as well as an API to get current maximum number of memzones.
> 
> ---
> 
> What do you think?
> 

Ack.

> 
> >   /* Array of memzone pointers */
> > -static const struct rte_memzone
> *ecore_mz_mapping[RTE_MAX_MEMZONE];
> > +static const struct rte_memzone **ecore_mz_mapping;
> >   /* Counter to track current memzone allocated */
> >   static uint16_t ecore_mz_count;
> >
> > +int ecore_mz_mapping_alloc(void)
> > +{
> > +	ecore_mz_mapping = rte_zmalloc("ecore_mz_map",
> > +		rte_memzone_max_get() * sizeof(struct rte_memzone *), 0);
> 
> Doesn't this need to check if it's already allocated? 

Ack. Issue is addressed with a ref count.

> Does it need any special secondary process handling?
> 

No. 

> > +
> > +	if (!ecore_mz_mapping)
> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +void ecore_mz_mapping_free(void)
> > +{
> > +	rte_free(ecore_mz_mapping);
> 
> Shouldn't this at least set the pointer to NULL to avoid double-free?
> 

Ack

> > +#define RTE_DEFAULT_MAX_MEMZONE 2560
> > +
> > +static size_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
> > +
> >   static inline const struct rte_memzone *
> >   memzone_lookup_thread_unsafe(const char *name)
> >   {
> > @@ -81,8 +85,9 @@ memzone_reserve_aligned_thread_unsafe(const char
> *name, size_t len,
> >   	/* no more room in config */
> >   	if (arr->count >= arr->len) {
> >   		RTE_LOG(ERR, EAL,
> > -		"%s(): Number of requested memzone segments exceeds
> RTE_MAX_MEMZONE\n",
> > -			__func__);
> > +		"%s(): Number of requested memzone segments exceeds max
> "
> > +		"memzone segments (%d >= %d)\n",
> 
> I think the "segments" terminology can be dropped, it is a holdover from the
> times when memzones were not allocated by malloc. The message can just say
> "Number of requested memzones exceeds maximum number of memzones".
> 

Ack

> > +rte_memzone_max_set(size_t max)
> > +{
> > +	/* Setting max memzone must occur befaore calling rte_eal_init() */
> > +	if (eal_get_internal_configuration()->init_complete > 0)
> > +		return -1;
> > +
> > +	memzone_max = max;
> > +	return 0;
> > +}
> > +
> > +size_t
> > +rte_memzone_max_get(void)
> > +{
> > +	return memzone_max;
> > +}
> 
> It seems that this is a local (static) value, which means it is not shared between
> processes, and thus could potentially mismatch between two different
> processes. While this _technically_ would not be a problem because secondary
> process init will not actually use this value, but the API will still return incorrect
> information.
> 
> I suggest updating/syncing this value somewhere in
> `eal_mcfg_update_internal()/eal_mcfg_update_from_internal()`, and adding
> this value to mem config. An alternative to that would be to just return
> `mem_config->memzones.count` (instead of the value itself), and return
> -1 (or zero?) if init hasn't yet completed.
> 

Static variable removed.
I opted the second alternative, but if init hasn't yet completed the return value is the default (2560) rather than -1 or 0.

> --
> Thanks,
> Anatoly
  
Ophir Munk May 25, 2023, 6:47 a.m. UTC | #6
Hi Morten,

> 
> I retracted my objection to the RFC, but should also have added:
> 
> Acked-by: Morten Brørup <mb@smartsharesystems.com>

Ack.
  

Patch

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index d1ed5d4..ae9de6f 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -51,7 +51,7 @@  typedef void (*case_clean_t)(unsigned lcore_id);
 #define MEMPOOL_ELT_SIZE                    (sizeof(uint32_t))
 #define MEMPOOL_SIZE                        (4)
 
-#define MAX_LCORES	(RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
+#define MAX_LCORES	(rte_memzone_max_get() / (MAX_ITER_MULTI * 4U))
 
 static uint32_t obj_count;
 static uint32_t synchro;
diff --git a/app/test/test_malloc_perf.c b/app/test/test_malloc_perf.c
index ccec43a..9bd1662 100644
--- a/app/test/test_malloc_perf.c
+++ b/app/test/test_malloc_perf.c
@@ -165,7 +165,7 @@  test_malloc_perf(void)
 		return -1;
 
 	if (test_alloc_perf("rte_memzone_reserve", memzone_alloc, memzone_free,
-			NULL, memset_us_gb, RTE_MAX_MEMZONE - 1) < 0)
+			NULL, memset_us_gb, rte_memzone_max_get() - 1) < 0)
 		return -1;
 
 	return 0;
diff --git a/app/test/test_memzone.c b/app/test/test_memzone.c
index c9255e5..36b9790 100644
--- a/app/test/test_memzone.c
+++ b/app/test/test_memzone.c
@@ -871,9 +871,18 @@  test_memzone_bounded(void)
 static int
 test_memzone_free(void)
 {
-	const struct rte_memzone *mz[RTE_MAX_MEMZONE + 1];
+	size_t mz_size;
+	const struct rte_memzone **mz;
 	int i;
 	char name[20];
+	int rc = -1;
+
+	mz_size = (rte_memzone_max_get() + 1) * sizeof(struct rte_memzone *);
+	mz = rte_zmalloc("memzone_test", mz_size, 0);
+	if (!mz) {
+		printf("Fail allocating memzone test array\n");
+		return rc;
+	}
 
 	mz[0] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone0"), 2000,
 			SOCKET_ID_ANY, 0);
@@ -881,42 +890,42 @@  test_memzone_free(void)
 			SOCKET_ID_ANY, 0);
 
 	if (mz[0] > mz[1])
-		return -1;
+		goto exit_test;
 	if (!rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone0")))
-		return -1;
+		goto exit_test;
 	if (!rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone1")))
-		return -1;
+		goto exit_test;
 
 	if (rte_memzone_free(mz[0])) {
 		printf("Fail memzone free - tempzone0\n");
-		return -1;
+		goto exit_test;
 	}
 	if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone0"))) {
 		printf("Found previously free memzone - tempzone0\n");
-		return -1;
+		goto exit_test;
 	}
 	mz[2] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone2"), 2000,
 			SOCKET_ID_ANY, 0);
 
 	if (mz[2] > mz[1]) {
 		printf("tempzone2 should have gotten the free entry from tempzone0\n");
-		return -1;
+		goto exit_test;
 	}
 	if (rte_memzone_free(mz[2])) {
 		printf("Fail memzone free - tempzone2\n");
-		return -1;
+		goto exit_test;
 	}
 	if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone2"))) {
 		printf("Found previously free memzone - tempzone2\n");
-		return -1;
+		goto exit_test;
 	}
 	if (rte_memzone_free(mz[1])) {
 		printf("Fail memzone free - tempzone1\n");
-		return -1;
+		goto exit_test;
 	}
 	if (rte_memzone_lookup(TEST_MEMZONE_NAME("tempzone1"))) {
 		printf("Found previously free memzone - tempzone1\n");
-		return -1;
+		goto exit_test;
 	}
 
 	i = 0;
@@ -928,7 +937,7 @@  test_memzone_free(void)
 
 	if (rte_memzone_free(mz[0])) {
 		printf("Fail memzone free - tempzone0\n");
-		return -1;
+		goto exit_test;
 	}
 	mz[0] = rte_memzone_reserve(TEST_MEMZONE_NAME("tempzone0new"), 0,
 			SOCKET_ID_ANY, 0);
@@ -936,17 +945,21 @@  test_memzone_free(void)
 	if (mz[0] == NULL) {
 		printf("Fail to create memzone - tempzone0new - when MAX memzones were "
 				"created and one was free\n");
-		return -1;
+		goto exit_test;
 	}
 
 	for (i = i - 2; i >= 0; i--) {
 		if (rte_memzone_free(mz[i])) {
 			printf("Fail memzone free - tempzone%d\n", i);
-			return -1;
+			goto exit_test;
 		}
 	}
 
-	return 0;
+	rc = 0;
+
+exit_test:
+	rte_free(mz);
+	return rc;
 }
 
 static int test_memzones_left;
diff --git a/config/rte_config.h b/config/rte_config.h
index 7b8c85e..400e44e 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -34,7 +34,6 @@ 
 #define RTE_MAX_MEM_MB_PER_LIST 32768
 #define RTE_MAX_MEMSEG_PER_TYPE 32768
 #define RTE_MAX_MEM_MB_PER_TYPE 65536
-#define RTE_MAX_MEMZONE 2560
 #define RTE_MAX_TAILQ 32
 #define RTE_LOG_DP_LEVEL RTE_LOG_INFO
 #define RTE_MAX_VFIO_CONTAINERS 64
diff --git a/drivers/net/qede/base/bcm_osal.c b/drivers/net/qede/base/bcm_osal.c
index 2c59397..0458768 100644
--- a/drivers/net/qede/base/bcm_osal.c
+++ b/drivers/net/qede/base/bcm_osal.c
@@ -47,10 +47,26 @@  void osal_poll_mode_dpc(osal_int_ptr_t hwfn_cookie)
 }
 
 /* Array of memzone pointers */
-static const struct rte_memzone *ecore_mz_mapping[RTE_MAX_MEMZONE];
+static const struct rte_memzone **ecore_mz_mapping;
 /* Counter to track current memzone allocated */
 static uint16_t ecore_mz_count;
 
+int ecore_mz_mapping_alloc(void)
+{
+	ecore_mz_mapping = rte_zmalloc("ecore_mz_map",
+		rte_memzone_max_get() * sizeof(struct rte_memzone *), 0);
+
+	if (!ecore_mz_mapping)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void ecore_mz_mapping_free(void)
+{
+	rte_free(ecore_mz_mapping);
+}
+
 unsigned long qede_log2_align(unsigned long n)
 {
 	unsigned long ret = n ? 1 : 0;
@@ -132,9 +148,9 @@  void *osal_dma_alloc_coherent(struct ecore_dev *p_dev,
 	uint32_t core_id = rte_lcore_id();
 	unsigned int socket_id;
 
-	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
-		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
-		       RTE_MAX_MEMZONE);
+	if (ecore_mz_count >= rte_memzone_max_get()) {
+		DP_ERR(p_dev, "Memzone allocation count exceeds %zu\n",
+		       rte_memzone_max_get());
 		*phys = 0;
 		return OSAL_NULL;
 	}
@@ -171,9 +187,9 @@  void *osal_dma_alloc_coherent_aligned(struct ecore_dev *p_dev,
 	uint32_t core_id = rte_lcore_id();
 	unsigned int socket_id;
 
-	if (ecore_mz_count >= RTE_MAX_MEMZONE) {
-		DP_ERR(p_dev, "Memzone allocation count exceeds %u\n",
-		       RTE_MAX_MEMZONE);
+	if (ecore_mz_count >= rte_memzone_max_get()) {
+		DP_ERR(p_dev, "Memzone allocation count exceeds %zu\n",
+		       rte_memzone_max_get());
 		*phys = 0;
 		return OSAL_NULL;
 	}
diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
index 67e7f75..97e261d 100644
--- a/drivers/net/qede/base/bcm_osal.h
+++ b/drivers/net/qede/base/bcm_osal.h
@@ -477,4 +477,7 @@  enum dbg_status	qed_dbg_alloc_user_data(struct ecore_hwfn *p_hwfn,
 	qed_dbg_alloc_user_data(p_hwfn, user_data_ptr)
 #define OSAL_DB_REC_OCCURRED(p_hwfn) nothing
 
+int ecore_mz_mapping_alloc(void);
+void ecore_mz_mapping_free(void);
+
 #endif /* __BCM_OSAL_H */
diff --git a/drivers/net/qede/qede_main.c b/drivers/net/qede/qede_main.c
index 0303903..fd63262 100644
--- a/drivers/net/qede/qede_main.c
+++ b/drivers/net/qede/qede_main.c
@@ -72,6 +72,12 @@  qed_probe(struct ecore_dev *edev, struct rte_pci_device *pci_dev,
 	hw_prepare_params.allow_mdump = false;
 	hw_prepare_params.b_en_pacing = false;
 	hw_prepare_params.epoch = OSAL_GET_EPOCH(ECORE_LEADING_HWFN(edev));
+	rc = ecore_mz_mapping_alloc();
+	if (rc) {
+		DP_ERR(edev, "mem zones array allocation failed\n");
+		return rc;
+	}
+
 	rc = ecore_hw_prepare(edev, &hw_prepare_params);
 	if (rc) {
 		DP_ERR(edev, "hw prepare failed\n");
@@ -722,6 +728,7 @@  static void qed_remove(struct ecore_dev *edev)
 		return;
 
 	ecore_hw_remove(edev);
+	ecore_mz_mapping_free();
 }
 
 static int qed_send_drv_state(struct ecore_dev *edev, bool active)
diff --git a/lib/eal/common/eal_common_memzone.c b/lib/eal/common/eal_common_memzone.c
index a9cd91f..f94330b 100644
--- a/lib/eal/common/eal_common_memzone.c
+++ b/lib/eal/common/eal_common_memzone.c
@@ -22,6 +22,10 @@ 
 #include "eal_private.h"
 #include "eal_memcfg.h"
 
+#define RTE_DEFAULT_MAX_MEMZONE 2560
+
+static size_t memzone_max = RTE_DEFAULT_MAX_MEMZONE;
+
 static inline const struct rte_memzone *
 memzone_lookup_thread_unsafe(const char *name)
 {
@@ -81,8 +85,9 @@  memzone_reserve_aligned_thread_unsafe(const char *name, size_t len,
 	/* no more room in config */
 	if (arr->count >= arr->len) {
 		RTE_LOG(ERR, EAL,
-		"%s(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE\n",
-			__func__);
+		"%s(): Number of requested memzone segments exceeds max "
+		"memzone segments (%d >= %d)\n",
+			__func__, arr->count, arr->len);
 		rte_errno = ENOSPC;
 		return NULL;
 	}
@@ -396,7 +401,7 @@  rte_eal_memzone_init(void)
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			rte_fbarray_init(&mcfg->memzones, "memzone",
-			RTE_MAX_MEMZONE, sizeof(struct rte_memzone))) {
+			rte_memzone_max_get(), sizeof(struct rte_memzone))) {
 		RTE_LOG(ERR, EAL, "Cannot allocate memzone list\n");
 		ret = -1;
 	} else if (rte_eal_process_type() == RTE_PROC_SECONDARY &&
@@ -430,3 +435,20 @@  void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *),
 	}
 	rte_rwlock_read_unlock(&mcfg->mlock);
 }
+
+int
+rte_memzone_max_set(size_t max)
+{
+	/* Setting max memzone must occur befaore calling rte_eal_init() */
+	if (eal_get_internal_configuration()->init_complete > 0)
+		return -1;
+
+	memzone_max = max;
+	return 0;
+}
+
+size_t
+rte_memzone_max_get(void)
+{
+	return memzone_max;
+}
diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
index 5302caa..3ff7c73 100644
--- a/lib/eal/include/rte_memzone.h
+++ b/lib/eal/include/rte_memzone.h
@@ -305,6 +305,26 @@  void rte_memzone_dump(FILE *f);
 void rte_memzone_walk(void (*func)(const struct rte_memzone *, void *arg),
 		      void *arg);
 
+/**
+ * Set max memzone value
+ *
+ * @param max
+ *   Value of max memzone allocations
+ * @return
+ *  0 on success, -1 otherwise
+ */
+__rte_experimental
+int rte_memzone_max_set(size_t max);
+
+/**
+ * Get max memzone value
+ *
+ * @return
+ *   Value of max memzone allocations
+ */
+__rte_experimental
+size_t rte_memzone_max_get(void);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 51a820d..b52a83c 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -430,6 +430,10 @@  EXPERIMENTAL {
 	rte_thread_create_control;
 	rte_thread_set_name;
 	__rte_eal_trace_generic_blob;
+
+	# added in 23.07
+	rte_memzone_max_set;
+	rte_memzone_max_get;
 };
 
 INTERNAL {