[v8] mempool cache: add zero-copy get and put functions

Message ID 20230209145833.129986-1-mb@smartsharesystems.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series [v8] mempool cache: add zero-copy get and put functions

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-abi-testing success Testing PASS

Commit Message

Morten Brørup Feb. 9, 2023, 2:58 p.m. UTC
  Zero-copy access to mempool caches is beneficial for PMD performance, and
must be provided by the mempool library to fix [Bug 1052] without a
performance regression.

[Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052

Bugzilla ID: 1052

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>

v8:
* Actually include the rte_errno header file.
  Note to self: The changes only take effect on the disk after the file in
  the text editor has been saved.
v7:
* Fix typo in function description. (checkpatch)
* Zero-copy functions may set rte_errno; include rte_errno header file.
  (ci/loongarch-compilation)
v6:
* Improve description of the 'n' parameter to the zero-copy get function.
  (Konstantin, Bruce)
* The caches used for zero-copy may not be user-owned, so remove this word
  from the function descriptions. (Kamalakshitha)
v5:
* Bugfix: Compare zero-copy get request to the cache size instead of the
  flush threshold; otherwise refill could overflow the memory allocated
  for the cache. (Andrew)
* Split the zero-copy put function into an internal function doing the
  work, and a public function with trace.
* Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
  the internal zero-copy put function. (Andrew)
* Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
  to void **; it returns a pointer to an array of objects.
* Fix coding style: Add missing curly brackets. (Andrew)
v4:
* Fix checkpatch warnings.
v3:
* Bugfix: Respect the cache size; compare to the flush threshold instead
  of RTE_MEMPOOL_CACHE_MAX_SIZE.
* Added 'rewind' function for incomplete 'put' operations. (Konstantin)
* Replace RTE_ASSERTs with runtime checks of the request size.
  Instead of failing, return NULL if the request is too big. (Konstantin)
* Modified comparison to prevent overflow if n is really huge and len is
  non-zero.
* Updated the comments in the code.
v2:
* Fix checkpatch warnings.
* Fix missing registration of trace points.
* The functions are inline, so they don't go into the map file.
v1 changes from the RFC:
* Removed run-time parameter checks. (Honnappa)
  This is a hot fast path function; correct application behaviour is
  required, i.e. function parameters must be valid.
* Added RTE_ASSERT for parameters instead.
  Code for this is only generated if built with RTE_ENABLE_ASSERT.
* Removed fallback when 'cache' parameter is not set. (Honnappa)
* Chose the simple get function; i.e. do not move the existing objects in
  the cache to the top of the new stack, just leave them at the bottom.
* Renamed the functions. Other suggestions are welcome, of course. ;-)
* Updated the function descriptions.
* Added the functions to trace_fp and version.map.
---
 lib/mempool/mempool_trace_points.c |   9 ++
 lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++----
 lib/mempool/rte_mempool_trace_fp.h |  23 +++
 lib/mempool/version.map            |   5 +
 4 files changed, 246 insertions(+), 29 deletions(-)
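
For reference, a minimal usage sketch (not part of the patch): only the rte_mempool_cache_zc_put_bulk() signature comes from the patch above; the helper name, the driver context and the fallback policy are illustrative assumptions, and all mbufs are assumed to belong to the mempool 'mp'.

#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

static void
example_free_bufs_zc(struct rte_mempool *mp, struct rte_mbuf **mbufs,
		unsigned int n)
{
	struct rte_mempool_cache *cache;
	void **cache_objs;
	unsigned int i;

	cache = rte_mempool_default_cache(mp, rte_lcore_id());
	if (cache == NULL) {
		/* No cache for this lcore; use the ordinary bulk put. */
		rte_mempool_put_bulk(mp, (void **)mbufs, n);
		return;
	}

	/* Reserve room for n object pointers directly in the cache. */
	cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (cache_objs == NULL) {
		/* The request is too big for the cache; fall back. */
		rte_mempool_put_bulk(mp, (void **)mbufs, n);
		return;
	}

	/* Store the object pointers directly in the cache; this is the zero-copy part. */
	for (i = 0; i < n; i++)
		cache_objs[i] = mbufs[i];
}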
  

Comments

fengchengwen Feb. 10, 2023, 8:35 a.m. UTC | #1
Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2023/2/9 22:58, Morten Brørup wrote:
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> 

...
  
Honnappa Nagarahalli Feb. 12, 2023, 7:56 p.m. UTC | #2
Hi Morten,
	Apologies for the late comments.

> -----Original Message-----
> From: Morten Brørup <mb@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 8:59 AM
> To: olivier.matz@6wind.com; andrew.rybchenko@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Kamalakshitha Aligeri
> <Kamalakshitha.Aligeri@arm.com>; bruce.richardson@intel.com;
> konstantin.ananyev@huawei.com; dev@dpdk.org
> Cc: nd <nd@arm.com>; david.marchand@redhat.com; Morten Brørup
> <mb@smartsharesystems.com>
> Subject: [PATCH v8] mempool cache: add zero-copy get and put functions
> 
> Zero-copy access to mempool caches is beneficial for PMD performance, and
> must be provided by the mempool library to fix [Bug 1052] without a
> performance regression.
> 
> [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> 
> Bugzilla ID: 1052
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> 
> v8:
> * Actually include the rte_errno header file.
>   Note to self: The changes only take effect on the disk after the file in
>   the text editor has been saved.
> v7:
> * Fix typo in function description. (checkpatch)
> * Zero-copy functions may set rte_errno; include rte_errno header file.
>   (ci/loongarch-compilation)
> v6:
> * Improve description of the 'n' parameter to the zero-copy get function.
>   (Konstantin, Bruce)
> * The caches used for zero-copy may not be user-owned, so remove this word
>   from the function descriptions. (Kamalakshitha)
> v5:
> * Bugfix: Compare zero-copy get request to the cache size instead of the
>   flush threshold; otherwise refill could overflow the memory allocated
>   for the cache. (Andrew)
> * Split the zero-copy put function into an internal function doing the
>   work, and a public function with trace.
> * Avoid code duplication by rewriting rte_mempool_do_generic_put() to use
>   the internal zero-copy put function. (Andrew)
> * Corrected the return type of rte_mempool_cache_zc_put_bulk() from void *
>   to void **; it returns a pointer to an array of objects.
> * Fix coding style: Add missing curly brackets. (Andrew)
> v4:
> * Fix checkpatch warnings.
> v3:
> * Bugfix: Respect the cache size; compare to the flush threshold instead
>   of RTE_MEMPOOL_CACHE_MAX_SIZE.
> * Added 'rewind' function for incomplete 'put' operations. (Konstantin)
> * Replace RTE_ASSERTs with runtime checks of the request size.
>   Instead of failing, return NULL if the request is too big. (Konstantin)
> * Modified comparison to prevent overflow if n is really huge and len is
>   non-zero.
> * Updated the comments in the code.
> v2:
> * Fix checkpatch warnings.
> * Fix missing registration of trace points.
> * The functions are inline, so they don't go into the map file.
> v1 changes from the RFC:
> * Removed run-time parameter checks. (Honnappa)
>   This is a hot fast path function; requiring correct application
>   behaviour, i.e. function parameters must be valid.
> * Added RTE_ASSERT for parameters instead.
>   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> * Removed fallback when 'cache' parameter is not set. (Honnappa)
> * Chose the simple get function; i.e. do not move the existing objects in
>   the cache to the top of the new stack, just leave them at the bottom.
> * Renamed the functions. Other suggestions are welcome, of course. ;-)
> * Updated the function descriptions.
> * Added the functions to trace_fp and version.map.
> ---
>  lib/mempool/mempool_trace_points.c |   9 ++
>  lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++----
>  lib/mempool/rte_mempool_trace_fp.h |  23 +++
>  lib/mempool/version.map            |   5 +
>  4 files changed, 246 insertions(+), 29 deletions(-)
> 
> diff --git a/lib/mempool/mempool_trace_points.c
> b/lib/mempool/mempool_trace_points.c
> index 4ad76deb34..83d353a764 100644
> --- a/lib/mempool/mempool_trace_points.c
> +++ b/lib/mempool/mempool_trace_points.c
> @@ -77,3 +77,12 @@
> RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> 
>  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
>  	lib.mempool.set.ops.byname)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> +	lib.mempool.cache.zc.put.bulk)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> +	lib.mempool.cache.zc.put.rewind)
> +
> +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> +	lib.mempool.cache.zc.get.bulk)
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..711a9d1c16 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -42,6 +42,7 @@
>  #include <rte_config.h>
>  #include <rte_spinlock.h>
>  #include <rte_debug.h>
> +#include <rte_errno.h>
>  #include <rte_lcore.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_ring.h>
> @@ -1346,6 +1347,198 @@ rte_mempool_cache_flush(struct
> rte_mempool_cache *cache,
>  	cache->len = 0;
>  }
> 
> +
> +/**
> + * @internal used by rte_mempool_cache_zc_put_bulk() and
> rte_mempool_do_generic_put().
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +static __rte_always_inline void **
> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	void **cache_objs;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	if (n <= cache->flushthresh - cache->len) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
> +		cache_objs = &cache->objs[cache->len];
> +		cache->len += n;
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
> +		cache_objs = &cache->objs[0];
> +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
This is a flush of the cache. It is probably worth having a counter for this.

> +		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
This is possibly an error condition. Do we need to set rte_errno? Do we need a counter here to capture that?

> +		return NULL;
> +	}
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +
> +	return cache_objs;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy put objects in a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be put in the mempool cache.
> + * @return
> + *   The pointer to where to put the objects in the mempool cache.
> + *   NULL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void **
> +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy un-put objects in a mempool cache.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param n
> + *   The number of objects not put in the mempool cache after calling
> + *   rte_mempool_cache_zc_put_bulk().
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> +		unsigned int n)
> +{
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(n <= cache->len);
> +
> +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> +
> +	cache->len -= n;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: This API may change, or be removed, without prior
> notice.
> + *
> + * Zero-copy get objects from a mempool cache backed by the specified
> mempool.
> + *
> + * @param cache
> + *   A pointer to the mempool cache.
> + * @param mp
> + *   A pointer to the mempool.
> + * @param n
> + *   The number of objects to be made available for extraction from the
> mempool cache.
> + * @return
> + *   The pointer to the objects in the mempool cache.
> + *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
> + *   With rte_errno set to the error code of the mempool dequeue function,
> + *   or EINVAL if the request itself is too big for the cache, i.e.
> + *   exceeds the cache flush threshold.
> + */
> +__rte_experimental
> +static __rte_always_inline void *
> +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> +		struct rte_mempool *mp,
> +		unsigned int n)
> +{
> +	unsigned int len, size;
> +
> +	RTE_ASSERT(cache != NULL);
> +	RTE_ASSERT(mp != NULL);
> +
> +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> +
> +	len = cache->len;
> +	size = cache->size;
> +
> +	if (n <= len) {
> +		/* The request can be satisfied from the cache as is. */
> +		len -= n;
> +	} else if (likely(n <= size)) {
> +		/*
> +		 * The request itself can be satisfied from the cache.
> +		 * But first, the cache must be filled from the backend;
> +		 * fetch size + requested - len objects.
> +		 */
> +		int ret;
> +
> +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
> +		if (unlikely(ret < 0)) {
> +			/*
> +			 * We are buffer constrained.
> +			 * Do not fill the cache, just satisfy the request.
> +			 */
> +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
> +			if (unlikely(ret < 0)) {
> +				/* Unable to satisfy the request. */
> +
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
> +				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
> +
> +				rte_errno = -ret;
> +				return NULL;
> +			}
> +
> +			len = 0;
> +		} else {
> +			len = size;
> +		}
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +
> +	cache->len = len;
> +
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +
> +	return &cache->objs[len];
> +}
> +
>  /**
>   * @internal Put several objects back in the mempool; used internally.
>   * @param mp
> @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,  {
>  	void **cache_objs;
> 
> -	/* No cache provided */
> -	if (unlikely(cache == NULL))
> -		goto driver_enqueue;
> +	/* No cache provided? */
> +	if (unlikely(cache == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> +		goto driver_enqueue;
> +	}
> 
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;
> +	/* Prepare to add the objects to the cache. */
> +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> 
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> +	/* The request itself is too big for the cache? */
> +	if (unlikely(cache_objs == NULL)) {
> +		/* Increment stats now, adding in mempool always succeeds.
> */
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> -	if (cache->len + n <= cache->flushthresh) {
> -		cache_objs = &cache->objs[cache->len];
> -		cache->len += n;
> -	} else {
> -		cache_objs = &cache->objs[0];
> -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> -		cache->len = n;
> +		goto driver_enqueue;
>  	}
> 
>  	/* Add the objects to the cache. */
> @@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct
> rte_mempool *mp, void * const *obj_table,
> 
>  driver_enqueue:
> 
> -	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> -
> -driver_enqueue_stats_incremented:
> -
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> 
> diff --git a/lib/mempool/rte_mempool_trace_fp.h
> b/lib/mempool/rte_mempool_trace_fp.h
> index ed060e887c..14666457f7 100644
> --- a/lib/mempool/rte_mempool_trace_fp.h
> +++ b/lib/mempool/rte_mempool_trace_fp.h
> @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
>  	rte_trace_point_emit_ptr(mempool);
>  )
> 
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_put_rewind,
> +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
> +RTE_TRACE_POINT_FP(
> +	rte_mempool_trace_cache_zc_get_bulk,
> +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> nb_objs),
> +	rte_trace_point_emit_ptr(cache);
> +	rte_trace_point_emit_ptr(mempool);
> +	rte_trace_point_emit_u32(nb_objs);
> +)
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> b67d7aace7..1383ae6db2 100644
> --- a/lib/mempool/version.map
> +++ b/lib/mempool/version.map
> @@ -63,6 +63,11 @@ EXPERIMENTAL {
>  	__rte_mempool_trace_ops_alloc;
>  	__rte_mempool_trace_ops_free;
>  	__rte_mempool_trace_set_ops_byname;
> +
> +	# added in 23.03
> +	__rte_mempool_trace_cache_zc_put_bulk;
> +	__rte_mempool_trace_cache_zc_put_rewind;
> +	__rte_mempool_trace_cache_zc_get_bulk;
>  };
> 
>  INTERNAL {
> --
> 2.17.1
  
Morten Brørup Feb. 12, 2023, 11:15 p.m. UTC | #3
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Sunday, 12 February 2023 20.57
> 
> Hi Morten,
> 	Apologies for the late comments.

Better late than never. :-)

> 
> > From: Morten Brørup <mb@smartsharesystems.com>
> > Sent: Thursday, February 9, 2023 8:59 AM
> >
> > Zero-copy access to mempool caches is beneficial for PMD performance,
> and
> > must be provided by the mempool library to fix [Bug 1052] without a
> > performance regression.
> >
> > [Bug 1052]: https://bugs.dpdk.org/show_bug.cgi?id=1052
> >
> > Bugzilla ID: 1052
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> >
> > v8:
> > * Actually include the rte_errno header file.
> >   Note to self: The changes only take effect on the disk after the
> file in
> >   the text editor has been saved.
> > v7:
> > * Fix typo in function description. (checkpatch)
> > * Zero-copy functions may set rte_errno; include rte_errno header
> file.
> >   (ci/loongarch-compilation)
> > v6:
> > * Improve description of the 'n' parameter to the zero-copy get
> function.
> >   (Konstantin, Bruce)
> > * The caches used for zero-copy may not be user-owned, so remove this
> word
> >   from the function descriptions. (Kamalakshitha)
> > v5:
> > * Bugfix: Compare zero-copy get request to the cache size instead of
> the
> >   flush threshold; otherwise refill could overflow the memory
> allocated
> >   for the cache. (Andrew)
> > * Split the zero-copy put function into an internal function doing
> the
> >   work, and a public function with trace.
> > * Avoid code duplication by rewriting rte_mempool_do_generic_put() to
> use
> >   the internal zero-copy put function. (Andrew)
> > * Corrected the return type of rte_mempool_cache_zc_put_bulk() from
> void
> > *
> >   to void **; it returns a pointer to an array of objects.
> > * Fix coding style: Add missing curly brackets. (Andrew)
> > v4:
> > * Fix checkpatch warnings.
> > v3:
> > * Bugfix: Respect the cache size; compare to the flush threshold
> instead
> >   of RTE_MEMPOOL_CACHE_MAX_SIZE.
> > * Added 'rewind' function for incomplete 'put' operations.
> (Konstantin)
> > * Replace RTE_ASSERTs with runtime checks of the request size.
> >   Instead of failing, return NULL if the request is too big.
> (Konstantin)
> > * Modified comparison to prevent overflow if n is really huge and len
> is
> >   non-zero.
> > * Updated the comments in the code.
> > v2:
> > * Fix checkpatch warnings.
> > * Fix missing registration of trace points.
> > * The functions are inline, so they don't go into the map file.
> > v1 changes from the RFC:
> > * Removed run-time parameter checks. (Honnappa)
> >   This is a hot fast path function; requiring correct application
> >   behaviour, i.e. function parameters must be valid.
> > * Added RTE_ASSERT for parameters instead.
> >   Code for this is only generated if built with RTE_ENABLE_ASSERT.
> > * Removed fallback when 'cache' parameter is not set. (Honnappa)
> > * Chose the simple get function; i.e. do not move the existing
> objects in
> >   the cache to the top of the new stack, just leave them at the
> bottom.
> > * Renamed the functions. Other suggestions are welcome, of course. ;-
> )
> > * Updated the function descriptions.
> > * Added the functions to trace_fp and version.map.
> > ---
> >  lib/mempool/mempool_trace_points.c |   9 ++
> >  lib/mempool/rte_mempool.h          | 238 +++++++++++++++++++++++++--
> --
> >  lib/mempool/rte_mempool_trace_fp.h |  23 +++
> >  lib/mempool/version.map            |   5 +
> >  4 files changed, 246 insertions(+), 29 deletions(-)
> >
> > diff --git a/lib/mempool/mempool_trace_points.c
> > b/lib/mempool/mempool_trace_points.c
> > index 4ad76deb34..83d353a764 100644
> > --- a/lib/mempool/mempool_trace_points.c
> > +++ b/lib/mempool/mempool_trace_points.c
> > @@ -77,3 +77,12 @@
> > RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
> >
> >  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
> >  	lib.mempool.set.ops.byname)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
> > +	lib.mempool.cache.zc.put.bulk)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
> > +	lib.mempool.cache.zc.put.rewind)
> > +
> > +RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
> > +	lib.mempool.cache.zc.get.bulk)
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..711a9d1c16 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -42,6 +42,7 @@
> >  #include <rte_config.h>
> >  #include <rte_spinlock.h>
> >  #include <rte_debug.h>
> > +#include <rte_errno.h>
> >  #include <rte_lcore.h>
> >  #include <rte_branch_prediction.h>
> >  #include <rte_ring.h>
> > @@ -1346,6 +1347,198 @@ rte_mempool_cache_flush(struct
> > rte_mempool_cache *cache,
> >  	cache->len = 0;
> >  }
> >
> > +
> > +/**
> > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > rte_mempool_do_generic_put().
> > + *
> > + * Zero-copy put objects in a mempool cache backed by the specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +static __rte_always_inline void **
> > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	void **cache_objs;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	if (n <= cache->flushthresh - cache->len) {
> > +		/*
> > +		 * The objects can be added to the cache without crossing
> the
> > +		 * flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[cache->len];
> > +		cache->len += n;
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself fits into the cache.
> > +		 * But first, the cache must be flushed to the backend, so
> > +		 * adding the objects does not cross the flush threshold.
> > +		 */
> > +		cache_objs = &cache->objs[0];
> > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> This is a flush of the cache. It is probably worth having a counter for
> this.

We somewhat already do. The put_common_pool_bulk counter is incremented in rte_mempool_ops_enqueue_bulk() [1].

[1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L824

This counter doesn't exactly count the number of times the cache was flushed, because it also counts bulk put transactions not going via the cache.

Thinking further about it, I agree that specific counters for cache flush and cache refill could be useful, and should be added. However, being this late, I would prefer postponing them for a separate patch.

> 
> > +		cache->len = n;
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> This is possibly an error condition. Do we need to set rte_errno?

I considered this when I wrote the function, and concluded that this function only returns NULL as normal behavior, never because of failure. E.g. if a cache can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct behavior of the PMD to call this function and learn that the cache is too small for direct access; it is not a failure in the PMD nor in the cache.

If it could return NULL due to a variety of reasons, I would probably agree that we *need* to set rte_errno, so the application could determine the reason.

But since it can only return NULL due to one reason (which involves correct use of the function), I convinced myself that setting rte_errno would not convey any additional information than the NULL return value itself, and be a waste of performance in the fast path. If you disagree, then it should be set to EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request too big for the cache.
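
For illustration, a minimal sketch (not from the patch) of how a caller can distinguish the NULL cases on the get side, where rte_errno is set; the helper name is hypothetical, and 'cache', 'mp' and 'n' are assumed valid:

#include <rte_errno.h>
#include <rte_mempool.h>

static inline void **
example_zc_get(struct rte_mempool_cache *cache, struct rte_mempool *mp,
		unsigned int n)
{
	void **objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);

	if (objs == NULL) {
		if (rte_errno == EINVAL) {
			/* The request itself is too big for the cache. */
		} else {
			/* The backend dequeue failed; rte_errno holds its error code. */
		}
	}
	return objs;
}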

> Do we need a counter here to capture that?

Good question. I don't know. It would indicate that a cache is smaller than expected by the users trying to access the cache directly.

And if we add such a counter, we should probably add a similar counter for the cache get function too.

But again, being this late, I would postpone such counters for a separate patch. And they should trigger more discussions about required/useful counters.

For reference, the rte_mempool_debug_stats is cache aligned and currently holds 12 64-bit counters, so we can add 4 more - which is exactly the number discussed here - without changing its size. So this is not a barrier to adding those counters.

Furthermore, I suppose that we only want to increase the counter when called through the mempool cache API, not when called indirectly through the mempool API. This would mean that the ordinary mempool functions cannot call the mempool cache functions, or the wrong counters would increase. So adding such counters is not completely trivial.

> 
> > +		return NULL;
> > +	}
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +
> > +	return cache_objs;
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy put objects in a mempool cache backed by the specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be put in the mempool cache.
> > + * @return
> > + *   The pointer to where to put the objects in the mempool cache.
> > + *   NULL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void **
> > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy un-put objects in a mempool cache.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param n
> > + *   The number of objects not put in the mempool cache after
> calling
> > + *   rte_mempool_cache_zc_put_bulk().
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void
> > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > +		unsigned int n)
> > +{
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(n <= cache->len);
> > +
> > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > +
> > +	cache->len -= n;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change, or be removed, without
> prior
> > notice.
> > + *
> > + * Zero-copy get objects from a mempool cache backed by the
> specified
> > mempool.
> > + *
> > + * @param cache
> > + *   A pointer to the mempool cache.
> > + * @param mp
> > + *   A pointer to the mempool.
> > + * @param n
> > + *   The number of objects to be made available for extraction from
> the
> > mempool cache.
> > + * @return
> > + *   The pointer to the objects in the mempool cache.
> > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> objects.
> > + *   With rte_errno set to the error code of the mempool dequeue
> function,
> > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > + *   exceeds the cache flush threshold.
> > + */
> > +__rte_experimental
> > +static __rte_always_inline void *
> > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > +		struct rte_mempool *mp,
> > +		unsigned int n)
> > +{
> > +	unsigned int len, size;
> > +
> > +	RTE_ASSERT(cache != NULL);
> > +	RTE_ASSERT(mp != NULL);
> > +
> > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > +
> > +	len = cache->len;
> > +	size = cache->size;
> > +
> > +	if (n <= len) {
> > +		/* The request can be satisfied from the cache as is. */
> > +		len -= n;
> > +	} else if (likely(n <= size)) {
> > +		/*
> > +		 * The request itself can be satisfied from the cache.
> > +		 * But first, the cache must be filled from the backend;
> > +		 * fetch size + requested - len objects.
> > +		 */
> > +		int ret;
> > +
> > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], size + n - len);
> > +		if (unlikely(ret < 0)) {
> > +			/*
> > +			 * We are buffer constrained.
> > +			 * Do not fill the cache, just satisfy the request.
> > +			 */
> > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > >objs[len], n - len);
> > +			if (unlikely(ret < 0)) {
> > +				/* Unable to satisfy the request. */
> > +
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_bulk, 1);
> > +				RTE_MEMPOOL_STAT_ADD(mp,
> > get_fail_objs, n);
> > +
> > +				rte_errno = -ret;
> > +				return NULL;
> > +			}
> > +
> > +			len = 0;
> > +		} else {
> > +			len = size;
> > +		}
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		rte_errno = EINVAL;
> > +		return NULL;
> > +	}
> > +
> > +	cache->len = len;
> > +
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > +
> > +	return &cache->objs[len];
> > +}
> > +
> >  /**
> >   * @internal Put several objects back in the mempool; used
> internally.
> >   * @param mp
> > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > rte_mempool *mp, void * const *obj_table,  {
> >  	void **cache_objs;
> >
> > -	/* No cache provided */
> > -	if (unlikely(cache == NULL))
> > -		goto driver_enqueue;
> > +	/* No cache provided? */
> > +	if (unlikely(cache == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > +		goto driver_enqueue;
> > +	}
> >
> > -	/* The request itself is too big for the cache */
> > -	if (unlikely(n > cache->flushthresh))
> > -		goto driver_enqueue_stats_incremented;
> > +	/* Prepare to add the objects to the cache. */
> > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> >
> > -	/*
> > -	 * The cache follows the following algorithm:
> > -	 *   1. If the objects cannot be added to the cache without
> crossing
> > -	 *      the flush threshold, flush the cache to the backend.
> > -	 *   2. Add the objects to the cache.
> > -	 */
> > +	/* The request itself is too big for the cache? */
> > +	if (unlikely(cache_objs == NULL)) {
> > +		/* Increment stats now, adding in mempool always succeeds.
> > */
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> >
> > -	if (cache->len + n <= cache->flushthresh) {
> > -		cache_objs = &cache->objs[cache->len];
> > -		cache->len += n;
> > -	} else {
> > -		cache_objs = &cache->objs[0];
> > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > >len);
> > -		cache->len = n;
> > +		goto driver_enqueue;
> >  	}
> >
> >  	/* Add the objects to the cache. */
> > @@ -1399,13 +1585,7 @@ rte_mempool_do_generic_put(struct
> > rte_mempool *mp, void * const *obj_table,
> >
> >  driver_enqueue:
> >
> > -	/* increment stat now, adding in mempool always success */
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > -
> > -driver_enqueue_stats_incremented:
> > -
> > -	/* push objects to the backend */
> > +	/* Push the objects to the backend. */
> >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> >
> > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > b/lib/mempool/rte_mempool_trace_fp.h
> > index ed060e887c..14666457f7 100644
> > --- a/lib/mempool/rte_mempool_trace_fp.h
> > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> >  	rte_trace_point_emit_ptr(mempool);
> >  )
> >
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_put_rewind,
> > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> > +RTE_TRACE_POINT_FP(
> > +	rte_mempool_trace_cache_zc_get_bulk,
> > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > nb_objs),
> > +	rte_trace_point_emit_ptr(cache);
> > +	rte_trace_point_emit_ptr(mempool);
> > +	rte_trace_point_emit_u32(nb_objs);
> > +)
> > +
> >  #ifdef __cplusplus
> >  }
> >  #endif
> > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > b67d7aace7..1383ae6db2 100644
> > --- a/lib/mempool/version.map
> > +++ b/lib/mempool/version.map
> > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> >  	__rte_mempool_trace_ops_alloc;
> >  	__rte_mempool_trace_ops_free;
> >  	__rte_mempool_trace_set_ops_byname;
> > +
> > +	# added in 23.03
> > +	__rte_mempool_trace_cache_zc_put_bulk;
> > +	__rte_mempool_trace_cache_zc_put_rewind;
> > +	__rte_mempool_trace_cache_zc_get_bulk;
> >  };
> >
> >  INTERNAL {
> > --
> > 2.17.1
  
Honnappa Nagarahalli Feb. 13, 2023, 4:29 a.m. UTC | #4
<snip>

> > > +/**
> > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > rte_mempool_do_generic_put().
> > > + *
> > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +static __rte_always_inline void **
> > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	void **cache_objs;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	if (n <= cache->flushthresh - cache->len) {
> > > +		/*
> > > +		 * The objects can be added to the cache without crossing
> > the
> > > +		 * flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[cache->len];
> > > +		cache->len += n;
> > > +	} else if (likely(n <= cache->flushthresh)) {
> > > +		/*
> > > +		 * The request itself fits into the cache.
> > > +		 * But first, the cache must be flushed to the backend, so
> > > +		 * adding the objects does not cross the flush threshold.
> > > +		 */
> > > +		cache_objs = &cache->objs[0];
> > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > >len);
> > This is a flush of the cache. It is probably worth having a counter
> > for this.
> 
> We somewhat already do. The put_common_pool_bulk counter is
> incremented in rte_mempool_ops_enqueue_bulk() [1].
> 
> [1]:
> https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.
> h#L824
> 
> This counter doesn't exactly count the number of times the cache was
> flushed, because it also counts bulk put transactions not going via the cache.
> 
> Thinking further about it, I agree that specific counters for cache flush and
> cache refill could be useful, and should be added. However, being this late, I
> would prefer postponing them for a separate patch.
Agree, can be in a separate patch, they never existed.

> 
> >
> > > +		cache->len = n;
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > This is possibly an error condition. Do we need to set rte_errno?
> 
> I considered this when I wrote the function, and concluded that this function
> only returns NULL as normal behavior, never because of failure. E.g. if a cache
> can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct
> behavior of the PMD to call this function and learn that the cache is too small
> for direct access; it is not a failure in the PMD nor in the cache.
This condition happens when there is a mismatch between the cache configuration and the behavior of the PMD. From this perspective I think this is an error. This could go unnoticed; I do not think this misconfiguration is reported anywhere.

> 
> If it could return NULL due to a variety of reasons, I would probably agree that
> we *need* to set rte_errno, so the application could determine the reason.
> 
> But since it can only return NULL due to one reason (which involves correct
> use of the function), I convinced myself that setting rte_errno would not
> convey any additional information than the NULL return value itself, and be a
> waste of performance in the fast path. If you disagree, then it should be set to
> EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request
> too big for the cache.
> 
> > Do we need a counter here to capture that?
> 
> Good question. I don't know. It would indicate that a cache is smaller than
> expected by the users trying to access the cache directly.
> 
> And if we add such a counter, we should probably add a similar counter for
> the cache get function too.
Agree

> 
> But again, being this late, I would postpone such counters for a separate
> patch. And they should trigger more discussions about required/useful
> counters.
Agree, should be postponed to another patch.

> 
> For reference, the rte_mempool_debug_stats is cache aligned and currently
> holds 12 64-bit counters, so we can add 4 more - which is exactly the number
> discussed here - without changing its size. So this is not a barrier to adding
> those counters.
> 
> Furthermore, I suppose that we only want to increase the counter when the
> called through the mempool cache API, not when called indirectly through
> the mempool API. This would mean that the ordinary mempool functions
> cannot call the mempool cache functions, or the wrong counters would
> increase. So adding such counters is not completely trivial.
> 
> >
> > > +		return NULL;
> > > +	}
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +
> > > +	return cache_objs;
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be put in the mempool cache.
> > > + * @return
> > > + *   The pointer to where to put the objects in the mempool cache.
> > > + *   NULL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void **
> > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy un-put objects in a mempool cache.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param n
> > > + *   The number of objects not put in the mempool cache after
> > calling
> > > + *   rte_mempool_cache_zc_put_bulk().
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void
> > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > +		unsigned int n)
Earlier there was a discussion on the API name.
IMO, we should keep the API names similar to those in the ring library. This would provide consistency across the libraries.
There were some concerns expressed about the PMD having to call 2 APIs. I do not think changing to 2 APIs will have any perf impact.

Also, what is the use case for the 'rewind' API?
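
For illustration, one plausible use case, sketched here with hypothetical names and with the assumption that all mbufs belong to 'mp': a driver reserves room for n pointers, stores only the mbufs it can actually free, and rewinds the unused part of the reservation.

#include <rte_mbuf.h>
#include <rte_mempool.h>

static inline void
example_put_with_rewind(struct rte_mempool_cache *cache,
		struct rte_mempool *mp, struct rte_mbuf **mbufs, unsigned int n)
{
	void **cache_objs;
	unsigned int i, done = 0;

	/* Reserve room in the cache for n object pointers. */
	cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	if (cache_objs == NULL)
		return; /* Too big for the cache; a real driver would fall back. */

	for (i = 0; i < n; i++) {
		/* Only mbufs that can actually be freed go into the cache. */
		struct rte_mbuf *m = rte_pktmbuf_prefree_seg(mbufs[i]);

		if (m != NULL)
			cache_objs[done++] = m;
	}

	/* Un-put the reserved slots that were not used. */
	if (done != n)
		rte_mempool_cache_zc_put_rewind(cache, n - done);
}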

> > > +{
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(n <= cache->len);
> > > +
> > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > +
> > > +	cache->len -= n;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > prior
> > > notice.
> > > + *
> > > + * Zero-copy get objects from a mempool cache backed by the
> > specified
> > > mempool.
> > > + *
> > > + * @param cache
> > > + *   A pointer to the mempool cache.
> > > + * @param mp
> > > + *   A pointer to the mempool.
> > > + * @param n
> > > + *   The number of objects to be made available for extraction from
> > the
> > > mempool cache.
> > > + * @return
> > > + *   The pointer to the objects in the mempool cache.
> > > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> > objects.
> > > + *   With rte_errno set to the error code of the mempool dequeue
> > function,
> > > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > > + *   exceeds the cache flush threshold.
> > > + */
> > > +__rte_experimental
> > > +static __rte_always_inline void *
> > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > +		struct rte_mempool *mp,
> > > +		unsigned int n)
> > > +{
> > > +	unsigned int len, size;
> > > +
> > > +	RTE_ASSERT(cache != NULL);
> > > +	RTE_ASSERT(mp != NULL);
> > > +
> > > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > +
> > > +	len = cache->len;
> > > +	size = cache->size;
> > > +
> > > +	if (n <= len) {
> > > +		/* The request can be satisfied from the cache as is. */
> > > +		len -= n;
> > > +	} else if (likely(n <= size)) {
> > > +		/*
> > > +		 * The request itself can be satisfied from the cache.
> > > +		 * But first, the cache must be filled from the backend;
> > > +		 * fetch size + requested - len objects.
> > > +		 */
> > > +		int ret;
> > > +
> > > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > >objs[len], size + n - len);
> > > +		if (unlikely(ret < 0)) {
> > > +			/*
> > > +			 * We are buffer constrained.
> > > +			 * Do not fill the cache, just satisfy the request.
> > > +			 */
> > > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > >objs[len], n - len);
> > > +			if (unlikely(ret < 0)) {
> > > +				/* Unable to satisfy the request. */
> > > +
> > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > get_fail_bulk, 1);
> > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > get_fail_objs, n);
> > > +
> > > +				rte_errno = -ret;
> > > +				return NULL;
> > > +			}
> > > +
> > > +			len = 0;
> > > +		} else {
> > > +			len = size;
> > > +		}
> > > +	} else {
> > > +		/* The request itself is too big for the cache. */
> > > +		rte_errno = EINVAL;
> > > +		return NULL;
> > > +	}
> > > +
> > > +	cache->len = len;
> > > +
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > +
> > > +	return &cache->objs[len];
> > > +}
> > > +
> > >  /**
> > >   * @internal Put several objects back in the mempool; used
> > internally.
> > >   * @param mp
> > > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > > rte_mempool *mp, void * const *obj_table,  {
> > >  	void **cache_objs;
> > >
> > > -	/* No cache provided */
> > > -	if (unlikely(cache == NULL))
> > > -		goto driver_enqueue;
> > > +	/* No cache provided? */
> > > +	if (unlikely(cache == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > > */
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > +		goto driver_enqueue;
> > > +	}
> > >
> > > -	/* The request itself is too big for the cache */
> > > -	if (unlikely(n > cache->flushthresh))
> > > -		goto driver_enqueue_stats_incremented;
> > > +	/* Prepare to add the objects to the cache. */
> > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > >
> > > -	/*
> > > -	 * The cache follows the following algorithm:
> > > -	 *   1. If the objects cannot be added to the cache without
> > crossing
> > > -	 *      the flush threshold, flush the cache to the backend.
> > > -	 *   2. Add the objects to the cache.
> > > -	 */
> > > +	/* The request itself is too big for the cache? */
> > > +	if (unlikely(cache_objs == NULL)) {
> > > +		/* Increment stats now, adding in mempool always succeeds.
> > > */
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > >
> > > -	if (cache->len + n <= cache->flushthresh) {
> > > -		cache_objs = &cache->objs[cache->len];
> > > -		cache->len += n;
> > > -	} else {
> > > -		cache_objs = &cache->objs[0];
> > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > >len);
> > > -		cache->len = n;
> > > +		goto driver_enqueue;
> > >  	}
> > >
> > >  	/* Add the objects to the cache. */ @@ -1399,13 +1585,7 @@
> > > rte_mempool_do_generic_put(struct rte_mempool *mp, void * const
> > > *obj_table,
> > >
> > >  driver_enqueue:
> > >
> > > -	/* increment stat now, adding in mempool always success */
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > -
> > > -driver_enqueue_stats_incremented:
> > > -
> > > -	/* push objects to the backend */
> > > +	/* Push the objects to the backend. */
> > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> > >
> > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > > b/lib/mempool/rte_mempool_trace_fp.h
> > > index ed060e887c..14666457f7 100644
> > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > >  	rte_trace_point_emit_ptr(mempool);
> > >  )
> > >
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_put_rewind,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > > +RTE_TRACE_POINT_FP(
> > > +	rte_mempool_trace_cache_zc_get_bulk,
> > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > nb_objs),
> > > +	rte_trace_point_emit_ptr(cache);
> > > +	rte_trace_point_emit_ptr(mempool);
> > > +	rte_trace_point_emit_u32(nb_objs);
> > > +)
> > > +
> > >  #ifdef __cplusplus
> > >  }
> > >  #endif
> > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > > b67d7aace7..1383ae6db2 100644
> > > --- a/lib/mempool/version.map
> > > +++ b/lib/mempool/version.map
> > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > >  	__rte_mempool_trace_ops_alloc;
> > >  	__rte_mempool_trace_ops_free;
> > >  	__rte_mempool_trace_set_ops_byname;
> > > +
> > > +	# added in 23.03
> > > +	__rte_mempool_trace_cache_zc_put_bulk;
> > > +	__rte_mempool_trace_cache_zc_put_rewind;
> > > +	__rte_mempool_trace_cache_zc_get_bulk;
> > >  };
> > >
> > >  INTERNAL {
> > > --
> > > 2.17.1
  
Morten Brørup Feb. 13, 2023, 9:30 a.m. UTC | #5
> From: Honnappa Nagarahalli [mailto:Honnappa.Nagarahalli@arm.com]
> Sent: Monday, 13 February 2023 05.30
> 
> <snip>
> 
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	void **cache_objs;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	if (n <= cache->flushthresh - cache->len) {
> > > > +		/*
> > > > +		 * The objects can be added to the cache without
> crossing
> > > the
> > > > +		 * flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[cache->len];
> > > > +		cache->len += n;
> > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > +		/*
> > > > +		 * The request itself fits into the cache.
> > > > +		 * But first, the cache must be flushed to the
> backend, so
> > > > +		 * adding the objects does not cross the flush
> threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[0];
> > > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> >
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> >
> > [1]: https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.h#L824
> >
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via
> the cache.
> >
> > Thinking further about it, I agree that specific counters for cache
> flush and
> > cache refill could be useful, and should be added. However, being
> this late, I
> > would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.

OK. We have agreed to postpone the counter discussion to a separate patch. So let's resume the discussion there.

Such a patch will not make it for RC1, so I have put it on my TODO list for later.

> 
> >
> > >
> > > > +		cache->len = n;
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> >
> > I considered this when I wrote the function, and concluded that this
> function
> > only returns NULL as normal behavior, never because of failure. E.g.
> if a cache
> > can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is
> correct
> > behavior of the PMD to call this function and learn that the cache is
> too small
> > for direct access; it is not a failure in the PMD nor in the cache.
> This condition happens when there is a mismatch between the cache
> configuration and the behavior of the PMD. From this perspective I
> think this is an error. This could go unnoticed, I do not think this
> misconfiguration is reported anywhere.

No strong objection from me.

In v9, I will also set rte_errno=EINVAL here.
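
A sketch of how the internal put function would then look in v9; this simply combines the v8 code quoted above with the agreed change, so only the rte_errno assignment is new:

static __rte_always_inline void **
__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
		struct rte_mempool *mp,
		unsigned int n)
{
	void **cache_objs;

	RTE_ASSERT(cache != NULL);
	RTE_ASSERT(mp != NULL);

	if (n <= cache->flushthresh - cache->len) {
		/* The objects fit without crossing the flush threshold. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else if (likely(n <= cache->flushthresh)) {
		/* Flush the cache to the backend, then accept the objects. */
		cache_objs = &cache->objs[0];
		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
		cache->len = n;
	} else {
		/* The request itself is too big for the cache. */
		rte_errno = EINVAL;	/* New in v9. */
		return NULL;
	}

	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);

	return cache_objs;
}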

> 
> >
> > If it could return NULL due to a variety of reasons, I would probably
> agree that
> > we *need* to set rte_errno, so the application could determine the
> reason.
> >
> > But since it can only return NULL due to one reason (which involves
> correct
> > use of the function), I convinced myself that setting rte_errno would
> not
> > convey any additional information than the NULL return value itself,
> and be a
> > waste of performance in the fast path. If you disagree, then it
> should be set to
> > EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a
> request
> > too big for the cache.
> >
> > > Do we need a counter here to capture that?
> >
> > Good question. I don't know. It would indicate that a cache is
> smaller than
> > expected by the users trying to access the cache directly.
> >
> > And if we add such a counter, we should probably add a similar
> counter for
> > the cache get function too.
> Agree
> 
> >
> > But again, being this late, I would postpone such counters for a
> separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.

Agreed. Let's move the counter discussion to the counters patch, when provided.

> 
> >
> > For reference, the rte_mempool_debug_stats is cache aligned and
> currently
> > holds 12 64-bit counters, so we can add 4 more - which is exactly the
> number
> > discussed here - without changing its size. So this is not a barrier
> to adding
> > those counters.
> >
> > Furthermore, I suppose that we only want to increase the counter when
> > called through the mempool cache API, not when called indirectly
> > through the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> >
> > >
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > +	return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after
> > > calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> Earlier there was a discussion on the API name.

The discussion was here:

http://inbox.dpdk.org/dev/98CBD80474FA8B44BF855DF32C47DC35D875E8@smartserver.smartshare.dk/

> IMO, we should keep the API names similar to those in ring library.
> This would provide consistency across the libraries.
> There were some concerns expressed in PMD having to call 2 APIs. I do
> not think changing to 2 APIs will have any perf impact.

There is also the difference that the ring library implements locking with these APIs, whereas the mempool cache API does not need locking. The ring's _start() function is called to enter the critical section, and the _finish() function to leave it.

I am usually in favor of consistency, but I would argue this: for functions (in the mempool cache) that are lockless, it might be confusing if they have names (from the ring library) that imply locking. DPDK is already bad at documenting the thread safety of various functions; let's not make it worse by using confusing function names.

This is a democracy, so I am open to changing the API for consistency, if the community insists. Anyone interested, please indicate which API you prefer for the zero-copy mempool cache.

A. The unique API provided in this patch:
rte_mempool_cache_zc_get_bulk(), rte_mempool_cache_zc_put_bulk(), and the optional rte_mempool_cache_zc_put_rewind(); or

B. An API with _start and _finish, like in the ring library:
rte_mempool_cache_zc_get_bulk_start() and _finish(), and rte_mempool_cache_zc_put_bulk_start() and _finish().

I am in favor of A: Keep the unique names as provided in the patch.

Konstantin accepted A, so unless he says otherwise, I'll treat that as a vote for A.
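
For readers following the vote, this is roughly how the two options would compare at a call site (only option A exists in this patch; the _start()/_finish() names under option B are hypothetical, modelled on the ring library):

/* Option A: single call, as provided by this patch. */
void **objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
/* ... the caller copies n object pointers into objs ... */

/* Option B: hypothetical ring-style pair, shown for comparison only. */
void **objs2 = rte_mempool_cache_zc_put_bulk_start(cache, mp, n);
/* ... the caller copies n object pointers into objs2 ... */
rte_mempool_cache_zc_put_bulk_finish(cache, mp, n);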

> 
> Also, what is the use case for the 'rewind' API?

It is for use cases where the application/PMD is opportunistic and prepares to zero-copy a full burst, but then, while zero-copying, realizes that there was not a full burst to zero-copy after all.

E.g. a PMD freeing mbufs after transmit. It hopes to "fast free" the entire burst, and prepares for zero-copying the full burst to the mempool; but while zero-copying the mbufs to the mempool, a few of the mbufs don't live up to the criteria for "fast free", so they cannot be zero-copied, and thus the PMD frees those few mbufs normally instead. Then the PMD must call rewind() to adjust the zero-copy burst size accordingly when done.
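
To make that flow concrete, here is a minimal sketch of such a TX completion path (my own illustration, not from the patch; the tx_fast_free() name and the "fast free" criteria are simplified assumptions):

/* Illustrative only: opportunistic fast free using zc_put_bulk() + rewind(). */
static inline void
tx_fast_free(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		struct rte_mbuf **mbufs, unsigned int n)
{
	void **cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
	unsigned int copied = 0;

	if (unlikely(cache_objs == NULL)) {
		/* The burst is too big for the cache; free normally. */
		rte_pktmbuf_free_bulk(mbufs, n);
		return;
	}

	for (unsigned int i = 0; i < n; i++) {
		struct rte_mbuf *m = mbufs[i];

		/* Simplified "fast free" criteria: the mbuf belongs to this
		 * mempool and is not referenced elsewhere. */
		if (likely(m->pool == mp && rte_mbuf_refcnt_read(m) == 1))
			cache_objs[copied++] = m;
		else
			rte_pktmbuf_free_seg(m);	/* free normally instead */
	}

	/* Fewer objects than announced were zero-copied; rewind the rest. */
	if (copied != n)
		rte_mempool_cache_zc_put_rewind(cache, n - copied);
}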
  
Olivier Matz Feb. 13, 2023, 9:37 a.m. UTC | #6
Hello,

Thank you for this work, and sorry for the late feedback too.

On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > > > +/**
> > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > rte_mempool_do_generic_put().
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +static __rte_always_inline void **
> > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	void **cache_objs;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	if (n <= cache->flushthresh - cache->len) {

The previous code was doing this test instead:

if (cache->len + n <= cache->flushthresh)

I know there is an invariant asserting that cache->len <= cache->flushthresh,
so there is no real issue, but I tend to say that it is good practice
to avoid subtractions on unsigned values, to avoid the risk of wrapping.

I also think the previous test was a bit more readable.

> > > > +		/*
> > > > +		 * The objects can be added to the cache without crossing
> > > the
> > > > +		 * flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[cache->len];
> > > > +		cache->len += n;
> > > > +	} else if (likely(n <= cache->flushthresh)) {
> > > > +		/*
> > > > +		 * The request itself fits into the cache.
> > > > +		 * But first, the cache must be flushed to the backend, so
> > > > +		 * adding the objects does not cross the flush threshold.
> > > > +		 */
> > > > +		cache_objs = &cache->objs[0];
> > > > +		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > This is a flush of the cache. It is probably worth having a counter
> > > for this.
> > 
> > We somewhat already do. The put_common_pool_bulk counter is
> > incremented in rte_mempool_ops_enqueue_bulk() [1].
> > 
> > [1]:
> > https://elixir.bootlin.com/dpdk/v22.11.1/source/lib/mempool/rte_mempool.
> > h#L824
> > 
> > This counter doesn't exactly count the number of times the cache was
> > flushed, because it also counts bulk put transactions not going via the cache.
> > 
> > Thinking further about it, I agree that specific counters for cache flush and
> > cache refill could be useful, and should be added. However, being this late, I
> > would prefer postponing them for a separate patch.
> Agree, can be in a separate patch, they never existed.
> 
> > 
> > >
> > > > +		cache->len = n;
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > This is possibly an error condition. Do we need to set rte_errno?
> > 
> > I considered this when I wrote the function, and concluded that this function
> > only returns NULL as normal behavior, never because of failure. E.g. if a cache
> > can only hold 4 mbufs, and the PMD tries to store 32 mbufs, it is correct
> > behavior of the PMD to call this function and learn that the cache is too small
> > for direct access; it is not a failure in the PMD nor in the cache.
> This condition happens when there is a mismatch between the cache configuration and the behavior of the PMD. From this perspective I think this is an error. This could go unnoticed; I do not think this misconfiguration is reported anywhere.
> 
> > 
> > If it could return NULL due to a variety of reasons, I would probably agree that
> > we *need* to set rte_errno, so the application could determine the reason.
> > 
> > But since it can only return NULL due to one reason (which involves correct
> > use of the function), I convinced myself that setting rte_errno would not
> > convey any additional information than the NULL return value itself, and be a
> > waste of performance in the fast path. If you disagree, then it should be set to
> > EINVAL, like when rte_mempool_cache_zc_get_bulk() is called with a request
> > too big for the cache.
> > 
> > > Do we need a counter here to capture that?
> > 
> > Good question. I don't know. It would indicate that a cache is smaller than
> > expected by the users trying to access the cache directly.
> > 
> > And if we add such a counter, we should probably add a similar counter for
> > the cache get function too.
> Agree
> 
> > 
> > But again, being this late, I would postpone such counters for a separate
> > patch. And they should trigger more discussions about required/useful
> > counters.
> Agree, should be postponed to another patch.
> 
> > 
> > For reference, the rte_mempool_debug_stats is cache aligned and currently
> > holds 12 64-bit counters, so we can add 4 more - which is exactly the number
> > discussed here - without changing its size. So this is not a barrier to adding
> > those counters.
> > 
> > Furthermore, I suppose that we only want to increase the counter when
> > called through the mempool cache API, not when called indirectly through
> > the mempool API. This would mean that the ordinary mempool functions
> > cannot call the mempool cache functions, or the wrong counters would
> > increase. So adding such counters is not completely trivial.
> > 
> > >
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +
> > > > +	return cache_objs;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy put objects in a mempool cache backed by the specified
> > > > mempool.

I think we should document the differences and advantage of using this
function over the standard version, explaining which copy is avoided,
why it is faster, ...

Also, we should say that once this function is called, the user has
to copy the objects to the cache.

> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be put in the mempool cache.
> > > > + * @return
> > > > + *   The pointer to where to put the objects in the mempool cache.
> > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void **
> > > > +rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
> > > > +	return __rte_mempool_cache_zc_put_bulk(cache, mp, n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy un-put objects in a mempool cache.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param n
> > > > + *   The number of objects not put in the mempool cache after
> > > calling
> > > > + *   rte_mempool_cache_zc_put_bulk().
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void
> > > > +rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
> > > > +		unsigned int n)
> Earlier there was a discussion on the API name.
> IMO, we should keep the API names similar to those in ring library. This would provide consistency across the libraries.
> There were some concerns expressed in PMD having to call 2 APIs. I do not think changing to 2 APIs will have any perf impact.

I'm not really convinced by the API names either. Again, sorry, I know this
comment arrives after the battle.

Your proposal is:

/* Zero-copy put objects in a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_put_bulk(cache, mp, n)

/* Zero-copy get objects from a mempool cache backed by the specified mempool. */
rte_mempool_cache_zc_get_bulk(cache, mp, n)

Here are some observations:

- This was said in the discussion previously, but the functions do not
  really get or put objects in the cache. Instead, they prepare the
  cache (filling it or flushing it if needed) and update its length so
  that the user can do the effective copy.

- The "_cache" is superfluous for me: these functions do not deal more
  with the cache than the non zero-copy version

- The order of the parameters is (cache, mp, n) while the other functions
  that take a mempool and a cache as parameters have the mp first (see
  _generic versions).

- The "_bulk" is indeed present on other functions, but not all (the generic
  version does not have it), I'm not sure it is absolutely required

What do you think about these API below?

rte_mempool_prepare_zc_put(mp, n, cache)
rte_mempool_prepare_zc_get(mp, n, cache)

> 
> Also, what is the use case for the 'rewind' API?

+1

I have the same feeling that rewind() is not required now. It can be
added later if we find a use-case.

In case we want to keep it, I think we need to better specify in the API
comments in which unique conditions the function can be called
(i.e. after a call to rte_mempool_prepare_zc_put() with the same number
of objects, given no other operations were done on the mempool in
between). A call outside of these conditions has an undefined behavior.

> 
> > > > +{
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(n <= cache->len);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_put_rewind(cache, n);
> > > > +
> > > > +	cache->len -= n;
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n); }
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: This API may change, or be removed, without
> > > prior
> > > > notice.
> > > > + *
> > > > + * Zero-copy get objects from a mempool cache backed by the
> > > specified
> > > > mempool.
> > > > + *
> > > > + * @param cache
> > > > + *   A pointer to the mempool cache.
> > > > + * @param mp
> > > > + *   A pointer to the mempool.
> > > > + * @param n
> > > > + *   The number of objects to be made available for extraction from
> > > the
> > > > mempool cache.
> > > > + * @return
> > > > + *   The pointer to the objects in the mempool cache.
> > > > + *   NULL on error; i.e. the cache + the pool does not contain 'n'
> > > objects.
> > > > + *   With rte_errno set to the error code of the mempool dequeue
> > > function,
> > > > + *   or EINVAL if the request itself is too big for the cache, i.e.
> > > > + *   exceeds the cache flush threshold.
> > > > + */
> > > > +__rte_experimental
> > > > +static __rte_always_inline void *
> > > > +rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
> > > > +		struct rte_mempool *mp,
> > > > +		unsigned int n)
> > > > +{
> > > > +	unsigned int len, size;
> > > > +
> > > > +	RTE_ASSERT(cache != NULL);
> > > > +	RTE_ASSERT(mp != NULL);
> > > > +
> > > > +	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
> > > > +
> > > > +	len = cache->len;
> > > > +	size = cache->size;
> > > > +
> > > > +	if (n <= len) {
> > > > +		/* The request can be satisfied from the cache as is. */
> > > > +		len -= n;
> > > > +	} else if (likely(n <= size)) {
> > > > +		/*
> > > > +		 * The request itself can be satisfied from the cache.
> > > > +		 * But first, the cache must be filled from the backend;
> > > > +		 * fetch size + requested - len objects.
> > > > +		 */
> > > > +		int ret;
> > > > +
> > > > +		ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], size + n - len);
> > > > +		if (unlikely(ret < 0)) {
> > > > +			/*
> > > > +			 * We are buffer constrained.
> > > > +			 * Do not fill the cache, just satisfy the request.
> > > > +			 */
> > > > +			ret = rte_mempool_ops_dequeue_bulk(mp, &cache-
> > > > >objs[len], n - len);
> > > > +			if (unlikely(ret < 0)) {
> > > > +				/* Unable to satisfy the request. */
> > > > +
> > > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_bulk, 1);
> > > > +				RTE_MEMPOOL_STAT_ADD(mp,
> > > > get_fail_objs, n);
> > > > +
> > > > +				rte_errno = -ret;
> > > > +				return NULL;
> > > > +			}
> > > > +
> > > > +			len = 0;
> > > > +		} else {
> > > > +			len = size;
> > > > +		}
> > > > +	} else {
> > > > +		/* The request itself is too big for the cache. */
> > > > +		rte_errno = EINVAL;
> > > > +		return NULL;
> > > > +	}
> > > > +
> > > > +	cache->len = len;
> > > > +
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> > > > +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> > > > +
> > > > +	return &cache->objs[len];
> > > > +}
> > > > +
> > > >  /**
> > > >   * @internal Put several objects back in the mempool; used
> > > internally.
> > > >   * @param mp
> > > > @@ -1364,32 +1557,25 @@ rte_mempool_do_generic_put(struct
> > > > rte_mempool *mp, void * const *obj_table,  {
> > > >  	void **cache_objs;
> > > >
> > > > -	/* No cache provided */
> > > > -	if (unlikely(cache == NULL))
> > > > -		goto driver_enqueue;
> > > > +	/* No cache provided? */
> > > > +	if (unlikely(cache == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > +		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > >
> > > > -	/* increment stat now, adding in mempool always success */
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > -	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > > +		goto driver_enqueue;
> > > > +	}
> > > >
> > > > -	/* The request itself is too big for the cache */
> > > > -	if (unlikely(n > cache->flushthresh))
> > > > -		goto driver_enqueue_stats_incremented;
> > > > +	/* Prepare to add the objects to the cache. */
> > > > +	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > >
> > > > -	/*
> > > > -	 * The cache follows the following algorithm:
> > > > -	 *   1. If the objects cannot be added to the cache without
> > > crossing
> > > > -	 *      the flush threshold, flush the cache to the backend.
> > > > -	 *   2. Add the objects to the cache.
> > > > -	 */
> > > > +	/* The request itself is too big for the cache? */
> > > > +	if (unlikely(cache_objs == NULL)) {
> > > > +		/* Increment stats now, adding in mempool always succeeds.
> > > > */
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> > > > +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> > > >
> > > > -	if (cache->len + n <= cache->flushthresh) {
> > > > -		cache_objs = &cache->objs[cache->len];
> > > > -		cache->len += n;
> > > > -	} else {
> > > > -		cache_objs = &cache->objs[0];
> > > > -		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache-
> > > > >len);
> > > > -		cache->len = n;
> > > > +		goto driver_enqueue;
> > > >  	}
> > > >
> > > >  	/* Add the objects to the cache. */ @@ -1399,13 +1585,7 @@
> > > > rte_mempool_do_generic_put(struct rte_mempool *mp, void * const
> > > > *obj_table,
> > > >
> > > >  driver_enqueue:
> > > >
> > > > -	/* increment stat now, adding in mempool always success */
> > > > -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> > > > -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> > > > -
> > > > -driver_enqueue_stats_incremented:
> > > > -
> > > > -	/* push objects to the backend */
> > > > +	/* Push the objects to the backend. */
> > > >  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);  }
> > > >
> > > > diff --git a/lib/mempool/rte_mempool_trace_fp.h
> > > > b/lib/mempool/rte_mempool_trace_fp.h
> > > > index ed060e887c..14666457f7 100644
> > > > --- a/lib/mempool/rte_mempool_trace_fp.h
> > > > +++ b/lib/mempool/rte_mempool_trace_fp.h
> > > > @@ -109,6 +109,29 @@ RTE_TRACE_POINT_FP(
> > > >  	rte_trace_point_emit_ptr(mempool);
> > > >  )
> > > >
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_put_bulk,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_ptr(mempool);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_put_rewind,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > > +RTE_TRACE_POINT_FP(
> > > > +	rte_mempool_trace_cache_zc_get_bulk,
> > > > +	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t
> > > > nb_objs),
> > > > +	rte_trace_point_emit_ptr(cache);
> > > > +	rte_trace_point_emit_ptr(mempool);
> > > > +	rte_trace_point_emit_u32(nb_objs);
> > > > +)
> > > > +
> > > >  #ifdef __cplusplus
> > > >  }
> > > >  #endif
> > > > diff --git a/lib/mempool/version.map b/lib/mempool/version.map index
> > > > b67d7aace7..1383ae6db2 100644
> > > > --- a/lib/mempool/version.map
> > > > +++ b/lib/mempool/version.map
> > > > @@ -63,6 +63,11 @@ EXPERIMENTAL {
> > > >  	__rte_mempool_trace_ops_alloc;
> > > >  	__rte_mempool_trace_ops_free;
> > > >  	__rte_mempool_trace_set_ops_byname;
> > > > +
> > > > +	# added in 23.03
> > > > +	__rte_mempool_trace_cache_zc_put_bulk;
> > > > +	__rte_mempool_trace_cache_zc_put_rewind;
> > > > +	__rte_mempool_trace_cache_zc_get_bulk;
> > > >  };
> > > >
> > > >  INTERNAL {
> > > > --
> > > > 2.17.1
>
  
Morten Brørup Feb. 13, 2023, 10:25 a.m. UTC | #7
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, 13 February 2023 10.37
> 
> Hello,
> 
> Thank you for this work, and sorry for the late feedback too.

Better late than never. And it's a core library, so important to get it right!

> 
> On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
> > <snip>
> >
> > > > > +/**
> > > > > + * @internal used by rte_mempool_cache_zc_put_bulk() and
> > > > > rte_mempool_do_generic_put().
> > > > > + *
> > > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > > mempool.
> > > > > + *
> > > > > + * @param cache
> > > > > + *   A pointer to the mempool cache.
> > > > > + * @param mp
> > > > > + *   A pointer to the mempool.
> > > > > + * @param n
> > > > > + *   The number of objects to be put in the mempool cache.
> > > > > + * @return
> > > > > + *   The pointer to where to put the objects in the mempool
> cache.
> > > > > + *   NULL if the request itself is too big for the cache, i.e.
> > > > > + *   exceeds the cache flush threshold.
> > > > > + */
> > > > > +static __rte_always_inline void **
> > > > > +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache
> *cache,
> > > > > +		struct rte_mempool *mp,
> > > > > +		unsigned int n)
> > > > > +{
> > > > > +	void **cache_objs;
> > > > > +
> > > > > +	RTE_ASSERT(cache != NULL);
> > > > > +	RTE_ASSERT(mp != NULL);
> > > > > +
> > > > > +	if (n <= cache->flushthresh - cache->len) {
> 
> The previous code was doing this test instead:
> 
> if (cache->len + n <= cache->flushthresh)
> 
> I know there is an invariant asserting that cache->len <= cache->flushthresh,
> so there is no real issue, but I tend to say that it is good practice
> to avoid subtractions on unsigned values, to avoid the risk of wrapping.
> 
> I also think the previous test was a bit more readable.

I agree with you, but I didn't object to Andrew's recommendation of changing it to this, so I did.

I will change it back. Konstantin, I hope you don't mind. :-)

[...]

> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: This API may change, or be removed,
> without
> > > > prior
> > > > > notice.
> > > > > + *
> > > > > + * Zero-copy put objects in a mempool cache backed by the
> specified
> > > > > mempool.
> 
> I think we should document the differences and advantage of using this
> function over the standard version, explaining which copy is avoided,
> why it is faster, ...
> 
> Also, we should say that once this function is called, the user has
> to copy the objects to the cache.
> 

I agree, the function descriptions could be more verbose.

If we want to get this feature into DPDK now, we can postpone the descriptions improvements to a later patch.

[...]

> > Earlier there was a discussion on the API name.
> > IMO, we should keep the API names similar to those in ring library.
> This would provide consistency across the libraries.
> > There were some concerns expressed in PMD having to call 2 APIs. I do
> not think changing to 2 APIs will have any perf impact.
> 
> I'm not really convinced by the API names either. Again, sorry, I know this
> comment arrives after the battle.
> 
> Your proposal is:
> 
> /* Zero-copy put objects in a mempool cache backed by the specified
> mempool. */
> rte_mempool_cache_zc_put_bulk(cache, mp, n)
> 
> /* Zero-copy get objects from a mempool cache backed by the specified
> mempool. */
> rte_mempool_cache_zc_get_bulk(cache, mp, n)
> 
> Here are some observations:
> 
> - This was said in the discussion previously, but the functions do not
>   really get or put objects in the cache. Instead, they prepare the
>   cache (filling it or flushing it if needed) and update its length so
>   that the user can do the effective copy.

Can be fixed by improving function descriptions.

> 
> - The "_cache" is superfluous for me: these functions do not deal more
>   with the cache than the non zero-copy version

I have been thinking of these as "mempool cache" APIs.

I don't mind getting rid of "_cache" in their names, if we agree that they are "mempool" functions, instead of "mempool cache" functions.

> 
> - The order of the parameters is (cache, mp, n) while the other
> functions
>   that take a mempool and a cache as parameters have the mp first (see
>   _generic versions).

The order of the parameters was due to considering these as "mempool cache" functions, so I followed the convention for an existing "mempool cache" function:

rte_mempool_cache_flush(struct rte_mempool_cache *cache,
		struct rte_mempool *mp);

If we instead consider them as simple "mempool" functions, I agree with you about the parameter ordering.

So, what does the community think... Are these "mempool cache" functions, or just "mempool" functions?

> 
> - The "_bulk" is indeed present on other functions, but not all (the
> generic
>   version does not have it), I'm not sure it is absolutely required

The mempool library offers both single-object and bulk functions, so the function names must include "_bulk".

> 
> What do you think about these API below?
> 
> rte_mempool_prepare_zc_put(mp, n, cache)
> rte_mempool_prepare_zc_get(mp, n, cache)

I initially used "prepare" in the names, but since we don't have accompanying "commit" functions, I decided against "prepare" to avoid confusion. (Any SQL developer will probably agree with me on this.)

> 
> >
> > Also, what is the use case for the 'rewind' API?
> 
> +1
> 
> I have the same feeling that rewind() is not required now. It can be
> added later if we find a use-case.
> 
> In case we want to keep it, I think we need to better specify in the
> API
> comments in which unique conditions the function can be called
> (i.e. after a call to rte_mempool_prepare_zc_put() with the same number
> of objects, given no other operations were done on the mempool in
> between). A call outside of these conditions has an undefined behavior.

Please refer to my answer to Honnappa on this topic.
  
Andrew Rybchenko Feb. 14, 2023, 2:16 p.m. UTC | #8
On 2/13/23 13:25, Morten Brørup wrote:
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Monday, 13 February 2023 10.37
>>
>> Hello,
>>
>> Thank you for this work, and sorry for the late feedback too.
> 
> Better late than never. And it's a core library, so important to get it right!
> 
>>
>> On Mon, Feb 13, 2023 at 04:29:51AM +0000, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>>> +/**
>>>>>> + * @internal used by rte_mempool_cache_zc_put_bulk() and
>>>>>> rte_mempool_do_generic_put().
>>>>>> + *
>>>>>> + * Zero-copy put objects in a mempool cache backed by the
>> specified
>>>>>> mempool.
>>>>>> + *
>>>>>> + * @param cache
>>>>>> + *   A pointer to the mempool cache.
>>>>>> + * @param mp
>>>>>> + *   A pointer to the mempool.
>>>>>> + * @param n
>>>>>> + *   The number of objects to be put in the mempool cache.
>>>>>> + * @return
>>>>>> + *   The pointer to where to put the objects in the mempool
>> cache.
>>>>>> + *   NULL if the request itself is too big for the cache, i.e.
>>>>>> + *   exceeds the cache flush threshold.
>>>>>> + */
>>>>>> +static __rte_always_inline void **
>>>>>> +__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache
>> *cache,
>>>>>> +		struct rte_mempool *mp,
>>>>>> +		unsigned int n)
>>>>>> +{
>>>>>> +	void **cache_objs;
>>>>>> +
>>>>>> +	RTE_ASSERT(cache != NULL);
>>>>>> +	RTE_ASSERT(mp != NULL);
>>>>>> +
>>>>>> +	if (n <= cache->flushthresh - cache->len) {
>>
>> The previous code was doing this test instead:
>>
>> if (cache->len + n <= cache->flushthresh)
>>
>> I know there is an invariant asserting that cache->len <= cache->flushthresh,
>> so there is no real issue, but I tend to say that it is good practice
>> to avoid subtractions on unsigned values, to avoid the risk of wrapping.
>>
>> I also think the previous test was a bit more readable.
> 
> I agree with you, but I didn't object to Andrew's recommendation of changing it to this, so I did.
> 
> I will change it back. Konstantin, I hope you don't mind. :-)

I suggested using subtraction here to ensure that we handle an
extremely big 'n' value correctly (with the addition form, it would
overflow).
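
A small illustration of the difference (my own example, not from the thread), using len = 100, flushthresh = 512 and a bogus n = UINT_MAX - 50:

#include <limits.h>	/* UINT_MAX */

/* Addition form: len + n wraps to 49, so 49 <= 512 and the oversized
 * request would wrongly be treated as fitting. */
static inline int fits_add(unsigned int len, unsigned int thresh, unsigned int n)
{
	return len + n <= thresh;
}

/* Subtraction form: thresh - len = 412 cannot wrap, because the cache
 * maintains the invariant len <= flushthresh, so n <= 412 is false and
 * the oversized request is correctly rejected. */
static inline int fits_sub(unsigned int len, unsigned int thresh, unsigned int n)
{
	return n <= thresh - len;
}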

> 
> [...]
> 
>>>>>> +/**
>>>>>> + * @warning
>>>>>> + * @b EXPERIMENTAL: This API may change, or be removed,
>> without
>>>>> prior
>>>>>> notice.
>>>>>> + *
>>>>>> + * Zero-copy put objects in a mempool cache backed by the
>> specified
>>>>>> mempool.
>>
>> I think we should document the differences and advantage of using this
>> function over the standard version, explaining which copy is avoided,
>> why it is faster, ...
>>
>> Also, we should say that once this function is called, the user has
>> to copy the objects to the cache.
>>
> 
> I agree, the function descriptions could be more verbose.
> 
> If we want to get this feature into DPDK now, we can postpone the descriptions improvements to a later patch.

No strong opinion, but I'd wait for the description improvements.
It is very important to have a good description from the very
beginning.
I'll try to find time this week to help, but I can't promise.
Maybe it is already too late...

> 
> [...]
> 
>>> Earlier there was a discussion on the API name.
>>> IMO, we should keep the API names similar to those in ring library.
>> This would provide consistency across the libraries.
>>> There were some concerns expressed in PMD having to call 2 APIs. I do
>> not think changing to 2 APIs will have any perf impact.
>>
>> I'm not really convinced by the API names either. Again, sorry, I know this
>> comment arrives after the battle.
>>
>> Your proposal is:
>>
>> /* Zero-copy put objects in a mempool cache backed by the specified
>> mempool. */
>> rte_mempool_cache_zc_put_bulk(cache, mp, n)
>>
>> /* Zero-copy get objects from a mempool cache backed by the specified
>> mempool. */
>> rte_mempool_cache_zc_get_bulk(cache, mp, n)
>>
>> Here are some observations:
>>
>> - This was said in the discussion previously, but the functions do not
>>    really get or put objects in the cache. Instead, they prepare the
>>    cache (filling it or flushing it if needed) and update its length so
>>    that the user can do the effective copy.
> 
> Can be fixed by improving function descriptions.
> 
>>
>> - The "_cache" is superfluous for me: these functions do not deal more
>>    with the cache than the non zero-copy version
> 
> I have been thinking of these as "mempool cache" APIs.
> 
> I don't mind getting rid of "_cache" in their names, if we agree that they are "mempool" functions, instead of "mempool cache" functions.
> 
>>
>> - The order of the parameters is (cache, mp, n) while the other
>> functions
>>    that take a mempool and a cache as parameters have the mp first (see
>>    _generic versions).
> 
> The order of the parameters was due to considering these as "mempool cache" functions, so I followed the convention for an existing "mempool cache" function:
> 
> rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> 		struct rte_mempool *mp);
> 
> If we instead consider them as simple "mempool" functions, I agree with you about the parameter ordering.
> 
> So, what does the community think... Are these "mempool cache" functions, or just "mempool" functions?

Since 'cache' is mandatory here (it cannot be NULL), I agree
that it is 'mempool cache' API, not 'mempool API'.

> 
>>
>> - The "_bulk" is indeed present on other functions, but not all (the
>> generic
>>    version does not have it), I'm not sure it is absolutely required
> 
> The mempool library offers both single-object and bulk functions, so the function names must include "_bulk".

I have no strong opinion here. Yes, "bulk" is nice for
consistency, but IMHO not strictly required, since it
makes the function names longer and there is no value
in a single-object version of these functions.

> 
>>
>> What do you think about these API below?
>>
>> rte_mempool_prepare_zc_put(mp, n, cache)
>> rte_mempool_prepare_zc_get(mp, n, cache)
> 
> I initially used "prepare" in the names, but since we don't have accompanying "commit" functions, I decided against "prepare" to avoid confusion. (Any SQL developer will probably agree with me on this.)

prepare -> reserve?

However, in the case of get we really do get: when the function
returns, the corresponding objects have already been fully taken
from the mempool. Yes, in the case of put we need to copy the object
pointers into the provided space, but we don't need to call any
mempool (cache) API to commit it. Plus, there is API naming symmetry.
So, I'd *not* add prepare/reserve to the function names.
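
To spell out that asymmetry, a minimal usage sketch of the get side (my own illustration; the get_burst() name is hypothetical):

/* Illustrative only: after zc_get_bulk() returns, the get is complete. */
static inline int
get_burst(struct rte_mempool *mp, struct rte_mempool_cache *cache,
		void **objs, unsigned int n)
{
	void **cache_objs = rte_mempool_cache_zc_get_bulk(cache, mp, n);

	if (unlikely(cache_objs == NULL))
		return -rte_errno;	/* backend error, or EINVAL if n is too big */

	/* The objects have already been taken from the mempool; the caller
	 * only reads the pointers out of the cache. No commit/finish call
	 * is needed afterwards. */
	for (unsigned int i = 0; i < n; i++)
		objs[i] = cache_objs[i];

	return 0;
}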

> 
>>
>>>
>>> Also, what is the use case for the 'rewind' API?
>>
>> +1
>>
>> I have the same feeling that rewind() is not required now. It can be
>> added later if we find a use-case.
>>
>> In case we want to keep it, I think we need to better specify in the
>> API
>> comments in which unique conditions the function can be called
>> (i.e. after a call to rte_mempool_prepare_zc_put() with the same number
>> of objects, given no other operations were done on the mempool in
>> between). A call outside of these conditions has an undefined behavior.
> 
> Please refer to my answer to Honnappa on this topic.
>
  

Patch

diff --git a/lib/mempool/mempool_trace_points.c b/lib/mempool/mempool_trace_points.c
index 4ad76deb34..83d353a764 100644
--- a/lib/mempool/mempool_trace_points.c
+++ b/lib/mempool/mempool_trace_points.c
@@ -77,3 +77,12 @@  RTE_TRACE_POINT_REGISTER(rte_mempool_trace_ops_free,
 
 RTE_TRACE_POINT_REGISTER(rte_mempool_trace_set_ops_byname,
 	lib.mempool.set.ops.byname)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_bulk,
+	lib.mempool.cache.zc.put.bulk)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_put_rewind,
+	lib.mempool.cache.zc.put.rewind)
+
+RTE_TRACE_POINT_REGISTER(rte_mempool_trace_cache_zc_get_bulk,
+	lib.mempool.cache.zc.get.bulk)
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..711a9d1c16 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -42,6 +42,7 @@ 
 #include <rte_config.h>
 #include <rte_spinlock.h>
 #include <rte_debug.h>
+#include <rte_errno.h>
 #include <rte_lcore.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
@@ -1346,6 +1347,198 @@  rte_mempool_cache_flush(struct rte_mempool_cache *cache,
 	cache->len = 0;
 }
 
+
+/**
+ * @internal used by rte_mempool_cache_zc_put_bulk() and rte_mempool_do_generic_put().
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+static __rte_always_inline void **
+__rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	void **cache_objs;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	if (n <= cache->flushthresh - cache->len) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
+		cache_objs = &cache->objs[cache->len];
+		cache->len += n;
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
+		cache_objs = &cache->objs[0];
+		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
+		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		return NULL;
+	}
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+
+	return cache_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy put objects in a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be put in the mempool cache.
+ * @return
+ *   The pointer to where to put the objects in the mempool cache.
+ *   NULL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void **
+rte_mempool_cache_zc_put_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_put_bulk(cache, mp, n);
+	return __rte_mempool_cache_zc_put_bulk(cache, mp, n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy un-put objects in a mempool cache.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param n
+ *   The number of objects not put in the mempool cache after calling
+ *   rte_mempool_cache_zc_put_bulk().
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_mempool_cache_zc_put_rewind(struct rte_mempool_cache *cache,
+		unsigned int n)
+{
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(n <= cache->len);
+
+	rte_mempool_trace_cache_zc_put_rewind(cache, n);
+
+	cache->len -= n;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, (int)-n);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: This API may change, or be removed, without prior notice.
+ *
+ * Zero-copy get objects from a mempool cache backed by the specified mempool.
+ *
+ * @param cache
+ *   A pointer to the mempool cache.
+ * @param mp
+ *   A pointer to the mempool.
+ * @param n
+ *   The number of objects to be made available for extraction from the mempool cache.
+ * @return
+ *   The pointer to the objects in the mempool cache.
+ *   NULL on error; i.e. the cache + the pool does not contain 'n' objects.
+ *   With rte_errno set to the error code of the mempool dequeue function,
+ *   or EINVAL if the request itself is too big for the cache, i.e.
+ *   exceeds the cache flush threshold.
+ */
+__rte_experimental
+static __rte_always_inline void *
+rte_mempool_cache_zc_get_bulk(struct rte_mempool_cache *cache,
+		struct rte_mempool *mp,
+		unsigned int n)
+{
+	unsigned int len, size;
+
+	RTE_ASSERT(cache != NULL);
+	RTE_ASSERT(mp != NULL);
+
+	rte_mempool_trace_cache_zc_get_bulk(cache, mp, n);
+
+	len = cache->len;
+	size = cache->size;
+
+	if (n <= len) {
+		/* The request can be satisfied from the cache as is. */
+		len -= n;
+	} else if (likely(n <= size)) {
+		/*
+		 * The request itself can be satisfied from the cache.
+		 * But first, the cache must be filled from the backend;
+		 * fetch size + requested - len objects.
+		 */
+		int ret;
+
+		ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], size + n - len);
+		if (unlikely(ret < 0)) {
+			/*
+			 * We are buffer constrained.
+			 * Do not fill the cache, just satisfy the request.
+			 */
+			ret = rte_mempool_ops_dequeue_bulk(mp, &cache->objs[len], n - len);
+			if (unlikely(ret < 0)) {
+				/* Unable to satisfy the request. */
+
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
+				RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
+
+				rte_errno = -ret;
+				return NULL;
+			}
+
+			len = 0;
+		} else {
+			len = size;
+		}
+	} else {
+		/* The request itself is too big for the cache. */
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	cache->len = len;
+
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+
+	return &cache->objs[len];
+}
+
 /**
  * @internal Put several objects back in the mempool; used internally.
  * @param mp
@@ -1364,32 +1557,25 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
-	if (unlikely(cache == NULL))
-		goto driver_enqueue;
+	/* No cache provided? */
+	if (unlikely(cache == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+		RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
-	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
+		goto driver_enqueue;
+	}
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
+	/* Prepare to add the objects to the cache. */
+	cache_objs = __rte_mempool_cache_zc_put_bulk(cache, mp, n);
 
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
+	/* The request itself is too big for the cache? */
+	if (unlikely(cache_objs == NULL)) {
+		/* Increment stats now, adding in mempool always succeeds. */
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	if (cache->len + n <= cache->flushthresh) {
-		cache_objs = &cache->objs[cache->len];
-		cache->len += n;
-	} else {
-		cache_objs = &cache->objs[0];
-		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
-		cache->len = n;
+		goto driver_enqueue;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1585,7 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
-
-driver_enqueue_stats_incremented:
-
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
 
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..14666457f7 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -109,6 +109,29 @@  RTE_TRACE_POINT_FP(
 	rte_trace_point_emit_ptr(mempool);
 )
 
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_put_rewind,
+	RTE_TRACE_POINT_ARGS(void *cache, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
+RTE_TRACE_POINT_FP(
+	rte_mempool_trace_cache_zc_get_bulk,
+	RTE_TRACE_POINT_ARGS(void *cache, void *mempool, uint32_t nb_objs),
+	rte_trace_point_emit_ptr(cache);
+	rte_trace_point_emit_ptr(mempool);
+	rte_trace_point_emit_u32(nb_objs);
+)
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/mempool/version.map b/lib/mempool/version.map
index b67d7aace7..1383ae6db2 100644
--- a/lib/mempool/version.map
+++ b/lib/mempool/version.map
@@ -63,6 +63,11 @@  EXPERIMENTAL {
 	__rte_mempool_trace_ops_alloc;
 	__rte_mempool_trace_ops_free;
 	__rte_mempool_trace_set_ops_byname;
+
+	# added in 23.03
+	__rte_mempool_trace_cache_zc_put_bulk;
+	__rte_mempool_trace_cache_zc_put_rewind;
+	__rte_mempool_trace_cache_zc_get_bulk;
 };
 
 INTERNAL {