[v4,3/3] mempool: use cache for frequently updated stats

Message ID 20221104120329.1219-3-mb@smartsharesystems.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v4,1/3] mempool: split stats from debug |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/github-robot: build success github build: passed
ci/iol-x86_64-compile-testing success Testing PASS

Commit Message

Morten Brørup Nov. 4, 2022, 12:03 p.m. UTC
  When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
performance of mempools with caches is improved as follows.

When accessing objects in the mempool, either the put_bulk and put_objs or
the get_success_bulk and get_success_objs statistics counters are likely
to be incremented.

By adding an alternative set of these counters to the mempool cache
structure, accessing the dedicated statistics structure is avoided in the
likely cases where these counters are incremented.

The trick here is that the cache line holding the mempool cache structure
is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
Updating some statistics counters in the same cache line has lower
performance cost than accessing the statistics counters in the dedicated
statistics structure, which resides in another cache line.
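The cache-line argument can be illustrated with a simplified mock of the structure (field names taken from this patch; the exact sizes, ordering, and alignment here are illustrative, not the real DPDK definition):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mock of struct rte_mempool_cache with the fields named in
 * the patch; illustrative layout only, not the exact DPDK structure. */
struct mock_mempool_cache {
	uint32_t size;        /* Size of the cache */
	uint32_t flushthresh; /* Flush threshold */
	uint32_t len;         /* Current cache count */
	uint32_t unused;      /* Padding, keeps stats 8-byte aligned */
	struct {
		uint64_t put_bulk;
		uint64_t put_objs;
		uint64_t get_success_bulk;
		uint64_t get_success_objs;
	} stats;
	/* objs[] would follow here */
};

/* The hot fields end at offset 16 and the stats block ends at offset 48,
 * so both fit inside a single 64-byte cache line. */
_Static_assert(offsetof(struct mock_mempool_cache, stats) == 16,
	       "stats start right after the hot fields");
_Static_assert(offsetof(struct mock_mempool_cache, stats) +
	       sizeof(((struct mock_mempool_cache *)0)->stats) <= 64,
	       "hot fields and stats share one 64-byte cache line");
```

With a 64-byte cache line, reading 'len' or 'flushthresh' already pulls in the line that holds these four counters, so the increments are effectively free of additional memory traffic.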

mempool_perf_autotest with this patch shows the following improvements in
rate_persec.

The cost of enabling mempool stats (without debug) after this patch:
-6.8 % and -6.7 %, without and with cache, respectively.

v4:
* Fix checkpatch warnings:
  A couple of typos in the patch description.
  The macro to add to a mempool cache stat variable should not use
  do {} while (0). Personally, I would tend to disagree with this, but
  whatever keeps the CI happy.
v3:
* Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
  This change belongs in the first patch of the series.
v2:
* Move the statistics counters into a stats structure.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.c |  9 ++++++
 lib/mempool/rte_mempool.h | 66 ++++++++++++++++++++++++++++++++-------
 2 files changed, 64 insertions(+), 11 deletions(-)
  

Comments

Andrew Rybchenko Nov. 6, 2022, 11:40 a.m. UTC | #1
On 11/4/22 15:03, Morten Brørup wrote:
> When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> performance of mempools with caches is improved as follows.
> 
> When accessing objects in the mempool, either the put_bulk and put_objs or
> the get_success_bulk and get_success_objs statistics counters are likely
> to be incremented.
> 
> By adding an alternative set of these counters to the mempool cache
> structure, accessing the dedicated statistics structure is avoided in the
> likely cases where these counters are incremented.
> 
> The trick here is that the cache line holding the mempool cache structure
> is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
> Updating some statistics counters in the same cache line has lower
> performance cost than accessing the statistics counters in the dedicated
> statistics structure, which resides in another cache line.
> 
> mempool_perf_autotest with this patch shows the following improvements in
> rate_persec.
> 
> The cost of enabling mempool stats (without debug) after this patch:
> -6.8 % and -6.7 %, respectively without and with cache.
> 
> v4:
> * Fix checkpatch warnings:
>    A couple of typos in the patch description.
>    The macro to add to a mempool cache stat variable should not use
>    do {} while (0). Personally, I would tend to disagree with this, but
>    whatever keeps the CI happy.
> v3:
> * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
>    This change belongs in the first patch of the series.
> v2:
> * Move the statistics counters into a stats structure.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>

Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

[snip]

> +/**
> + * @internal When stats is enabled, store some statistics.
> + *
> + * @param cache
> + *   Pointer to the memory pool cache.
> + * @param name
> + *   Name of the statistics field to increment in the memory pool cache.
> + * @param n
> + *   Number to add to the statistics.
> + */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n

I'd enclose it in parenthesis.

> +#else
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> +#endif
> +
  
Morten Brørup Nov. 6, 2022, 11:50 a.m. UTC | #2
> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Sunday, 6 November 2022 12.41
> 
> On 11/4/22 15:03, Morten Brørup wrote:

[...]

> > +/**
> > + * @internal When stats is enabled, store some statistics.
> > + *
> > + * @param cache
> > + *   Pointer to the memory pool cache.
> > + * @param name
> > + *   Name of the statistics field to increment in the memory pool cache.
> > + * @param n
> > + *   Number to add to the statistics.
> > + */
> > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> 
> I'd enclose it in parenthesis.

Me too! I had it surrounded by "do {...} while (0)" in v3, but checkpatch complained about it [1], so I changed it to the above. Which checkpatch also complains about. :-(

[1]: http://mails.dpdk.org/archives/test-report/2022-November/321316.html

Feel free to modify this macro at your preference when merging!

> 
> > +#else
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> > +#endif
> > +
  
Andrew Rybchenko Nov. 6, 2022, 11:59 a.m. UTC | #3
On 11/6/22 14:50, Morten Brørup wrote:
>> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
>> Sent: Sunday, 6 November 2022 12.41
>>
>> On 11/4/22 15:03, Morten Brørup wrote:
> 
> [...]
> 
>>> +/**
>>> + * @internal When stats is enabled, store some statistics.
>>> + *
>>> + * @param cache
>>> + *   Pointer to the memory pool cache.
>>> + * @param name
>>> + *   Name of the statistics field to increment in the memory pool cache.
>>> + * @param n
>>> + *   Number to add to the statistics.
>>> + */
>>> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
>>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
>>
>> I'd enclose it in parenthesis.
> 
> Me too! I had it surrounded by "do {...} while (0)" in v3, but checkpatch complained about it [1], so I changed it to the above. Which checkpatch also complains about. :-(

I mean
#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) \
  ((cache)->stats.name += (n))

> 
> [1]: http://mails.dpdk.org/archives/test-report/2022-November/321316.html

Yes, I've seen it.

> 
> Feel free to modify this macro at your preference when merging!
> 
>>
>>> +#else
>>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
>>> +#endif
>>> +
>
  
Morten Brørup Nov. 6, 2022, 12:16 p.m. UTC | #4
> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Sunday, 6 November 2022 13.00
> 
> On 11/6/22 14:50, Morten Brørup wrote:
> >> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> >> Sent: Sunday, 6 November 2022 12.41
> >>
> >> On 11/4/22 15:03, Morten Brørup wrote:
> >
> > [...]
> >
> >>> +/**
> >>> + * @internal When stats is enabled, store some statistics.
> >>> + *
> >>> + * @param cache
> >>> + *   Pointer to the memory pool cache.
> >>> + * @param name
> >>> + *   Name of the statistics field to increment in the memory pool cache.
> >>> + * @param n
> >>> + *   Number to add to the statistics.
> >>> + */
> >>> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> >>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> >>
> >> I'd enclose it in parenthesis.
> >
> > Me too! I had it surrounded by "do {...} while (0)" in v3, but
> > checkpatch complained about it [1], so I changed it to the above. Which
> > checkpatch also complains about. :-(
> 
> I mean
> #define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) \
>   ((cache)->stats.name += (n))

Thank you for elaborating, Andrew. I guess this is exactly what the checkpatch error message for v4 [2] says - I just couldn't understand what it wanted me to do.

[2]: http://mails.dpdk.org/archives/test-report/2022-November/321345.html

I don't always agree with checkpatch, but now Thomas has a third option for how to write this macro when merging. :-)
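For illustration, the three macro shapes discussed in this thread behave differently when used inside a larger expression; a minimal standalone sketch (STAT_ADD is a hypothetical stand-in name, not the patch macro):

```c
#include <stdint.h>

struct cache_with_stats {
	struct { uint64_t put_bulk; } stats;
};

/* Shape 1 (v3):  do { (cache)->stats.name += n; } while (0) -- statement only
 * Shape 2 (v4):  (cache)->stats.name += n                   -- unparenthesized
 * Shape 3 (Andrew's suggestion), used below: a single parenthesized
 * expression, safe both as a statement and inside a larger expression. */
#define STAT_ADD(cache, name, n) ((cache)->stats.name += (n))

static uint64_t demo(void)
{
	struct cache_with_stats c = { { 0 } };

	STAT_ADD(&c, put_bulk, 1);        /* statement use                 */
	return STAT_ADD(&c, put_bulk, 2); /* expression use: the do/while
					     shape would not compile here  */
}
```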

> 
> >
> > [1]: http://mails.dpdk.org/archives/test-report/2022-November/321316.html
> 
> Yes, I've seen it.
> 
> >
> > Feel free to modify this macro at your preference when merging!
> >
> >>
> >>> +#else
> >>> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> >>> +#endif
> >>> +
  
Mattias Rönnblom Nov. 7, 2022, 7:30 a.m. UTC | #5
On 2022-11-04 13:03, Morten Brørup wrote:
> When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> performance of mempools with caches is improved as follows.
> 
> When accessing objects in the mempool, either the put_bulk and put_objs or
> the get_success_bulk and get_success_objs statistics counters are likely
> to be incremented.
> 
> By adding an alternative set of these counters to the mempool cache
> structure, accessing the dedicated statistics structure is avoided in the
> likely cases where these counters are incremented.
> 
> The trick here is that the cache line holding the mempool cache structure
> is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
> Updating some statistics counters in the same cache line has lower
> performance cost than accessing the statistics counters in the dedicated
> statistics structure, which resides in another cache line.
> 
> mempool_perf_autotest with this patch shows the following improvements in
> rate_persec.
> 
> The cost of enabling mempool stats (without debug) after this patch:
> -6.8 % and -6.7 %, respectively without and with cache.
> 
> v4:
> * Fix checkpatch warnings:
>    A couple of typos in the patch description.
>    The macro to add to a mempool cache stat variable should not use
>    do {} while (0). Personally, I would tend to disagree with this, but
>    whatever keeps the CI happy.
> v3:
> * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
>    This change belongs in the first patch of the series.
> v2:
> * Move the statistics counters into a stats structure.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>   lib/mempool/rte_mempool.c |  9 ++++++
>   lib/mempool/rte_mempool.h | 66 ++++++++++++++++++++++++++++++++-------
>   2 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index e6208125e0..a18e39af04 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -1286,6 +1286,15 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>   		sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
>   		sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
>   	}
> +	if (mp->cache_size != 0) {
> +		/* Add the statistics stored in the mempool caches. */
> +		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +			sum.put_bulk += mp->local_cache[lcore_id].stats.put_bulk;
> +			sum.put_objs += mp->local_cache[lcore_id].stats.put_objs;
> +			sum.get_success_bulk += mp->local_cache[lcore_id].stats.get_success_bulk;
> +			sum.get_success_objs += mp->local_cache[lcore_id].stats.get_success_objs;
> +		}
> +	}
>   	fprintf(f, "  stats:\n");
>   	fprintf(f, "    put_bulk=%"PRIu64"\n", sum.put_bulk);
>   	fprintf(f, "    put_objs=%"PRIu64"\n", sum.put_objs);
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index abfe34c05f..e6eb573739 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,6 +86,19 @@ struct rte_mempool_cache {
>   	uint32_t size;	      /**< Size of the cache */
>   	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>   	uint32_t len;	      /**< Current cache count */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +	uint32_t unused;
> +	/*
> +	 * Alternative location for the most frequently updated mempool statistics (per-lcore),
> +	 * providing faster update access when using a mempool cache.
> +	 */
> +	struct {
> +		uint64_t put_bulk;          /**< Number of puts. */
> +		uint64_t put_objs;          /**< Number of objects successfully put. */
> +		uint64_t get_success_bulk;  /**< Successful allocation number. */
> +		uint64_t get_success_objs;  /**< Objects successfully allocated. */
> +	} stats;                        /**< Statistics */
> +#endif
>   	/**
>   	 * Cache objects
>   	 *
> @@ -319,6 +332,22 @@ struct rte_mempool {
>   #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
>   #endif
>   
> +/**
> + * @internal When stats is enabled, store some statistics.
> + *
> + * @param cache
> + *   Pointer to the memory pool cache.
> + * @param name
> + *   Name of the statistics field to increment in the memory pool cache.
> + * @param n
> + *   Number to add to the statistics.
> + */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> +#else
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> +#endif
> +
>   /**
>    * @internal Calculate the size of the mempool header.
>    *
> @@ -1333,13 +1362,17 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   {
>   	void **cache_objs;
>   
> +	/* No cache provided */
> +	if (unlikely(cache == NULL))
> +		goto driver_enqueue;
> +
>   	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>   
> -	/* No cache provided or the request itself is too big for the cache */
> -	if (unlikely(cache == NULL || n > cache->flushthresh))
> -		goto driver_enqueue;
> +	/* The request itself is too big for the cache */
> +	if (unlikely(n > cache->flushthresh))
> +		goto driver_enqueue_stats_incremented;
>   
>   	/*
>   	 * The cache follows the following algorithm:
> @@ -1364,6 +1397,12 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   
>   driver_enqueue:
>   
> +	/* increment stat now, adding in mempool always success */
> +	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +
> +driver_enqueue_stats_incremented:
> +
>   	/* push objects to the backend */
>   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>   }
> @@ -1470,8 +1509,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   	if (remaining == 0) {
>   		/* The entire request is satisfied from the cache. */
>   
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>   
>   		return 0;
>   	}
> @@ -1500,8 +1539,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   
>   	cache->len = cache->size;
>   
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
>   
>   	return 0;
>   
> @@ -1523,8 +1562,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>   		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
>   		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>   	} else {
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		if (likely(cache != NULL)) {
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +		} else {
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		}
>   	}
>   
>   	return ret;

For the series,
Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
  
Konstantin Ananyev Nov. 8, 2022, 9:20 a.m. UTC | #6
> When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> performance of mempools with caches is improved as follows.
> 
> When accessing objects in the mempool, either the put_bulk and put_objs or
> the get_success_bulk and get_success_objs statistics counters are likely
> to be incremented.
> 
> By adding an alternative set of these counters to the mempool cache
> structure, accessing the dedicated statistics structure is avoided in the
> likely cases where these counters are incremented.
> 
> The trick here is that the cache line holding the mempool cache structure
> is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
> Updating some statistics counters in the same cache line has lower
> performance cost than accessing the statistics counters in the dedicated
> statistics structure, which resides in another cache line.
> 
> mempool_perf_autotest with this patch shows the following improvements in
> rate_persec.
> 
> The cost of enabling mempool stats (without debug) after this patch:
> -6.8 % and -6.7 %, respectively without and with cache.
> 
> v4:
> * Fix checkpatch warnings:
>   A couple of typos in the patch description.
>   The macro to add to a mempool cache stat variable should not use
>   do {} while (0). Personally, I would tend to disagree with this, but
>   whatever keeps the CI happy.
> v3:
> * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
>   This change belongs in the first patch of the series.
> v2:
> * Move the statistics counters into a stats structure.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.c |  9 ++++++
>  lib/mempool/rte_mempool.h | 66 ++++++++++++++++++++++++++++++++-------
>  2 files changed, 64 insertions(+), 11 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
> index e6208125e0..a18e39af04 100644
> --- a/lib/mempool/rte_mempool.c
> +++ b/lib/mempool/rte_mempool.c
> @@ -1286,6 +1286,15 @@ rte_mempool_dump(FILE *f, struct rte_mempool *mp)
>  		sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
>  		sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
>  	}
> +	if (mp->cache_size != 0) {
> +		/* Add the statistics stored in the mempool caches. */
> +		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> +			sum.put_bulk += mp->local_cache[lcore_id].stats.put_bulk;
> +			sum.put_objs += mp->local_cache[lcore_id].stats.put_objs;
> +			sum.get_success_bulk += mp->local_cache[lcore_id].stats.get_success_bulk;
> +			sum.get_success_objs += mp->local_cache[lcore_id].stats.get_success_objs;
> +		}
> +	}
>  	fprintf(f, "  stats:\n");
>  	fprintf(f, "    put_bulk=%"PRIu64"\n", sum.put_bulk);
>  	fprintf(f, "    put_objs=%"PRIu64"\n", sum.put_objs);
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index abfe34c05f..e6eb573739 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,6 +86,19 @@ struct rte_mempool_cache {
>  	uint32_t size;	      /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len;	      /**< Current cache count */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +	uint32_t unused;
> +	/*
> +	 * Alternative location for the most frequently updated mempool statistics (per-lcore),
> +	 * providing faster update access when using a mempool cache.
> +	 */
> +	struct {
> +		uint64_t put_bulk;          /**< Number of puts. */
> +		uint64_t put_objs;          /**< Number of objects successfully put. */
> +		uint64_t get_success_bulk;  /**< Successful allocation number. */
> +		uint64_t get_success_objs;  /**< Objects successfully allocated. */
> +	} stats;                        /**< Statistics */
> +#endif
>  	/**
>  	 * Cache objects
>  	 *
> @@ -319,6 +332,22 @@ struct rte_mempool {
>  #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
>  #endif
> 
> +/**
> + * @internal When stats is enabled, store some statistics.
> + *
> + * @param cache
> + *   Pointer to the memory pool cache.
> + * @param name
> + *   Name of the statistics field to increment in the memory pool cache.
> + * @param n
> + *   Number to add to the statistics.
> + */
> +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n

As Andrew already pointed out, it needs to be: ((cache)->stats.name += (n))
Apart from that, LGTM.
Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

> +#else
> +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
> +#endif
> +
>  /**
>   * @internal Calculate the size of the mempool header.
>   *
> @@ -1333,13 +1362,17 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>  {
>  	void **cache_objs;
> 
> +	/* No cache provided */
> +	if (unlikely(cache == NULL))
> +		goto driver_enqueue;
> +
>  	/* increment stat now, adding in mempool always success */
> -	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> 
> -	/* No cache provided or the request itself is too big for the cache */
> -	if (unlikely(cache == NULL || n > cache->flushthresh))
> -		goto driver_enqueue;
> +	/* The request itself is too big for the cache */
> +	if (unlikely(n > cache->flushthresh))
> +		goto driver_enqueue_stats_incremented;
> 
>  	/*
>  	 * The cache follows the following algorithm:
> @@ -1364,6 +1397,12 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
> 
>  driver_enqueue:
> 
> +	/* increment stat now, adding in mempool always success */
> +	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> +	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> +
> +driver_enqueue_stats_incremented:
> +
>  	/* push objects to the backend */
>  	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>  }
> @@ -1470,8 +1509,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  	if (remaining == 0) {
>  		/* The entire request is satisfied from the cache. */
> 
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
>  		return 0;
>  	}
> @@ -1500,8 +1539,8 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
> 
>  	cache->len = cache->size;
> 
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -	RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> 
>  	return 0;
> 
> @@ -1523,8 +1562,13 @@ rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
>  		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
>  		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
>  	} else {
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> -		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		if (likely(cache != NULL)) {
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
> +			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
> +		} else {
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
> +			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
> +		}
>  	}
> 
>  	return ret;
> --
> 2.17.1
>
  
Morten Brørup Nov. 8, 2022, 11:21 a.m. UTC | #7
> From: Konstantin Ananyev [mailto:konstantin.ananyev@huawei.com]
> Sent: Tuesday, 8 November 2022 10.20
> 
> > When built with stats enabled (RTE_LIBRTE_MEMPOOL_STATS defined), the
> > performance of mempools with caches is improved as follows.
> >
> > When accessing objects in the mempool, either the put_bulk and put_objs or
> > the get_success_bulk and get_success_objs statistics counters are likely
> > to be incremented.
> >
> > By adding an alternative set of these counters to the mempool cache
> > structure, accessing the dedicated statistics structure is avoided in the
> > likely cases where these counters are incremented.
> >
> > The trick here is that the cache line holding the mempool cache structure
> > is accessed anyway, in order to access the 'len' or 'flushthresh' fields.
> > Updating some statistics counters in the same cache line has lower
> > performance cost than accessing the statistics counters in the dedicated
> > statistics structure, which resides in another cache line.
> >
> > mempool_perf_autotest with this patch shows the following improvements in
> > rate_persec.
> >
> > The cost of enabling mempool stats (without debug) after this patch:
> > -6.8 % and -6.7 %, respectively without and with cache.
> >
> > v4:
> > * Fix checkpatch warnings:
> >   A couple of typos in the patch description.
> >   The macro to add to a mempool cache stat variable should not use
> >   do {} while (0). Personally, I would tend to disagree with this, but
> >   whatever keeps the CI happy.
> > v3:
> > * Don't update the description of the RTE_MEMPOOL_STAT_ADD macro.
> >   This change belongs in the first patch of the series.
> > v2:
> > * Move the statistics counters into a stats structure.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---

[...]

> > +/**
> > + * @internal When stats is enabled, store some statistics.
> > + *
> > + * @param cache
> > + *   Pointer to the memory pool cache.
> > + * @param name
> > + *   Name of the statistics field to increment in the memory pool cache.
> > + * @param n
> > + *   Number to add to the statistics.
> > + */
> > +#ifdef RTE_LIBRTE_MEMPOOL_STATS
> > +#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
> 
> As Andrew already pointed, it needs to be: ((cache)->stats.name += (n))
> Apart from that, LGTM.
> Series-Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

@Thomas, this series should be ready to apply... it now has been:
Reviewed-by: (mempool maintainer) Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Reviewed-By: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

Please fix the RTE_MEMPOOL_CACHE_STAT_ADD macro while merging, to satisfy checkpatch. ;-)

It should be:

+#ifdef RTE_LIBRTE_MEMPOOL_STATS
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) ((cache)->stats.name += (n))
+#else
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
+#endif
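A side effect of the disabled variant worth noting: expanding to an empty do/while means the argument expression is not evaluated at all, matching the existing disabled RTE_MEMPOOL_STAT_ADD. A standalone sketch (CACHE_STAT_ADD is a local copy of the macro pair, with the stats flag deliberately left undefined):

```c
#include <stdint.h>

struct cache_with_stats { struct { uint64_t put_objs; } stats; };

/* Stats disabled: the macro compiles to an empty statement, so its
 * arguments are never evaluated (local copy of the macro pair from the
 * patch, with RTE_LIBRTE_MEMPOOL_STATS undefined for this demo). */
#undef RTE_LIBRTE_MEMPOOL_STATS
#ifdef RTE_LIBRTE_MEMPOOL_STATS
#define CACHE_STAT_ADD(cache, name, n) ((cache)->stats.name += (n))
#else
#define CACHE_STAT_ADD(cache, name, n) do {} while (0)
#endif

static int side_effects;

static int count_me(void) { return ++side_effects; }

static int run_disabled(void)
{
	struct cache_with_stats c = { { 0 } };
	(void)c;

	CACHE_STAT_ADD(&c, put_objs, count_me()); /* count_me() never runs */
	return side_effects;                      /* stays 0 */
}
```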
  

Patch

diff --git a/lib/mempool/rte_mempool.c b/lib/mempool/rte_mempool.c
index e6208125e0..a18e39af04 100644
--- a/lib/mempool/rte_mempool.c
+++ b/lib/mempool/rte_mempool.c
@@ -1286,6 +1286,15 @@  rte_mempool_dump(FILE *f, struct rte_mempool *mp)
 		sum.get_success_blks += mp->stats[lcore_id].get_success_blks;
 		sum.get_fail_blks += mp->stats[lcore_id].get_fail_blks;
 	}
+	if (mp->cache_size != 0) {
+		/* Add the statistics stored in the mempool caches. */
+		for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
+			sum.put_bulk += mp->local_cache[lcore_id].stats.put_bulk;
+			sum.put_objs += mp->local_cache[lcore_id].stats.put_objs;
+			sum.get_success_bulk += mp->local_cache[lcore_id].stats.get_success_bulk;
+			sum.get_success_objs += mp->local_cache[lcore_id].stats.get_success_objs;
+		}
+	}
 	fprintf(f, "  stats:\n");
 	fprintf(f, "    put_bulk=%"PRIu64"\n", sum.put_bulk);
 	fprintf(f, "    put_objs=%"PRIu64"\n", sum.put_objs);
diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index abfe34c05f..e6eb573739 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -86,6 +86,19 @@  struct rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
+#ifdef RTE_LIBRTE_MEMPOOL_STATS
+	uint32_t unused;
+	/*
+	 * Alternative location for the most frequently updated mempool statistics (per-lcore),
+	 * providing faster update access when using a mempool cache.
+	 */
+	struct {
+		uint64_t put_bulk;          /**< Number of puts. */
+		uint64_t put_objs;          /**< Number of objects successfully put. */
+		uint64_t get_success_bulk;  /**< Successful allocation number. */
+		uint64_t get_success_objs;  /**< Objects successfully allocated. */
+	} stats;                        /**< Statistics */
+#endif
 	/**
 	 * Cache objects
 	 *
@@ -319,6 +332,22 @@  struct rte_mempool {
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
 #endif
 
+/**
+ * @internal When stats is enabled, store some statistics.
+ *
+ * @param cache
+ *   Pointer to the memory pool cache.
+ * @param name
+ *   Name of the statistics field to increment in the memory pool cache.
+ * @param n
+ *   Number to add to the statistics.
+ */
+#ifdef RTE_LIBRTE_MEMPOOL_STATS
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) (cache)->stats.name += n
+#else
+#define RTE_MEMPOOL_CACHE_STAT_ADD(cache, name, n) do {} while (0)
+#endif
+
 /**
  * @internal Calculate the size of the mempool header.
  *
@@ -1333,13 +1362,17 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
+	/* No cache provided */
+	if (unlikely(cache == NULL))
+		goto driver_enqueue;
+
 	/* increment stat now, adding in mempool always success */
-	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	/* No cache provided or the request itself is too big for the cache */
-	if (unlikely(cache == NULL || n > cache->flushthresh))
-		goto driver_enqueue;
+	/* The request itself is too big for the cache */
+	if (unlikely(n > cache->flushthresh))
+		goto driver_enqueue_stats_incremented;
 
 	/*
 	 * The cache follows the following algorithm:
@@ -1364,6 +1397,12 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
+	/* increment stat now, adding in mempool always success */
+	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
+	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
+
+driver_enqueue_stats_incremented:
+
 	/* push objects to the backend */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }
@@ -1470,8 +1509,8 @@  rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 	if (remaining == 0) {
 		/* The entire request is satisfied from the cache. */
 
-		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+		RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
 		return 0;
 	}
@@ -1500,8 +1539,8 @@  rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 
 	cache->len = cache->size;
 
-	RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-	RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+	RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
 
 	return 0;
 
@@ -1523,8 +1562,13 @@  rte_mempool_do_generic_get(struct rte_mempool *mp, void **obj_table,
 		RTE_MEMPOOL_STAT_ADD(mp, get_fail_bulk, 1);
 		RTE_MEMPOOL_STAT_ADD(mp, get_fail_objs, n);
 	} else {
-		RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
-		RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+		if (likely(cache != NULL)) {
+			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_bulk, 1);
+			RTE_MEMPOOL_CACHE_STAT_ADD(cache, get_success_objs, n);
+		} else {
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_bulk, 1);
+			RTE_MEMPOOL_STAT_ADD(mp, get_success_objs, n);
+		}
 	}
 
 	return ret;