mempool: micro-optimize put function

Message ID 20221116101855.93297-1-mb@smartsharesystems.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series mempool: micro-optimize put function

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/github-robot: build success github build: passed
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Morten Brørup Nov. 16, 2022, 10:18 a.m. UTC
  Micro-optimization:
Reduced the most likely code path in the generic put function by moving an
unlikely check out of it and further down the function.

Also updated the comments in the function.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)
  

Comments

Andrew Rybchenko Nov. 16, 2022, 11:04 a.m. UTC | #1
On 11/16/22 13:18, Morten Brørup wrote:
> Micro-optimization:
> Reduced the most likely code path in the generic put function by moving an
> unlikely check out of the most likely code path and further down.
> 
> Also updated the comments in the function.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>   lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
>   1 file changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 9f530db24b..aba90dbb5b 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   {
>   	void **cache_objs;
>   
> -	/* No cache provided */
> +	/* No cache provided? */
>   	if (unlikely(cache == NULL))
>   		goto driver_enqueue;
>   
> -	/* increment stat now, adding in mempool always success */
> +	/* Increment stats now, adding in mempool always succeeds. */
>   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
>   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>   
> -	/* The request itself is too big for the cache */
> -	if (unlikely(n > cache->flushthresh))
> -		goto driver_enqueue_stats_incremented;

I've kept the check here since it protects against overflow in len plus 
n below if n is really huge.

> -
> -	/*
> -	 * The cache follows the following algorithm:
> -	 *   1. If the objects cannot be added to the cache without crossing
> -	 *      the flush threshold, flush the cache to the backend.
> -	 *   2. Add the objects to the cache.
> -	 */
> -
> -	if (cache->len + n <= cache->flushthresh) {
> +	if (likely(cache->len + n <= cache->flushthresh)) {
> +		/*
> +		 * The objects can be added to the cache without crossing the
> +		 * flush threshold.
> +		 */
>   		cache_objs = &cache->objs[cache->len];
>   		cache->len += n;
> -	} else {
> +	} else if (likely(n <= cache->flushthresh)) {
> +		/*
> +		 * The request itself fits into the cache.
> +		 * But first, the cache must be flushed to the backend, so
> +		 * adding the objects does not cross the flush threshold.
> +		 */
>   		cache_objs = &cache->objs[0];
>   		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
>   		cache->len = n;
> +	} else {
> +		/* The request itself is too big for the cache. */
> +		goto driver_enqueue_stats_incremented;
>   	}
>   
>   	/* Add the objects to the cache. */
> @@ -1399,13 +1400,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>   
>   driver_enqueue:
>   
> -	/* increment stat now, adding in mempool always success */
> +	/* Increment stats now, adding in mempool always succeeds. */
>   	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
>   	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
>   
>   driver_enqueue_stats_incremented:
>   
> -	/* push objects to the backend */
> +	/* Push the objects to the backend. */
>   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>   }
>
  
Morten Brørup Nov. 16, 2022, 11:10 a.m. UTC | #2
> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
> Sent: Wednesday, 16 November 2022 12.05
> 
> On 11/16/22 13:18, Morten Brørup wrote:
> > Micro-optimization:
> > Reduced the most likely code path in the generic put function by
> moving an
> > unlikely check out of the most likely code path and further down.
> >
> > Also updated the comments in the function.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > ---
> >   lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
> >   1 file changed, 18 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> > index 9f530db24b..aba90dbb5b 100644
> > --- a/lib/mempool/rte_mempool.h
> > +++ b/lib/mempool/rte_mempool.h
> > @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >   {
> >   	void **cache_objs;
> >
> > -	/* No cache provided */
> > +	/* No cache provided? */
> >   	if (unlikely(cache == NULL))
> >   		goto driver_enqueue;
> >
> > -	/* increment stat now, adding in mempool always success */
> > +	/* Increment stats now, adding in mempool always succeeds. */
> >   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
> >   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
> >
> > -	/* The request itself is too big for the cache */
> > -	if (unlikely(n > cache->flushthresh))
> > -		goto driver_enqueue_stats_incremented;
> 
> I've kept the check here since it protects against overflow in len plus
> n below if n is really huge.

We can fix that, see below.

> 
> > -
> > -	/*
> > -	 * The cache follows the following algorithm:
> > -	 *   1. If the objects cannot be added to the cache without
> crossing
> > -	 *      the flush threshold, flush the cache to the backend.
> > -	 *   2. Add the objects to the cache.
> > -	 */
> > -
> > -	if (cache->len + n <= cache->flushthresh) {
> > +	if (likely(cache->len + n <= cache->flushthresh)) {

It is an invariant that cache->len <= cache->flushthresh, so the above comparison can be rewritten to protect against overflow:

if (likely(n <= cache->flushthresh - cache->len)) {

> > +		/*
> > +		 * The objects can be added to the cache without crossing
> the
> > +		 * flush threshold.
> > +		 */
> >   		cache_objs = &cache->objs[cache->len];
> >   		cache->len += n;
> > -	} else {
> > +	} else if (likely(n <= cache->flushthresh)) {
> > +		/*
> > +		 * The request itself fits into the cache.
> > +		 * But first, the cache must be flushed to the backend, so
> > +		 * adding the objects does not cross the flush threshold.
> > +		 */
> >   		cache_objs = &cache->objs[0];
> >   		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
> >   		cache->len = n;
> > +	} else {
> > +		/* The request itself is too big for the cache. */
> > +		goto driver_enqueue_stats_incremented;
> >   	}
> >
> >   	/* Add the objects to the cache. */
> > @@ -1399,13 +1400,13 @@ rte_mempool_do_generic_put(struct rte_mempool
> *mp, void * const *obj_table,
> >
> >   driver_enqueue:
> >
> > -	/* increment stat now, adding in mempool always success */
> > +	/* Increment stats now, adding in mempool always succeeds. */
> >   	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
> >   	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
> >
> >   driver_enqueue_stats_incremented:
> >
> > -	/* push objects to the backend */
> > +	/* Push the objects to the backend. */
> >   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
> >   }
> >
>
  
Andrew Rybchenko Nov. 16, 2022, 11:29 a.m. UTC | #3
On 11/16/22 14:10, Morten Brørup wrote:
>> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
>> Sent: Wednesday, 16 November 2022 12.05
>>
>> On 11/16/22 13:18, Morten Brørup wrote:
>>> Micro-optimization:
>>> Reduced the most likely code path in the generic put function by
>> moving an
>>> unlikely check out of the most likely code path and further down.
>>>
>>> Also updated the comments in the function.
>>>
>>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>>> ---
>>>    lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
>>>    1 file changed, 18 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
>>> index 9f530db24b..aba90dbb5b 100644
>>> --- a/lib/mempool/rte_mempool.h
>>> +++ b/lib/mempool/rte_mempool.h
>>> @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool
>> *mp, void * const *obj_table,
>>>    {
>>>    	void **cache_objs;
>>>
>>> -	/* No cache provided */
>>> +	/* No cache provided? */
>>>    	if (unlikely(cache == NULL))
>>>    		goto driver_enqueue;
>>>
>>> -	/* increment stat now, adding in mempool always success */
>>> +	/* Increment stats now, adding in mempool always succeeds. */
>>>    	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
>>>    	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>>>
>>> -	/* The request itself is too big for the cache */
>>> -	if (unlikely(n > cache->flushthresh))
>>> -		goto driver_enqueue_stats_incremented;
>>
>> I've kept the check here since it protects against overflow in len plus
>> n below if n is really huge.
> 
> We can fix that, see below.
> 
>>
>>> -
>>> -	/*
>>> -	 * The cache follows the following algorithm:
>>> -	 *   1. If the objects cannot be added to the cache without
>> crossing
>>> -	 *      the flush threshold, flush the cache to the backend.
>>> -	 *   2. Add the objects to the cache.
>>> -	 */
>>> -
>>> -	if (cache->len + n <= cache->flushthresh) {
>>> +	if (likely(cache->len + n <= cache->flushthresh)) {
> 
> It is an invariant that cache->len <= cache->flushthresh, so the above comparison can be rewritten to protect against overflow:
> 
> if (likely(n <= cache->flushthresh - cache->len)) {
> 

True, but it would be useful to highlight the usage of the
invariant here with either a comment or an assert.

IMHO it is wrong to use likely here since, as far as I know,
it makes the else branch very expensive, but crossing the flush
threshold is an expected branch and it must not be that
expensive.

>>> +		/*
>>> +		 * The objects can be added to the cache without crossing
>> the
>>> +		 * flush threshold.
>>> +		 */
>>>    		cache_objs = &cache->objs[cache->len];
>>>    		cache->len += n;
>>> -	} else {
>>> +	} else if (likely(n <= cache->flushthresh)) {
>>> +		/*
>>> +		 * The request itself fits into the cache.
>>> +		 * But first, the cache must be flushed to the backend, so
>>> +		 * adding the objects does not cross the flush threshold.
>>> +		 */
>>>    		cache_objs = &cache->objs[0];
>>>    		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
>>>    		cache->len = n;
>>> +	} else {
>>> +		/* The request itself is too big for the cache. */
>>> +		goto driver_enqueue_stats_incremented;
>>>    	}
>>>
>>>    	/* Add the objects to the cache. */
>>> @@ -1399,13 +1400,13 @@ rte_mempool_do_generic_put(struct rte_mempool
>> *mp, void * const *obj_table,
>>>
>>>    driver_enqueue:
>>>
>>> -	/* increment stat now, adding in mempool always success */
>>> +	/* Increment stats now, adding in mempool always succeeds. */
>>>    	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
>>>    	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
>>>
>>>    driver_enqueue_stats_incremented:
>>>
>>> -	/* push objects to the backend */
>>> +	/* Push the objects to the backend. */
>>>    	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>>>    }
>>>
>>
>
  

Patch

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 9f530db24b..aba90dbb5b 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -1364,32 +1364,33 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 {
 	void **cache_objs;
 
-	/* No cache provided */
+	/* No cache provided? */
 	if (unlikely(cache == NULL))
 		goto driver_enqueue;
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
 	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
 
-	/* The request itself is too big for the cache */
-	if (unlikely(n > cache->flushthresh))
-		goto driver_enqueue_stats_incremented;
-
-	/*
-	 * The cache follows the following algorithm:
-	 *   1. If the objects cannot be added to the cache without crossing
-	 *      the flush threshold, flush the cache to the backend.
-	 *   2. Add the objects to the cache.
-	 */
-
-	if (cache->len + n <= cache->flushthresh) {
+	if (likely(cache->len + n <= cache->flushthresh)) {
+		/*
+		 * The objects can be added to the cache without crossing the
+		 * flush threshold.
+		 */
 		cache_objs = &cache->objs[cache->len];
 		cache->len += n;
-	} else {
+	} else if (likely(n <= cache->flushthresh)) {
+		/*
+		 * The request itself fits into the cache.
+		 * But first, the cache must be flushed to the backend, so
+		 * adding the objects does not cross the flush threshold.
+		 */
 		cache_objs = &cache->objs[0];
 		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
 		cache->len = n;
+	} else {
+		/* The request itself is too big for the cache. */
+		goto driver_enqueue_stats_incremented;
 	}
 
 	/* Add the objects to the cache. */
@@ -1399,13 +1400,13 @@  rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
 
 driver_enqueue:
 
-	/* increment stat now, adding in mempool always success */
+	/* Increment stats now, adding in mempool always succeeds. */
 	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
 	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
 
 driver_enqueue_stats_incremented:
 
-	/* push objects to the backend */
+	/* Push the objects to the backend. */
 	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
 }