[3/3] ring: use element APIs to implement legacy APIs

Message ID 20200703102651.8918-4-feifei.wang2@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series: ring clean up

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/travis-robot       warning  Travis build: failed
ci/Intel-compilation  success  Compilation OK

Commit Message

Feifei Wang July 3, 2020, 10:26 a.m. UTC
Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
This reduces code duplication and improves code maintenance.
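
All the legacy wrappers follow the same pattern; a representative hunk
from the diff below, shown here for readability (the inline comment is
added for this excerpt, it is not part of the patch):

static __rte_always_inline unsigned int
rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
			 unsigned int n, unsigned int *free_space)
{
	/* Forward the legacy pointer-based API to the element-based
	 * API, with each element being one void * pointer.
	 */
	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
			n, free_space);
}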

aarch64:
HW: N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz
OS: Ubuntu 18.04.1 LTS, Kernel: 5.4.0+
DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
gcc: 9.2.1

$ sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
RTE>>ring_perf_autotest

test results on aarch64 in the case of esize 4:

                                     without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69

x86:
HW: Dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18 cores/socket,
1 thread/core, 3.3GHz
OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
gcc: 9.3.0

$ sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
RTE>>ring_perf_autotest

test results on x86 in the case of esize 4:

                                     without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84

Summary:
On the aarch64 server, this patch makes almost no performance
difference.
On the x86 server, the performance slightly improves in some cases and
slightly drops in others.

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 lib/librte_ring/rte_ring.h | 284 ++++---------------------------------
 1 file changed, 30 insertions(+), 254 deletions(-)
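
For context, a minimal usage sketch of the legacy API surface that this
patch reimplements (assumed setup; the ring name, size, and flags are
illustrative, not taken from the patch):

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ring.h>

int
main(int argc, char **argv)
{
	void *objs[8] = { NULL }, *deq[8];
	unsigned int n;

	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* Callers keep using the legacy pointer-based APIs; after this
	 * patch they forward internally to the _elem variants with
	 * esize == sizeof(void *).
	 */
	struct rte_ring *r = rte_ring_create("demo_ring", 1024,
			rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (r == NULL)
		return -1;

	n = rte_ring_enqueue_bulk(r, objs, 8, NULL);
	n = rte_ring_dequeue_bulk(r, deq, 8, NULL);

	rte_ring_free(r);
	return n == 8 ? 0 : -1;
}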
  

Comments

Feifei Wang July 7, 2020, 5:19 a.m. UTC | #1
Hi Konstantin, David,

I'm Feifei Wang from Arm. Sorry to make the following request:
would you please run some ring performance tests with this patch on your platforms when you have time?
I would like to know whether this patch has a significant impact on platforms other than Arm.

Thanks very much.
Feifei

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: July 3, 2020 18:27
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> <Feifei.Wang2@arm.com>
> Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> This reduces code duplication and improves code maintenance.
> 
> <snip>
> 
> ---
>  lib/librte_ring/rte_ring.h | 284 ++++---------------------------------
>  1 file changed, 30 insertions(+), 254 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index 35f3f8c42..2a2190bfc 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
>   */
>  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> 
> -/* the actual enqueue of pointers on the ring.
> - * Placed here since identical code needed in both
> - * single and multi producer enqueue functions */
> -#define ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> -	unsigned int i; \
> -	const uint32_t size = (r)->size; \
> -	uint32_t idx = prod_head & (r)->mask; \
> -	obj_type *ring = (obj_type *)ring_start; \
> -	if (likely(idx + n < size)) { \
> -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> -			ring[idx] = obj_table[i]; \
> -			ring[idx + 1] = obj_table[i + 1]; \
> -			ring[idx + 2] = obj_table[i + 2]; \
> -			ring[idx + 3] = obj_table[i + 3]; \
> -		} \
> -		switch (n & 0x3) { \
> -		case 3: \
> -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> -		case 2: \
> -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> -		case 1: \
> -			ring[idx++] = obj_table[i++]; \
> -		} \
> -	} else { \
> -		for (i = 0; idx < size; i++, idx++)\
> -			ring[idx] = obj_table[i]; \
> -		for (idx = 0; i < n; i++, idx++) \
> -			ring[idx] = obj_table[i]; \
> -	} \
> -} while (0)
> -
> -/* the actual copy of pointers on the ring to obj_table.
> - * Placed here since identical code needed in both
> - * single and multi consumer dequeue functions */
> -#define DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> -	unsigned int i; \
> -	uint32_t idx = cons_head & (r)->mask; \
> -	const uint32_t size = (r)->size; \
> -	obj_type *ring = (obj_type *)ring_start; \
> -	if (likely(idx + n < size)) { \
> -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> -			obj_table[i] = ring[idx]; \
> -			obj_table[i + 1] = ring[idx + 1]; \
> -			obj_table[i + 2] = ring[idx + 2]; \
> -			obj_table[i + 3] = ring[idx + 3]; \
> -		} \
> -		switch (n & 0x3) { \
> -		case 3: \
> -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> -		case 2: \
> -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> -		case 1: \
> -			obj_table[i++] = ring[idx++]; \
> -		} \
> -	} else { \
> -		for (i = 0; idx < size; i++, idx++) \
> -			obj_table[i] = ring[idx]; \
> -		for (idx = 0; i < n; i++, idx++) \
> -			obj_table[i] = ring[idx]; \
> -	} \
> -} while (0)
> -
> -/* Between load and load. there might be cpu reorder in weak model
> - * (powerpc/arm).
> - * There are 2 choices for the users
> - * 1.use rmb() memory barrier
> - * 2.use one-direction load_acquire/store_release barrier,defined by
> - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> - * It depends on performance test results.
> - * By default, move common functions to rte_ring_generic.h
> - */
> -#ifdef RTE_USE_C11_MEM_MODEL
> -#include "rte_ring_c11_mem.h"
> -#else
> -#include "rte_ring_generic.h"
> -#endif
> -
> -/**
> - * @internal Enqueue several objects on the ring
> - *
> -  * @param r
> - *   A pointer to the ring structure.
> - * @param obj_table
> - *   A pointer to a table of void * pointers (objects).
> - * @param n
> - *   The number of objects to add in the ring from the obj_table.
> - * @param behavior
> - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> - * @param is_sp
> - *   Indicates whether to use single producer or multi-producer head update
> - * @param free_space
> - *   returns the amount of space after the enqueue operation has finished
> - * @return
> - *   Actual number of objects enqueued.
> - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> - */
> -static __rte_always_inline unsigned int
> -__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> -		 unsigned int is_sp, unsigned int *free_space)
> -{
> -	uint32_t prod_head, prod_next;
> -	uint32_t free_entries;
> -
> -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> -			&prod_head, &prod_next, &free_entries);
> -	if (n == 0)
> -		goto end;
> -
> -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> -
> -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> -end:
> -	if (free_space != NULL)
> -		*free_space = free_entries - n;
> -	return n;
> -}
> -
> -/**
> - * @internal Dequeue several objects from the ring
> - *
> - * @param r
> - *   A pointer to the ring structure.
> - * @param obj_table
> - *   A pointer to a table of void * pointers (objects).
> - * @param n
> - *   The number of objects to pull from the ring.
> - * @param behavior
> - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> - * @param is_sc
> - *   Indicates whether to use single consumer or multi-consumer head update
> - * @param available
> - *   returns the number of remaining ring entries after the dequeue has finished
> - * @return
> - *   - Actual number of objects dequeued.
> - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> - */
> -static __rte_always_inline unsigned int
> -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> -		 unsigned int is_sc, unsigned int *available)
> -{
> -	uint32_t cons_head, cons_next;
> -	uint32_t entries;
> -
> -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> -			&cons_head, &cons_next, &entries);
> -	if (n == 0)
> -		goto end;
> -
> -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> -
> -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> -
> -end:
> -	if (available != NULL)
> -		*available = entries - n;
> -	return n;
> -}
> -
>  /**
>   * Enqueue several objects on the ring (multi-producers safe).
>   *
> @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
>  rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_MT, free_space);
> +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
>  rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_ST, free_space);
> +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
>  rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)
>  {
> -	switch (r->prod.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> -			free_space);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> -			free_space);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  static __rte_always_inline int
>  rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
>  {
> -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
>  static __rte_always_inline int
>  rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
>  {
> -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
>  static __rte_always_inline int
>  rte_ring_enqueue(struct rte_ring *r, void *obj)
>  {
> -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
>  }
> 
>  /**
> @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
>  rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_MT, available);
> +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
>  rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			RTE_RING_SYNC_ST, available);
> +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
>  rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
> -	switch (r->cons.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  static __rte_always_inline int
>  rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
>  {
> -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
>  static __rte_always_inline int
>  rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
>  {
> -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
>  static __rte_always_inline int
>  rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>  {
> -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
>  }
> 
>  /**
> @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
>  rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
> +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
>  rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
> +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
>  rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)
>  {
> -	switch (r->prod.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> -			free_space);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> -			free_space);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> +			n, free_space);
>  }
> 
>  /**
> @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
>  rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
> +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
>  rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
> +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  /**
> @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
>  rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	switch (r->cons.sync_type) {
> -	case RTE_RING_SYNC_MT:
> -		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> -	case RTE_RING_SYNC_ST:
> -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> -#ifdef ALLOW_EXPERIMENTAL_API
> -	case RTE_RING_SYNC_MT_RTS:
> -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> -			available);
> -	case RTE_RING_SYNC_MT_HTS:
> -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> -			available);
> -#endif
> -	}
> -
> -	/* valid ring should never reach this point */
> -	RTE_ASSERT(0);
> -	return 0;
> +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> +			n, available);
>  }
> 
>  #ifdef __cplusplus
> --
> 2.17.1
  
Ananyev, Konstantin July 7, 2020, 2:04 p.m. UTC | #2
Hi Feifei,

 > Hi, Konstantin, David
> 
> I'm Feifei Wang from Arm. Sorry to make the following request:
> Would you please do some ring performance tests of this patch in your platforms at the time you are free?
> And I want to know whether this patch has a significant impact on other platforms except ARM.

I ran a few tests on an SKX box and so far didn't notice any real perf difference.
Konstantin
 
> <snip>
  
Ananyev, Konstantin July 7, 2020, 2:45 p.m. UTC | #3
> Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> This reduces code duplication and improves code maintenance.
> 
> <snip>

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

  
Honnappa Nagarahalli July 7, 2020, 4:21 p.m. UTC | #4
<snip>

> 
> Hi Feifei,
> 
>  > Hi, Konstantin, David
> >
> > I'm Feifei Wang from Arm. Sorry to make the following request:
> > Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> > And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
Thanks Konstantin for testing this.

> Konstantin
> 
> > > <snip>
> > > Summary:
> > > In aarch64 server with this patch, there is almost no performance
> difference.
> > > In x86 server with this patch, in some cases, the performance
> > > slightly improve, in other cases, the performance slightly drop.
I suggest removing the perf data from the commit message, as Konstantin confirmed there is no perf drop on x86 either.

> > > <snip>
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > available);
> > > +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > available);
> > > +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  #ifdef __cplusplus
> > > --
> > > 2.17.1
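
[Annotation: the net effect of the diff above is that every legacy pointer-based ring call becomes a thin wrapper around the corresponding rte_ring_*_elem call, with the element size fixed at sizeof(void *). Below is a minimal, self-contained sketch of that wrapper pattern, for illustration only -- the toy_* names and the single-threaded byte-array ring are invented here and are not the DPDK implementation:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TOY_RING_BYTES 1024

struct toy_ring {
	uint32_t head;                /* next free byte offset */
	uint8_t data[TOY_RING_BYTES]; /* storage for esize-byte elements */
};

/* Generic element API: copies n elements of esize bytes each. */
static unsigned int
toy_enqueue_elem(struct toy_ring *r, const void *obj_table,
		 unsigned int esize, unsigned int n)
{
	if (r->head + (uint64_t)esize * n > TOY_RING_BYTES)
		return 0; /* fixed behavior: enqueue all n or nothing */
	memcpy(&r->data[r->head], obj_table, (size_t)esize * n);
	r->head += esize * n;
	return n;
}

/* The legacy pointer API collapses into a one-line wrapper, which is
 * the same shape the patch gives rte_ring_mp_enqueue_bulk() on top of
 * rte_ring_mp_enqueue_bulk_elem(). */
static unsigned int
toy_enqueue(struct toy_ring *r, void * const *obj_table, unsigned int n)
{
	return toy_enqueue_elem(r, obj_table, sizeof(void *), n);
}

int main(void)
{
	struct toy_ring r = { 0 };
	int a = 1, b = 2;
	void *objs[2] = { &a, &b };

	printf("enqueued %u pointers\n", toy_enqueue(&r, objs, 2));
	return 0;
}

Because the legacy wrappers stay __rte_always_inline and the esize argument is a compile-time constant, the compiler can specialize the element copy back into a plain pointer copy, which is consistent with the flat performance numbers reported later in this thread.]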
  
David Christensen July 7, 2020, 8:07 p.m. UTC | #5
On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> 
> Hi Feifei,
> 
>   > Hi, Konstantin, David
>>
>> I'm Feifei Wang from Arm. Sorry to make the following request:
>> Would you please do some ring performance tests of this patch in your platforms at the time you are free?
>> And I want to know whether this patch has a significant impact on other platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
> Konstantin
>   

Full performance results for an IBM POWER9 system are below.  I ran the tests
twice for each version and the results were consistent.

                                      without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):          43.63              43.63
legacy APIs: SP/SC: burst (size: 32):         50.07              50.04
legacy APIs: MP/MC: burst (size: 8):          58.43              58.42
legacy APIs: MP/MC: burst (size: 32):         65.52              65.51
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):           43.61              43.61
legacy APIs: SP/SC: bulk (size: 32):          50.05              50.02
legacy APIs: MP/MC: bulk (size: 8):           58.43              58.43
legacy APIs: MP/MC: bulk (size: 32):          65.50              65.49

HW:
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        6
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2300.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127

OS: RHEL 8.2

GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)

DPDK: 20.08.0-rc0 (a8550b773)



Unpatched
===========
sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.01
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.07
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.52

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.05
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.50

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.44
legacy APIs: MP/MC: bulk (size: 8): 16.19
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.64

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 362382
Core [69] count = 362516
Total count (size: 8): 724898

Bulk enq/dequeue count on size 32
Core [68] count = 361565
Core [69] count = 361852
Total count (size: 32): 723417

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.81
elem APIs: element size 16B: MP/MC: single: 56.78

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.68
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.00

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.23
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.64
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.11

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.55
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 374327
Core [69] count = 374433
Total count (size: 8): 748760

Bulk enq/dequeue count on size 32
Core [68] count = 324111
Core [69] count = 320038
Total count (size: 32): 644149
Test OK

Patched
=======
$ sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL:   using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest

### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.00
legacy APIs: MP/MC: single: 56.27

### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.04
legacy APIs: MP/MC: burst (size: 8): 58.42
legacy APIs: MP/MC: burst (size: 32): 65.51

### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.02
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.49

### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.43
legacy APIs: MP/MC: bulk (size: 8): 16.17
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.65

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 363208
Core [69] count = 363334
Total count (size: 8): 726542

Bulk enq/dequeue count on size 32
Core [68] count = 361592
Core [69] count = 361690
Total count (size: 32): 723282

### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.78
elem APIs: element size 16B: MP/MC: single: 56.75

### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.66
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.03

### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.33
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.65
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.04

### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16

### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.56
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86

### Testing using all slave nodes ###

Bulk enq/dequeue count on size 8
Core [68] count = 372618
Core [69] count = 372415
Total count (size: 8): 745033

Bulk enq/dequeue count on size 32
Core [68] count = 318784
Core [69] count = 316066
Total count (size: 32): 634850
Test OK
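
[Annotation: the figures above are average cycles per object as printed by ring_perf_autotest. For orientation only, the sketch below shows one way such a number can be obtained with the public APIs discussed in this thread; it is not the autotest source, and ITERATIONS, BULK, and the ring name are invented here:

#include <stdio.h>
#include <stdint.h>
#include <rte_eal.h>
#include <rte_ring.h>
#include <rte_cycles.h>
#include <rte_lcore.h>

#define ITERATIONS 1000000
#define BULK 32

int main(int argc, char **argv)
{
	void *objs[BULK] = { NULL };
	unsigned int i;

	if (rte_eal_init(argc, argv) < 0)
		return 1;

	/* Name, size, and SP/SC flags are arbitrary for this sketch. */
	struct rte_ring *r = rte_ring_create("perf_sketch", 1024,
			rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (r == NULL)
		return 1;

	uint64_t start = rte_rdtsc();
	for (i = 0; i < ITERATIONS; i++) {
		/* one enqueue/dequeue round trip of BULK pointers */
		rte_ring_sp_enqueue_bulk(r, objs, BULK, NULL);
		rte_ring_sc_dequeue_bulk(r, objs, BULK, NULL);
	}
	uint64_t cycles = rte_rdtsc() - start;

	printf("cycles/object: %.2f\n",
			(double)cycles / ((double)ITERATIONS * BULK));

	rte_ring_free(r);
	return 0;
}

Comparing this per-object figure across two builds is the same comparison the tables in this thread make between the unpatched and patched trees.]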
  
Feifei Wang July 8, 2020, 1:21 a.m. UTC | #6
Hi, Konstantin

Thanks very much for your effort.

Feifei

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: 2020年7月7日 22:04
> To: Feifei Wang <Feifei.Wang2@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; drc@linux.vnet.ibm.com
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> 
> Hi Feifei,
> 
>  > Hi, Konstantin, David
> >
> > I'm Feifei Wang from Arm. Sorry to make the following request:
> > Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> > And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> 
> I run few tests on SKX box and so far didn’t notice any real perf difference.
> Konstantin
> 
> > Thanks very much.
> > Feifei
> >
> > > -----Original Message-----
> > > From: Feifei Wang <feifei.wang2@arm.com>
> > > Sent: 2020年7月3日 18:27
> > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> > > Ananyev <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > > <Feifei.Wang2@arm.com>
> > > Subject: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> > >
> > > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > > This reduces code duplication and improves code maintenance.
> > >
> > > aarch64:
> > > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz OS:Ubuntu 18.04.1
> > > LTS,
> > > Kernel: 5.4.0+
> > > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > > gcc:9.2.1
> > >
> > > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > > RTE>>ring_perf_autotest
> > >
> > > test results on aarch64 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> > >
> > > x86:
> > > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18
> > > cores/socket,
> > > 1 thread/core, 3.3GHz
> > > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > > gcc: 9.3.0
> > >
> > > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > > RTE>>ring_perf_autotest
> > >
> > > test results on x86 in the case of esize 4:
> > >
> > >                                      without this patch   with this patch
> > > Testing burst enq/deq
> > > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > > Testing bulk enq/deq
> > > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> > >
> > > Summary:
> > > In aarch64 server with this patch, there is almost no performance
> difference.
> > > In x86 server with this patch, in some cases, the performance
> > > slightly improve, in other cases, the performance slightly drop.
> > >
> > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > >  lib/librte_ring/rte_ring.h | 284
> > > ++++---------------------------------
> > >  1 file changed, 30 insertions(+), 254 deletions(-)
> > >
> > > diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> > > index 35f3f8c42..2a2190bfc 100644
> > > --- a/lib/librte_ring/rte_ring.h
> > > +++ b/lib/librte_ring/rte_ring.h
> > > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> > >   */
> > >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> > >
> > > -/* the actual enqueue of pointers on the ring.
> > > - * Placed here since identical code needed in both
> > > - * single and multi producer enqueue functions */
> > > -#define ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	const uint32_t size = (r)->size; \
> > > -	uint32_t idx = prod_head & (r)->mask; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> > > -			ring[idx] = obj_table[i]; \
> > > -			ring[idx + 1] = obj_table[i + 1]; \
> > > -			ring[idx + 2] = obj_table[i + 2]; \
> > > -			ring[idx + 3] = obj_table[i + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			ring[idx++] = obj_table[i++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++)\
> > > -			ring[idx] = obj_table[i]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			ring[idx] = obj_table[i]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* the actual copy of pointers on the ring to obj_table.
> > > - * Placed here since identical code needed in both
> > > - * single and multi consumer dequeue functions */
> > > -#define DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> > > -	unsigned int i; \
> > > -	uint32_t idx = cons_head & (r)->mask; \
> > > -	const uint32_t size = (r)->size; \
> > > -	obj_type *ring = (obj_type *)ring_start; \
> > > -	if (likely(idx + n < size)) { \
> > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> > > -			obj_table[i] = ring[idx]; \
> > > -			obj_table[i + 1] = ring[idx + 1]; \
> > > -			obj_table[i + 2] = ring[idx + 2]; \
> > > -			obj_table[i + 3] = ring[idx + 3]; \
> > > -		} \
> > > -		switch (n & 0x3) { \
> > > -		case 3: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 2: \
> > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > -		case 1: \
> > > -			obj_table[i++] = ring[idx++]; \
> > > -		} \
> > > -	} else { \
> > > -		for (i = 0; idx < size; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -		for (idx = 0; i < n; i++, idx++) \
> > > -			obj_table[i] = ring[idx]; \
> > > -	} \
> > > -} while (0)
> > > -
> > > -/* Between load and load. there might be cpu reorder in weak model
> > > - * (powerpc/arm).
> > > - * There are 2 choices for the users
> > > - * 1.use rmb() memory barrier
> > > - * 2.use one-direction load_acquire/store_release barrier,defined
> > > by
> > > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > - * It depends on performance test results.
> > > - * By default, move common functions to rte_ring_generic.h
> > > - */
> > > -#ifdef RTE_USE_C11_MEM_MODEL
> > > -#include "rte_ring_c11_mem.h"
> > > -#else
> > > -#include "rte_ring_generic.h"
> > > -#endif
> > > -
> > > -/**
> > > - * @internal Enqueue several objects on the ring
> > > - *
> > > -  * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to add in the ring from the obj_table.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > - * @param is_sp
> > > - *   Indicates whether to use single producer or multi-producer head update
> > > - * @param free_space
> > > - *   returns the amount of space after the enqueue operation has finished
> > > - * @return
> > > - *   Actual number of objects enqueued.
> > > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sp, unsigned int *free_space)
> > > -{
> > > -	uint32_t prod_head, prod_next;
> > > -	uint32_t free_entries;
> > > -
> > > -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > -			&prod_head, &prod_next, &free_entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > -end:
> > > -	if (free_space != NULL)
> > > -		*free_space = free_entries - n;
> > > -	return n;
> > > -}
> > > -
> > > -/**
> > > - * @internal Dequeue several objects from the ring
> > > - *
> > > - * @param r
> > > - *   A pointer to the ring structure.
> > > - * @param obj_table
> > > - *   A pointer to a table of void * pointers (objects).
> > > - * @param n
> > > - *   The number of objects to pull from the ring.
> > > - * @param behavior
> > > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a
> > > ring
> > > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > > - * @param is_sc
> > > - *   Indicates whether to use single consumer or multi-consumer head
> > > update
> > > - * @param available
> > > - *   returns the number of remaining ring entries after the dequeue has
> > > finished
> > > - * @return
> > > - *   - Actual number of objects dequeued.
> > > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > - */
> > > -static __rte_always_inline unsigned int
> > > -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > -		 unsigned int is_sc, unsigned int *available)
> > > -{
> > > -	uint32_t cons_head, cons_next;
> > > -	uint32_t entries;
> > > -
> > > -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > -			&cons_head, &cons_next, &entries);
> > > -	if (n == 0)
> > > -		goto end;
> > > -
> > > -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > > -
> > > -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > -
> > > -end:
> > > -	if (available != NULL)
> > > -		*available = entries - n;
> > > -	return n;
> > > -}
> > > -
> > >  /**
> > >   * Enqueue several objects on the ring (multi-producers safe).
> > >   *
> > > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, free_space);
> > > +	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, free_space);
> > > +	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_bulk(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void *
> > > const *obj_table,  static __rte_always_inline int
> > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  {
> > > -	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_sp_enqueue(struct
> > > rte_ring *r, void
> > > *obj)  {
> > > -	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void
> > > *obj) static __rte_always_inline int  rte_ring_enqueue(struct
> > > rte_ring *r, void *obj) {
> > > -	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > +	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_MT, available);
> > > +	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > RTE_RING_QUEUE_FIXED,
> > > -			RTE_RING_SYNC_ST, available);
> > > +	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
> > >  		unsigned int *available)
> > >  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n,
> > > available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > **obj_table, unsigned int n,  static __rte_always_inline int
> > > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > > +	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > > **obj_p)  static __rte_always_inline int  rte_ring_sc_dequeue(struct
> > > rte_ring *r, void **obj_p)  {
> > > -	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void
> > > **obj_p) static __rte_always_inline int  rte_ring_dequeue(struct
> > > rte_ring *r, void
> > > **obj_p)  {
> > > -	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > +	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> > >  }
> > >
> > >  /**
> > > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > free_space);
> > > +	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  			 unsigned int n, unsigned int *free_space)  {
> > > -	return __rte_ring_do_enqueue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > free_space);
> > > +	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > >  		      unsigned int n, unsigned int *free_space)  {
> > > -	switch (r->prod.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sp_enqueue_burst(r, obj_table, n,
> > > free_space);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > -			free_space);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, free_space);
> > >  }
> > >
> > >  /**
> > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT,
> > > available);
> > > +	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	return __rte_ring_do_dequeue(r, obj_table, n,
> > > -			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST,
> > > available);
> > > +	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  /**
> > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > >  		unsigned int n, unsigned int *available)  {
> > > -	switch (r->cons.sync_type) {
> > > -	case RTE_RING_SYNC_MT:
> > > -		return rte_ring_mc_dequeue_burst(r, obj_table, n,
> > > available);
> > > -	case RTE_RING_SYNC_ST:
> > > -		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > -#ifdef ALLOW_EXPERIMENTAL_API
> > > -	case RTE_RING_SYNC_MT_RTS:
> > > -		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -	case RTE_RING_SYNC_MT_HTS:
> > > -		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > -			available);
> > > -#endif
> > > -	}
> > > -
> > > -	/* valid ring should never reach this point */
> > > -	RTE_ASSERT(0);
> > > -	return 0;
> > > +	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > +			n, available);
> > >  }
> > >
> > >  #ifdef __cplusplus
> > > --
> > > 2.17.1
  
Feifei Wang July 8, 2020, 1:30 a.m. UTC | #7
Hi, David

> -----Original Message-----
> From: David Christensen <drc@linux.vnet.ibm.com>
> Sent: 2020年7月8日 4:07
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Subject: Re: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> 
> 
> On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> >
> > Hi Feifei,
> >
> >   > Hi, Konstantin, David
> >>
> >> I'm Feifei Wang from Arm. Sorry to make the following request:
> >> Would you please do some ring performance tests of this patch in your
> platforms at the time you are free?
> >> And I want to know whether this patch has a significant impact on other
> platforms except ARM.
> >
> > I run few tests on SKX box and so far didn’t notice any real perf difference.
> > Konstantin
> >
> 
Thanks very much for presenting these test results.  
Feifei
> Full performance results for IBM POWER9 system below.  I ran the tests
> twice for each version and the results were consistent.
> 
>                                       without this patch   with this patch
> Testing burst enq/deq
> legacy APIs: SP/SC: burst (size: 8):          43.63              43.63
> legacy APIs: SP/SC: burst (size: 32):         50.07              50.04
> legacy APIs: MP/MC: burst (size: 8):          58.43              58.42
> legacy APIs: MP/MC: burst (size: 32):         65.52              65.51
> Testing bulk enq/deq
> legacy APIs: SP/SC: bulk (size: 8):           43.61              43.61
> legacy APIs: SP/SC: bulk (size: 32):          50.05              50.02
> legacy APIs: MP/MC: bulk (size: 8):           58.43              58.43
> legacy APIs: MP/MC: bulk (size: 32):          65.50              65.49
> 
> HW:
> Architecture:        ppc64le
> Byte Order:          Little Endian
> CPU(s):              128
> On-line CPU(s) list: 0-127
> Thread(s) per core:  4
> Core(s) per socket:  16
> Socket(s):           2
> NUMA node(s):        6
> Model:               2.3 (pvr 004e 1203)
> Model name:          POWER9, altivec supported
> CPU max MHz:         3800.0000
> CPU min MHz:         2300.0000
> L1d cache:           32K
> L1i cache:           32K
> L2 cache:            512K
> L3 cache:            10240K
> NUMA node0 CPU(s):   0-63
> NUMA node8 CPU(s):   64-127
> 
> OS: RHEL 8.2
> 
> GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
> 
> DPDK: 20.08.0-rc0 (a8550b773)
> 
> 
> 
> Unpatched
> ===========
> sudo app/test/dpdk-test -l 68,69
> EAL: Detected 128 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-2048kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
> EAL:   using IOMMU type 7 (sPAPR)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
> APP: HPET is not enabled, using TSC as default timer
> RTE>>ring_perf_autotest
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 42.01
> legacy APIs: MP/MC: single: 56.27
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 43.63
> legacy APIs: SP/SC: burst (size: 32): 50.07
> legacy APIs: MP/MC: burst (size: 8): 58.43
> legacy APIs: MP/MC: burst (size: 32): 65.52
>
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 43.61
> legacy APIs: SP/SC: bulk (size: 32): 50.05
> legacy APIs: MP/MC: bulk (size: 8): 58.43
> legacy APIs: MP/MC: bulk (size: 32): 65.50
>
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.16
> legacy APIs: MP/MC: bulk (size: 8): 7.16
>
> ### Testing using two hyperthreads ###
> legacy APIs: SP/SC: bulk (size: 8): 12.44
> legacy APIs: MP/MC: bulk (size: 8): 16.19
> legacy APIs: SP/SC: bulk (size: 32): 3.10
> legacy APIs: MP/MC: bulk (size: 32): 3.64
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 362382
> Core [69] count = 362516
> Total count (size: 8): 724898
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 361565
> Core [69] count = 361852
> Total count (size: 32): 723417
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 42.81
> elem APIs: element size 16B: MP/MC: single: 56.78
>
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
> elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
> elem APIs: element size 16B: MP/MC: burst (size: 8): 60.68
> elem APIs: element size 16B: MP/MC: burst (size: 32): 75.00
>
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.23
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.64
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.11
>
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16
>
> ### Testing using two hyperthreads ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.55
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 374327
> Core [69] count = 374433
> Total count (size: 8): 748760
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 324111
> Core [69] count = 320038
> Total count (size: 32): 644149
> Test OK
> 
> Patched
> =======
> $ sudo app/test/dpdk-test -l 68,69
> EAL: Detected 128 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> EAL: No available hugepages reported in hugepages-2048kB
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
> EAL:   using IOMMU type 7 (sPAPR)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
> EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
> APP: HPET is not enabled, using TSC as default timer
> RTE>>ring_perf_autotest
> 
> ### Testing single element enq/deq ###
> legacy APIs: SP/SC: single: 42.00
> legacy APIs: MP/MC: single: 56.27
> 
> ### Testing burst enq/deq ###
> legacy APIs: SP/SC: burst (size: 8): 43.63
> legacy APIs: SP/SC: burst (size: 32): 50.04
> legacy APIs: MP/MC: burst (size: 8): 58.42
> legacy APIs: MP/MC: burst (size: 32): 65.51
>
> ### Testing bulk enq/deq ###
> legacy APIs: SP/SC: bulk (size: 8): 43.61
> legacy APIs: SP/SC: bulk (size: 32): 50.02
> legacy APIs: MP/MC: bulk (size: 8): 58.43
> legacy APIs: MP/MC: bulk (size: 32): 65.49
>
> ### Testing empty bulk deq ###
> legacy APIs: SP/SC: bulk (size: 8): 7.16
> legacy APIs: MP/MC: bulk (size: 8): 7.16
>
> ### Testing using two hyperthreads ###
> legacy APIs: SP/SC: bulk (size: 8): 12.43
> legacy APIs: MP/MC: bulk (size: 8): 16.17
> legacy APIs: SP/SC: bulk (size: 32): 3.10
> legacy APIs: MP/MC: bulk (size: 32): 3.65
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 363208
> Core [69] count = 363334
> Total count (size: 8): 726542
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 361592
> Core [69] count = 361690
> Total count (size: 32): 723282
> 
> ### Testing single element enq/deq ###
> elem APIs: element size 16B: SP/SC: single: 42.78
> elem APIs: element size 16B: MP/MC: single: 56.75
>
> ### Testing burst enq/deq ###
> elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
> elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
> elem APIs: element size 16B: MP/MC: burst (size: 8): 60.66
> elem APIs: element size 16B: MP/MC: burst (size: 32): 75.03
>
> ### Testing bulk enq/deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.33
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.65
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.04
>
> ### Testing empty bulk deq ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16
>
> ### Testing using two hyperthreads ###
> elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14
> elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.56
> elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
> elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
> 
> ### Testing using all slave nodes ###
> 
> Bulk enq/dequeue count on size 8
> Core [68] count = 372618
> Core [69] count = 372415
> Total count (size: 8): 745033
> 
> Bulk enq/dequeue count on size 32
> Core [68] count = 318784
> Core [69] count = 316066
> Total count (size: 32): 634850
> Test OK
  
Feifei Wang July 8, 2020, 3:19 a.m. UTC | #8
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: 2020年7月8日 0:22
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Feifei Wang
> <Feifei.Wang2@arm.com>; drc@linux.vnet.ibm.com
> Cc: dev@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 3/3] ring: use element APIs to implement legacy APIs
> 
> <snip>
> 
> >
> > Hi Feifei,
> >
> >  > Hi, Konstantin, David
> > >
> > > I'm Feifei Wang from Arm. Sorry to make the following request:
> > > Would you please do some ring performance tests of this patch in your platforms at the time you are free?
> > > And I want to know whether this patch has a significant impact on other platforms except ARM.
> >
> > I run few tests on SKX box and so far didn’t notice any real perf difference.
> Thanks Konstantin for testing this.
> 
> > Konstantin
> >
> > > Thanks very much.
> > > Feifei
> > >
> > > > -----Original Message-----
> > > > From: Feifei Wang <feifei.wang2@arm.com>
> > > > Sent: 2020年7月3日 18:27
> > > > To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > > > Konstantin Ananyev <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; nd <nd@arm.com>; Feifei Wang
> > > > <Feifei.Wang2@arm.com>
> > > > Subject: [PATCH 3/3] ring: use element APIs to implement legacy
> > > > APIs
> > > >
> > > > Use rte_ring_xxx_elem_xxx APIs to replace legacy API implementation.
> > > > This reduces code duplication and improves code maintenance.
> > > >
> > > > aarch64:
> > > > HW:N1sdp, 1 socket, 4 cores, 1 thread/core, 2.6GHz
> > > > OS:Ubuntu 18.04.1 LTS, Kernel: 5.4.0+
> > > > DPDK: 20.05-rc3, Configuration: arm64-n1sdp-linux-gcc
> > > > gcc:9.2.1
> > > >
> > > > $sudo ./arm64-n1sdp-linux-gcc/app/test -l 1-2
> > > > RTE>>ring_perf_autotest
> > > >
> > > > test results on aarch64 in the case of esize 4:
> > > >
> > > >                                      without this patch   with this patch
> > > > Testing burst enq/deq
> > > > legacy APIs: SP/SC: burst (size: 8):          1.11              1.10
> > > > legacy APIs: SP/SC: burst (size: 32):         1.95              1.97
> > > > legacy APIs: MP/MC: burst (size: 8):          1.86              1.94
> > > > legacy APIs: MP/MC: burst (size: 32):         2.65              2.69
> > > > Testing bulk enq/deq
> > > > legacy APIs: SP/SC: bulk (size: 8):           1.08              1.09
> > > > legacy APIs: SP/SC: bulk (size: 32):          1.89              1.90
> > > > legacy APIs: MP/MC: bulk (size: 8):           1.85              1.98
> > > > legacy APIs: MP/MC: bulk (size: 32):          2.65              2.69
> > > >
> > > > x86:
> > > > HW: dell, CPU Intel(R) Xeon(R) Gold 6240, 2 sockets, 18
> > > > cores/socket,
> > > > 1 thread/core, 3.3GHz
> > > > OS: Ubuntu 20.04 LTS, Kernel: 5.4.0-37-generic
> > > > DPDK: 20.05-rc3, Configuration: x86_64-native-linuxapp-gcc
> > > > gcc: 9.3.0
> > > >
> > > > $sudo ./x86_64-native-linuxapp-gcc/app/test -l 14,16
> > > > RTE>>ring_perf_autotest
> > > >
> > > > test results on x86 in the case of esize 4:
> > > >
> > > >                                      without this patch   with this patch
> > > > Testing burst enq/deq
> > > > legacy APIs: SP/SC: burst (size: 8):          29.35             27.78
> > > > legacy APIs: SP/SC: burst (size: 32):         73.11             73.39
> > > > legacy APIs: MP/MC: burst (size: 8):          62.36             62.37
> > > > legacy APIs: MP/MC: burst (size: 32):         101.01            101.03
> > > > Testing bulk enq/deq
> > > > legacy APIs: SP/SC: bulk (size: 8):           25.94             29.55
> > > > legacy APIs: SP/SC: bulk (size: 32):          70.00             78.87
> > > > legacy APIs: MP/MC: bulk (size: 8):           63.41             62.48
> > > > legacy APIs: MP/MC: bulk (size: 32):          105.86            103.84
> > > >
> > > > Summary:
> > > > In aarch64 server with this patch, there is almost no performance
> > difference.
> > > > In x86 server with this patch, in some cases, the performance
> > > > slightly improve, in other cases, the performance slightly drop.
> I suggest removing the perf data from the commit message, as Konstantin
> confirmed there is no perf drop on x86 either.
Ok, that's all right.
> 
> > > >
> > > > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > > > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > > ---
> > > >  lib/librte_ring/rte_ring.h | 284
> > > > ++++---------------------------------
> > > >  1 file changed, 30 insertions(+), 254 deletions(-)
> > > >
> > > > diff --git a/lib/librte_ring/rte_ring.h
> > > > b/lib/librte_ring/rte_ring.h index 35f3f8c42..2a2190bfc 100644
> > > > --- a/lib/librte_ring/rte_ring.h
> > > > +++ b/lib/librte_ring/rte_ring.h
> > > > @@ -191,168 +191,6 @@ void rte_ring_free(struct rte_ring *r);
> > > >   */
> > > >  void rte_ring_dump(FILE *f, const struct rte_ring *r);
> > > >
> > > > -/* the actual enqueue of pointers on the ring.
> > > > - * Placed here since identical code needed in both
> > > > - * single and multi producer enqueue functions */
> > > > -#define ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
> > > > -	unsigned int i; \
> > > > -	const uint32_t size = (r)->size; \
> > > > -	uint32_t idx = prod_head & (r)->mask; \
> > > > -	obj_type *ring = (obj_type *)ring_start; \
> > > > -	if (likely(idx + n < size)) { \
> > > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
> > > > -			ring[idx] = obj_table[i]; \
> > > > -			ring[idx + 1] = obj_table[i + 1]; \
> > > > -			ring[idx + 2] = obj_table[i + 2]; \
> > > > -			ring[idx + 3] = obj_table[i + 3]; \
> > > > -		} \
> > > > -		switch (n & 0x3) { \
> > > > -		case 3: \
> > > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > > -		case 2: \
> > > > -			ring[idx++] = obj_table[i++]; /* fallthrough */ \
> > > > -		case 1: \
> > > > -			ring[idx++] = obj_table[i++]; \
> > > > -		} \
> > > > -	} else { \
> > > > -		for (i = 0; idx < size; i++, idx++)\
> > > > -			ring[idx] = obj_table[i]; \
> > > > -		for (idx = 0; i < n; i++, idx++) \
> > > > -			ring[idx] = obj_table[i]; \
> > > > -	} \
> > > > -} while (0)
> > > > -
> > > > -/* the actual copy of pointers on the ring to obj_table.
> > > > - * Placed here since identical code needed in both
> > > > - * single and multi consumer dequeue functions */
> > > > -#define DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
> > > > -	unsigned int i; \
> > > > -	uint32_t idx = cons_head & (r)->mask; \
> > > > -	const uint32_t size = (r)->size; \
> > > > -	obj_type *ring = (obj_type *)ring_start; \
> > > > -	if (likely(idx + n < size)) { \
> > > > -		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
> > > > -			obj_table[i] = ring[idx]; \
> > > > -			obj_table[i + 1] = ring[idx + 1]; \
> > > > -			obj_table[i + 2] = ring[idx + 2]; \
> > > > -			obj_table[i + 3] = ring[idx + 3]; \
> > > > -		} \
> > > > -		switch (n & 0x3) { \
> > > > -		case 3: \
> > > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > > -		case 2: \
> > > > -			obj_table[i++] = ring[idx++]; /* fallthrough */ \
> > > > -		case 1: \
> > > > -			obj_table[i++] = ring[idx++]; \
> > > > -		} \
> > > > -	} else { \
> > > > -		for (i = 0; idx < size; i++, idx++) \
> > > > -			obj_table[i] = ring[idx]; \
> > > > -		for (idx = 0; i < n; i++, idx++) \
> > > > -			obj_table[i] = ring[idx]; \
> > > > -	} \
> > > > -} while (0)
> > > > -
> > > > -/* Between load and load. there might be cpu reorder in weak
> > > > model
> > > > - * (powerpc/arm).
> > > > - * There are 2 choices for the users
> > > > - * 1.use rmb() memory barrier
> > > > - * 2.use one-direction load_acquire/store_release barrier,defined
> > > > by
> > > > - * CONFIG_RTE_USE_C11_MEM_MODEL=y
> > > > - * It depends on performance test results.
> > > > - * By default, move common functions to rte_ring_generic.h
> > > > - */
> > > > -#ifdef RTE_USE_C11_MEM_MODEL
> > > > -#include "rte_ring_c11_mem.h"
> > > > -#else
> > > > -#include "rte_ring_generic.h"
> > > > -#endif
> > > > -
> > > > -/**
> > > > - * @internal Enqueue several objects on the ring
> > > > - *
> > > > -  * @param r
> > > > - *   A pointer to the ring structure.
> > > > - * @param obj_table
> > > > - *   A pointer to a table of void * pointers (objects).
> > > > - * @param n
> > > > - *   The number of objects to add in the ring from the obj_table.
> > > > - * @param behavior
> > > > - *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
> > > > - *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
> > > > - * @param is_sp
> > > > - *   Indicates whether to use single producer or multi-producer head update
> > > > - * @param free_space
> > > > - *   returns the amount of space after the enqueue operation has finished
> > > > - * @return
> > > > - *   Actual number of objects enqueued.
> > > > - *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > - */
> > > > -static __rte_always_inline unsigned int
> > > > -__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
> > > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > > -		 unsigned int is_sp, unsigned int *free_space)
> > > > -{
> > > > -	uint32_t prod_head, prod_next;
> > > > -	uint32_t free_entries;
> > > > -
> > > > -	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
> > > > -			&prod_head, &prod_next, &free_entries);
> > > > -	if (n == 0)
> > > > -		goto end;
> > > > -
> > > > -	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
> > > > -
> > > > -	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
> > > > -end:
> > > > -	if (free_space != NULL)
> > > > -		*free_space = free_entries - n;
> > > > -	return n;
> > > > -}
> > > > -
> > > > -/**
> > > > - * @internal Dequeue several objects from the ring
> > > > - *
> > > > - * @param r
> > > > - *   A pointer to the ring structure.
> > > > - * @param obj_table
> > > > - *   A pointer to a table of void * pointers (objects).
> > > > - * @param n
> > > > - *   The number of objects to pull from the ring.
> > > > - * @param behavior
> > > > - *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
> > > > - *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
> > > > - * @param is_sc
> > > > - *   Indicates whether to use single consumer or multi-consumer head update
> > > > - * @param available
> > > > - *   returns the number of remaining ring entries after the dequeue has finished
> > > > - * @return
> > > > - *   - Actual number of objects dequeued.
> > > > - *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > - */
> > > > -static __rte_always_inline unsigned int
> > > > -__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
> > > > -		 unsigned int n, enum rte_ring_queue_behavior behavior,
> > > > -		 unsigned int is_sc, unsigned int *available)
> > > > -{
> > > > -	uint32_t cons_head, cons_next;
> > > > -	uint32_t entries;
> > > > -
> > > > -	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
> > > > -			&cons_head, &cons_next, &entries);
> > > > -	if (n == 0)
> > > > -		goto end;
> > > > -
> > > > -	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
> > > > -
> > > > -	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
> > > > -
> > > > -end:
> > > > -	if (available != NULL)
> > > > -		*available = entries - n;
> > > > -	return n;
> > > > -}
> > > > -
> > > >  /**
> > > >   * Enqueue several objects on the ring (multi-producers safe).
> > > >   *
> > > > @@ -375,8 +213,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_MT, free_space);
> > > > +return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -398,8 +236,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_ST, free_space);
> > > > +return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -425,24 +263,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
> > > >        unsigned int n, unsigned int *free_space)  { -switch
> > > > (r->prod.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
> > > > -free_space); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
> > > > -free_space); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -462,7 +284,7 @@ rte_ring_enqueue_bulk(struct rte_ring *r, void
> > > > * const *obj_table,  static __rte_always_inline int
> > > > rte_ring_mp_enqueue(struct rte_ring *r, void *obj)  { -return
> > > > rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -479,7 +301,7 @@ rte_ring_mp_enqueue(struct rte_ring *r, void
> > > > *obj) static __rte_always_inline int  rte_ring_sp_enqueue(struct
> > > > rte_ring *r, void
> > > > *obj)  {
> > > > -return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -500,7 +322,7 @@ rte_ring_sp_enqueue(struct rte_ring *r, void
> > > > *obj) static __rte_always_inline int  rte_ring_enqueue(struct
> > > > rte_ring *r, void *obj) { -return rte_ring_enqueue_bulk(r, &obj,
> > > > 1, NULL) ? 0 : -ENOBUFS;
> > > > +return rte_ring_enqueue_elem(r, obj, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -525,8 +347,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_MT, available);
> > > > +return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -549,8 +371,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> > > > -RTE_RING_SYNC_ST, available);
> > > > +return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -576,22 +398,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table,
> > > > unsigned int n,  unsigned int *available)  { -switch
> > > > (r->cons.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
> > > > -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -612,7 +420,7 @@ rte_ring_dequeue_bulk(struct rte_ring *r, void
> > > > **obj_table, unsigned int n,  static __rte_always_inline int
> > > > rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)  { -return
> > > > rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
> > > > +return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -630,7 +438,7 @@ rte_ring_mc_dequeue(struct rte_ring *r, void
> > > > **obj_p)  static __rte_always_inline int
> > > > rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)  { -return
> > > > rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > > +return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -652,7 +460,7 @@ rte_ring_sc_dequeue(struct rte_ring *r, void
> > > > **obj_p) static __rte_always_inline int  rte_ring_dequeue(struct
> > > > rte_ring *r, void
> > > > **obj_p)  {
> > > > -return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
> > > > +return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
> > > >  }
> > > >
> > > >  /**
> > > > @@ -860,8 +668,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const
> *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_MT, free_space);
> > > > +return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > > *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -883,8 +691,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > > >   unsigned int n, unsigned int *free_space)  { -return
> > > > __rte_ring_do_enqueue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_ST, free_space);
> > > > +return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void
> > > > +*), n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -910,24 +718,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
> > > >        unsigned int n, unsigned int *free_space)  { -switch
> > > > (r->prod.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
> > > > -free_space); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
> > > > -free_space); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
> > > > +n, free_space);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -954,8 +746,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_MT, available);
> > > > +return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > > *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -979,8 +771,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -return
> > > > __rte_ring_do_dequeue(r, obj_table, n, -
> RTE_RING_QUEUE_VARIABLE,
> > > > RTE_RING_SYNC_ST, available);
> > > > +return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void
> > > > +*), n, available);
> > > >  }
> > > >
> > > >  /**
> > > > @@ -1006,24 +798,8 @@ static __rte_always_inline unsigned int
> > > > rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
> > > > unsigned int n, unsigned int *available)  { -switch
> > > > (r->cons.sync_type) { -case RTE_RING_SYNC_MT:
> > > > -return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
> > > > -case RTE_RING_SYNC_ST:
> > > > -return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
> > > > -#ifdef ALLOW_EXPERIMENTAL_API -case RTE_RING_SYNC_MT_RTS:
> > > > -return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
> > > > -available); -case RTE_RING_SYNC_MT_HTS:
> > > > -return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
> > > > -available); -#endif -}
> > > > -
> > > > -/* valid ring should never reach this point */ -RTE_ASSERT(0);
> > > > -return 0;
> > > > +return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
> > > > +n, available);
> > > >  }
> > > >
> > > >  #ifdef __cplusplus
> > > > --
> > > > 2.17.1
>
  

Patch

diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index 35f3f8c42..2a2190bfc 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -191,168 +191,6 @@  void rte_ring_free(struct rte_ring *r);
  */
 void rte_ring_dump(FILE *f, const struct rte_ring *r);
 
-/* the actual enqueue of pointers on the ring.
- * Placed here since identical code needed in both
- * single and multi producer enqueue functions */
-#define ENQUEUE_PTRS(r, ring_start, prod_head, obj_table, n, obj_type) do { \
-	unsigned int i; \
-	const uint32_t size = (r)->size; \
-	uint32_t idx = prod_head & (r)->mask; \
-	obj_type *ring = (obj_type *)ring_start; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) { \
-			ring[idx] = obj_table[i]; \
-			ring[idx + 1] = obj_table[i + 1]; \
-			ring[idx + 2] = obj_table[i + 2]; \
-			ring[idx + 3] = obj_table[i + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			ring[idx++] = obj_table[i++]; /* fallthrough */ \
-		case 2: \
-			ring[idx++] = obj_table[i++]; /* fallthrough */ \
-		case 1: \
-			ring[idx++] = obj_table[i++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++)\
-			ring[idx] = obj_table[i]; \
-		for (idx = 0; i < n; i++, idx++) \
-			ring[idx] = obj_table[i]; \
-	} \
-} while (0)
-
-/* the actual copy of pointers on the ring to obj_table.
- * Placed here since identical code needed in both
- * single and multi consumer dequeue functions */
-#define DEQUEUE_PTRS(r, ring_start, cons_head, obj_table, n, obj_type) do { \
-	unsigned int i; \
-	uint32_t idx = cons_head & (r)->mask; \
-	const uint32_t size = (r)->size; \
-	obj_type *ring = (obj_type *)ring_start; \
-	if (likely(idx + n < size)) { \
-		for (i = 0; i < (n & ~0x3); i += 4, idx += 4) {\
-			obj_table[i] = ring[idx]; \
-			obj_table[i + 1] = ring[idx + 1]; \
-			obj_table[i + 2] = ring[idx + 2]; \
-			obj_table[i + 3] = ring[idx + 3]; \
-		} \
-		switch (n & 0x3) { \
-		case 3: \
-			obj_table[i++] = ring[idx++]; /* fallthrough */ \
-		case 2: \
-			obj_table[i++] = ring[idx++]; /* fallthrough */ \
-		case 1: \
-			obj_table[i++] = ring[idx++]; \
-		} \
-	} else { \
-		for (i = 0; idx < size; i++, idx++) \
-			obj_table[i] = ring[idx]; \
-		for (idx = 0; i < n; i++, idx++) \
-			obj_table[i] = ring[idx]; \
-	} \
-} while (0)
-
-/* Between load and load. there might be cpu reorder in weak model
- * (powerpc/arm).
- * There are 2 choices for the users
- * 1.use rmb() memory barrier
- * 2.use one-direction load_acquire/store_release barrier,defined by
- * CONFIG_RTE_USE_C11_MEM_MODEL=y
- * It depends on performance test results.
- * By default, move common functions to rte_ring_generic.h
- */
-#ifdef RTE_USE_C11_MEM_MODEL
-#include "rte_ring_c11_mem.h"
-#else
-#include "rte_ring_generic.h"
-#endif
-
-/**
- * @internal Enqueue several objects on the ring
- *
-  * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to add in the ring from the obj_table.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible from ring
- * @param is_sp
- *   Indicates whether to use single producer or multi-producer head update
- * @param free_space
- *   returns the amount of space after the enqueue operation has finished
- * @return
- *   Actual number of objects enqueued.
- *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
-static __rte_always_inline unsigned int
-__rte_ring_do_enqueue(struct rte_ring *r, void * const *obj_table,
-		 unsigned int n, enum rte_ring_queue_behavior behavior,
-		 unsigned int is_sp, unsigned int *free_space)
-{
-	uint32_t prod_head, prod_next;
-	uint32_t free_entries;
-
-	n = __rte_ring_move_prod_head(r, is_sp, n, behavior,
-			&prod_head, &prod_next, &free_entries);
-	if (n == 0)
-		goto end;
-
-	ENQUEUE_PTRS(r, &r[1], prod_head, obj_table, n, void *);
-
-	update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
-end:
-	if (free_space != NULL)
-		*free_space = free_entries - n;
-	return n;
-}
-
-/**
- * @internal Dequeue several objects from the ring
- *
- * @param r
- *   A pointer to the ring structure.
- * @param obj_table
- *   A pointer to a table of void * pointers (objects).
- * @param n
- *   The number of objects to pull from the ring.
- * @param behavior
- *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from a ring
- *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from ring
- * @param is_sc
- *   Indicates whether to use single consumer or multi-consumer head update
- * @param available
- *   returns the number of remaining ring entries after the dequeue has finished
- * @return
- *   - Actual number of objects dequeued.
- *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
- */
-static __rte_always_inline unsigned int
-__rte_ring_do_dequeue(struct rte_ring *r, void **obj_table,
-		 unsigned int n, enum rte_ring_queue_behavior behavior,
-		 unsigned int is_sc, unsigned int *available)
-{
-	uint32_t cons_head, cons_next;
-	uint32_t entries;
-
-	n = __rte_ring_move_cons_head(r, (int)is_sc, n, behavior,
-			&cons_head, &cons_next, &entries);
-	if (n == 0)
-		goto end;
-
-	DEQUEUE_PTRS(r, &r[1], cons_head, obj_table, n, void *);
-
-	update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
-
-end:
-	if (available != NULL)
-		*available = entries - n;
-	return n;
-}
-
 /**
  * Enqueue several objects on the ring (multi-producers safe).
  *
@@ -375,8 +213,8 @@  static __rte_always_inline unsigned int
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_MT, free_space);
+	return rte_ring_mp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -398,8 +236,8 @@  static __rte_always_inline unsigned int
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_ST, free_space);
+	return rte_ring_sp_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -425,24 +263,8 @@  static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	switch (r->prod.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mp_enqueue_bulk(r, obj_table, n, free_space);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sp_enqueue_bulk(r, obj_table, n, free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mp_rts_enqueue_bulk(r, obj_table, n,
-			free_space);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mp_hts_enqueue_bulk(r, obj_table, n,
-			free_space);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_enqueue_bulk_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -462,7 +284,7 @@  rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 static __rte_always_inline int
 rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_mp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_mp_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -479,7 +301,7 @@  rte_ring_mp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_sp_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_sp_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -500,7 +322,7 @@  rte_ring_sp_enqueue(struct rte_ring *r, void *obj)
 static __rte_always_inline int
 rte_ring_enqueue(struct rte_ring *r, void *obj)
 {
-	return rte_ring_enqueue_bulk(r, &obj, 1, NULL) ? 0 : -ENOBUFS;
+	return rte_ring_enqueue_elem(r, obj, sizeof(void *));
 }
 
 /**
@@ -525,8 +347,8 @@  static __rte_always_inline unsigned int
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_MT, available);
+	return rte_ring_mc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -549,8 +371,8 @@  static __rte_always_inline unsigned int
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			RTE_RING_SYNC_ST, available);
+	return rte_ring_sc_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -576,22 +398,8 @@  static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	switch (r->cons.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mc_dequeue_bulk(r, obj_table, n, available);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sc_dequeue_bulk(r, obj_table, n, available);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mc_rts_dequeue_bulk(r, obj_table, n, available);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mc_hts_dequeue_bulk(r, obj_table, n, available);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_dequeue_bulk_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -612,7 +420,7 @@  rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 static __rte_always_inline int
 rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_mc_dequeue_bulk(r, obj_p, 1, NULL)  ? 0 : -ENOENT;
+	return rte_ring_mc_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -630,7 +438,7 @@  rte_ring_mc_dequeue(struct rte_ring *r, void **obj_p)
 static __rte_always_inline int
 rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_sc_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
+	return rte_ring_sc_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -652,7 +460,7 @@  rte_ring_sc_dequeue(struct rte_ring *r, void **obj_p)
 static __rte_always_inline int
 rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 {
-	return rte_ring_dequeue_bulk(r, obj_p, 1, NULL) ? 0 : -ENOENT;
+	return rte_ring_dequeue_elem(r, obj_p, sizeof(void *));
 }
 
 /**
@@ -860,8 +668,8 @@  static __rte_always_inline unsigned int
 rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, free_space);
+	return rte_ring_mp_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -883,8 +691,8 @@  static __rte_always_inline unsigned int
 rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, free_space);
+	return rte_ring_sp_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -910,24 +718,8 @@  static __rte_always_inline unsigned int
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	switch (r->prod.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mp_enqueue_burst(r, obj_table, n, free_space);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sp_enqueue_burst(r, obj_table, n, free_space);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mp_rts_enqueue_burst(r, obj_table, n,
-			free_space);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mp_hts_enqueue_burst(r, obj_table, n,
-			free_space);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_enqueue_burst_elem(r, obj_table, sizeof(void *),
+			n, free_space);
 }
 
 /**
@@ -954,8 +746,8 @@  static __rte_always_inline unsigned int
 rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_MT, available);
+	return rte_ring_mc_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -979,8 +771,8 @@  static __rte_always_inline unsigned int
 rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, RTE_RING_SYNC_ST, available);
+	return rte_ring_sc_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 /**
@@ -1006,24 +798,8 @@  static __rte_always_inline unsigned int
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	switch (r->cons.sync_type) {
-	case RTE_RING_SYNC_MT:
-		return rte_ring_mc_dequeue_burst(r, obj_table, n, available);
-	case RTE_RING_SYNC_ST:
-		return rte_ring_sc_dequeue_burst(r, obj_table, n, available);
-#ifdef ALLOW_EXPERIMENTAL_API
-	case RTE_RING_SYNC_MT_RTS:
-		return rte_ring_mc_rts_dequeue_burst(r, obj_table, n,
-			available);
-	case RTE_RING_SYNC_MT_HTS:
-		return rte_ring_mc_hts_dequeue_burst(r, obj_table, n,
-			available);
-#endif
-	}
-
-	/* valid ring should never reach this point */
-	RTE_ASSERT(0);
-	return 0;
+	return rte_ring_dequeue_burst_elem(r, obj_table, sizeof(void *),
+			n, available);
 }
 
 #ifdef __cplusplus
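
For context, below is a minimal caller sketch (mine, not part of the patch). It illustrates that the legacy pointer-ring API surface and its return-value contract are unchanged by this rework; each legacy call is now a thin forward to its rte_ring_*_elem counterpart with esize == sizeof(void *). The ring name, count and burst size are illustrative only.

/*
 * Caller sketch, assuming DPDK 20.05-era headers as in the patch above.
 * Not part of the patch; names and sizes are hypothetical.
 */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_ring.h>

int
main(int argc, char **argv)
{
	void *objs[8], *out[8], *one;
	unsigned int i, n;

	if (rte_eal_init(argc, argv) < 0)
		return -1;

	/* SP/SC ring with 1024 pointer-sized slots */
	struct rte_ring *r = rte_ring_create("demo_ring", 1024,
			rte_socket_id(), RING_F_SP_ENQ | RING_F_SC_DEQ);
	if (r == NULL)
		return -1;

	for (i = 0; i < 8; i++)
		objs[i] = (void *)(uintptr_t)(i + 1);

	/* after this patch, forwards to rte_ring_enqueue_bulk_elem(r,
	 * objs, sizeof(void *), 8, NULL); still returns 8 or 0 */
	n = rte_ring_enqueue_bulk(r, objs, 8, NULL);
	printf("enqueued %u\n", n);

	/* likewise forwards to rte_ring_dequeue_bulk_elem(); same contract */
	n = rte_ring_dequeue_bulk(r, out, 8, NULL);
	printf("dequeued %u\n", n);

	/* single-object wrappers keep the 0 / -ENOBUFS / -ENOENT
	 * semantics, now inherited from the _elem single-object helpers */
	if (rte_ring_enqueue(r, objs[0]) == -ENOBUFS)
		printf("ring full\n");
	if (rte_ring_dequeue(r, &one) == -ENOENT)
		printf("ring empty\n");

	rte_ring_free(r);
	return 0;
}

Since the rewritten wrappers are __rte_always_inline and pass the compile-time constant sizeof(void *) as esize, the compiler can presumably specialize the _elem copy loops down to the same pointer-copy code the removed ENQUEUE_PTRS/DEQUEUE_PTRS macros produced, which would be consistent with the near-flat performance numbers reported in the commit message.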