[v1,2/2] ring: relaxed ordering for load and store the head

Message ID 1541763767-7399-3-git-send-email-gavin.hu@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series ring C11 library fix and optimization |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Gavin Hu Nov. 9, 2018, 11:42 a.m. UTC
  When calling __atomic_compare_exchange_n, use relaxed ordering for the
success case, as multiple producers/consumers do not release updates to
each other so no need for acquire or release ordering.

Because the thread fence in place, ordering for the first iteration can be
relaxed.

Run the ring perf test on the following testbed:
HW: ThunderX2 B0 CPU CN9975 v2.0, 2 sockets, 28core,4 threads/core,2.5GHz
OS: Ubuntu 16.04.5 LTS, Kernel: 4.15.0-36-generic
DPDK: 18.08, Configuration: arm64-armv8a-linuxapp-gcc
gcc: 8.1.0
$sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 \
--socket-mem=1024 -- -i

Without the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.75
MP/MC bulk enq/dequeue (size: 8): 10.18
SP/SC bulk enq/dequeue (size: 32): 1.80
MP/MC bulk enq/dequeue (size: 32): 2.34

With the patch:
*** Testing using two physical cores ***
SP/SC bulk enq/dequeue (size: 8): 5.59
MP/MC bulk enq/dequeue (size: 8): 10.54
SP/SC bulk enq/dequeue (size: 32): 1.73
MP/MC bulk enq/dequeue (size: 32): 2.38

No significant improvement, nor regression was seen,as the optimisation is
not at the critical path.

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
  

Patch

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index dc49a99..0fb73a3 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -61,7 +61,7 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;
 
-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
@@ -97,7 +97,7 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
-					0, __ATOMIC_ACQUIRE,
+					0, __ATOMIC_RELAXED,
 					__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
 	return n;
@@ -137,7 +137,7 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	int success;
 
 	/* move cons.head atomically */
-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
@@ -172,7 +172,7 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 							old_head, *new_head,
-							0, __ATOMIC_ACQUIRE,
+							0, __ATOMIC_RELAXED,
 							__ATOMIC_RELAXED);
 	} while (unlikely(success == 0));
 	return n;