[v4,4/4] ring: move the atomic load of head above the loop
Checks
Commit Message
In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.
This helps a little on the latency,about 1~5%.
Test result with the patch(two cores):
SP/SC bulk enq/dequeue (size: 8): 5.64
MP/MC bulk enq/dequeue (size: 8): 9.58
SP/SC bulk enq/dequeue (size: 32): 1.98
MP/MC bulk enq/dequeue (size: 32): 2.30
Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org
Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
Comments
17/09/2018 10:11, Gavin Hu:
> In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
> the do {} while loop as upon failure the old_head will be updated,
> another load is costly and not necessary.
>
> This helps a little on the latency,about 1~5%.
>
> Test result with the patch(two cores):
> SP/SC bulk enq/dequeue (size: 8): 5.64
> MP/MC bulk enq/dequeue (size: 8): 9.58
> SP/SC bulk enq/dequeue (size: 32): 1.98
> MP/MC bulk enq/dequeue (size: 32): 2.30
>
> Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
We are missing reviews and acknowledgements on this series.
@@ -61,13 +61,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
unsigned int max = n;
int success;
+ *old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
do {
/* Reset n to the initial burst count */
n = max;
- *old_head = __atomic_load_n(&r->prod.head,
- __ATOMIC_ACQUIRE);
-
/* load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -93,6 +91,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
if (is_sp)
r->prod.head = *new_head, success = 1;
else
+ /* on failure, *old_head is updated */
success = __atomic_compare_exchange_n(&r->prod.head,
old_head, *new_head,
0, __ATOMIC_ACQUIRE,
@@ -134,13 +133,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
int success;
/* move cons.head atomically */
+ *old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
do {
/* Restore n as it may change every loop */
n = max;
- *old_head = __atomic_load_n(&r->cons.head,
- __ATOMIC_ACQUIRE);
-
/* this load-acquire synchronize with store-release of ht->tail
* in update_tail.
*/
@@ -165,6 +162,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
if (is_sc)
r->cons.head = *new_head, success = 1;
else
+ /* on failure, *old_head will be updated */
success = __atomic_compare_exchange_n(&r->cons.head,
old_head, *new_head,
0, __ATOMIC_ACQUIRE,