[v3,3/3] ring: move the atomic load of head above the loop

Message ID 1537172244-64874-3-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v3,1/3] ring: read tail using atomic load |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Gavin Hu Sept. 17, 2018, 8:17 a.m. UTC
  In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
the do {} while loop as upon failure the old_head will be updated,
another load is costly and not necessary.

This helps a little on the latency,about 1~5%.

 Test result with the patch(two cores):
 SP/SC bulk enq/dequeue (size: 8): 5.64
 MP/MC bulk enq/dequeue (size: 8): 9.58
 SP/SC bulk enq/dequeue (size: 32): 1.98
 MP/MC bulk enq/dequeue (size: 32): 2.30

Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
---
 lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)
  

Comments

Gavin Hu Sept. 26, 2018, 9:29 a.m. UTC | #1
+Justin He for review

> -----Original Message-----
> From: Gavin Hu <gavin.hu@arm.com>
> Sent: Monday, September 17, 2018 4:17 PM
> To: dev@dpdk.org
> Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>;
> jerin.jacob@caviumnetworks.com; nd <nd@arm.com>; stable@dpdk.org
> Subject: [PATCH v3 3/3] ring: move the atomic load of head above the loop
> 
> In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
> the do {} while loop as upon failure the old_head will be updated, another
> load is costly and not necessary.
> 
> This helps a little on the latency,about 1~5%.
> 
>  Test result with the patch(two cores):
>  SP/SC bulk enq/dequeue (size: 8): 5.64
>  MP/MC bulk enq/dequeue (size: 8): 9.58
>  SP/SC bulk enq/dequeue (size: 32): 1.98  MP/MC bulk enq/dequeue (size:
> 32): 2.30
> 
> Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> ---
>  lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> b/lib/librte_ring/rte_ring_c11_mem.h
> index 0eae3b3..95cc508 100644
> --- a/lib/librte_ring/rte_ring_c11_mem.h
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -61,13 +61,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> unsigned int is_sp,
>  	unsigned int max = n;
>  	int success;
> 
> +	*old_head = __atomic_load_n(&r->prod.head,
> __ATOMIC_ACQUIRE);
>  	do {
>  		/* Reset n to the initial burst count */
>  		n = max;
> 
> -		*old_head = __atomic_load_n(&r->prod.head,
> -					__ATOMIC_ACQUIRE);
> -
>  		/* load-acquire synchronize with store-release of ht->tail
>  		 * in update_tail.
>  		 */
> @@ -93,6 +91,7 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> unsigned int is_sp,
>  		if (is_sp)
>  			r->prod.head = *new_head, success = 1;
>  		else
> +			/* on failure, *old_head is updated */
>  			success = __atomic_compare_exchange_n(&r-
> >prod.head,
>  					old_head, *new_head,
>  					0, __ATOMIC_ACQUIRE,
> @@ -134,13 +133,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int
> is_sc,
>  	int success;
> 
>  	/* move cons.head atomically */
> +	*old_head = __atomic_load_n(&r->cons.head,
> __ATOMIC_ACQUIRE);
>  	do {
>  		/* Restore n as it may change every loop */
>  		n = max;
> 
> -		*old_head = __atomic_load_n(&r->cons.head,
> -					__ATOMIC_ACQUIRE);
> -
>  		/* this load-acquire synchronize with store-release of ht->tail
>  		 * in update_tail.
>  		 */
> @@ -165,6 +162,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int
> is_sc,
>  		if (is_sc)
>  			r->cons.head = *new_head, success = 1;
>  		else
> +			/* on failure, *old_head will be updated */
>  			success = __atomic_compare_exchange_n(&r-
> >cons.head,
>  							old_head,
> *new_head,
>  							0,
> __ATOMIC_ACQUIRE,
> --
> 2.7.4
  
Justin He Sept. 26, 2018, 10:06 a.m. UTC | #2
> -----Original Message-----
> From: Gavin Hu (Arm Technology China)
> Sent: 2018年9月26日 17:30
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; dev@dpdk.org
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>;
> jerin.jacob@caviumnetworks.com; nd <nd@arm.com>; stable@dpdk.org; Justin
> He <Justin.He@arm.com>
> Subject: RE: [PATCH v3 3/3] ring: move the atomic load of head above the loop
>
> +Justin He for review
>
> > -----Original Message-----
> > From: Gavin Hu <gavin.hu@arm.com>
> > Sent: Monday, September 17, 2018 4:17 PM
> > To: dev@dpdk.org
> > Cc: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Steve Capper
> > <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>;
> > jerin.jacob@caviumnetworks.com; nd <nd@arm.com>; stable@dpdk.org
> > Subject: [PATCH v3 3/3] ring: move the atomic load of head above the loop
> >
> > In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
> > the do {} while loop as upon failure the old_head will be updated, another
> > load is costly and not necessary.
> >
> > This helps a little on the latency,about 1~5%.
> >
> >  Test result with the patch(two cores):
> >  SP/SC bulk enq/dequeue (size: 8): 5.64
> >  MP/MC bulk enq/dequeue (size: 8): 9.58
> >  SP/SC bulk enq/dequeue (size: 32): 1.98  MP/MC bulk enq/dequeue (size:
> > 32): 2.30
> >
> > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > ---
> >  lib/librte_ring/rte_ring_c11_mem.h | 10 ++++------
> >  1 file changed, 4 insertions(+), 6 deletions(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> > b/lib/librte_ring/rte_ring_c11_mem.h
> > index 0eae3b3..95cc508 100644
> > --- a/lib/librte_ring/rte_ring_c11_mem.h
> > +++ b/lib/librte_ring/rte_ring_c11_mem.h
> > @@ -61,13 +61,11 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> > unsigned int is_sp,
> >  unsigned int max = n;
> >  int success;
> >
> > +*old_head = __atomic_load_n(&r->prod.head,
> > __ATOMIC_ACQUIRE);
> >  do {
> >  /* Reset n to the initial burst count */
> >  n = max;
> >
> > -*old_head = __atomic_load_n(&r->prod.head,
> > -__ATOMIC_ACQUIRE);
> > -
> >  /* load-acquire synchronize with store-release of ht->tail
> >   * in update_tail.
> >   */
> > @@ -93,6 +91,7 @@ __rte_ring_move_prod_head(struct rte_ring *r,
> > unsigned int is_sp,
> >  if (is_sp)
> >  r->prod.head = *new_head, success = 1;
> >  else
> > +/* on failure, *old_head is updated */
> >  success = __atomic_compare_exchange_n(&r-
> > >prod.head,
> >  old_head, *new_head,
> >  0, __ATOMIC_ACQUIRE,
> > @@ -134,13 +133,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int
> > is_sc,
> >  int success;
> >
> >  /* move cons.head atomically */
> > +*old_head = __atomic_load_n(&r->cons.head,
> > __ATOMIC_ACQUIRE);
> >  do {
> >  /* Restore n as it may change every loop */
> >  n = max;
> >
> > -*old_head = __atomic_load_n(&r->cons.head,
> > -__ATOMIC_ACQUIRE);
> > -
> >  /* this load-acquire synchronize with store-release of ht->tail
> >   * in update_tail.
> >   */
> > @@ -165,6 +162,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int
> > is_sc,
> >  if (is_sc)
> >  r->cons.head = *new_head, success = 1;
> >  else
> > +/* on failure, *old_head will be updated */
> >  success = __atomic_compare_exchange_n(&r-
> > >cons.head,
> >  old_head,
> > *new_head,
> >  0,
> > __ATOMIC_ACQUIRE,
> > --
> > 2.7.4
Reviewed-by: Jia He <justin.he@arm.com>

Cheers,
Justin (Jia He)
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
  
Stephen Hemminger Sept. 29, 2018, 7:19 a.m. UTC | #3
On Wed, 26 Sep 2018 10:06:36 +0000
Justin He <Justin.He@arm.com> wrote:

> Reviewed-by: Jia He <justin.he@arm.com>
> 
> Cheers,
> Justin (Jia He)
> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Please adjust your corporate mail settings to remove this automatic footer
because the footer wording creates a legal conflict with the open and public
mailing lists.
  
Jerin Jacob Sept. 29, 2018, 10:59 a.m. UTC | #4
-----Original Message-----
> Date: Mon, 17 Sep 2018 16:17:24 +0800
> From: Gavin Hu <gavin.hu@arm.com>
> To: dev@dpdk.org
> CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com, steve.capper@arm.com,
>  Ola.Liljedahl@arm.com, jerin.jacob@caviumnetworks.com, nd@arm.com,
>  stable@dpdk.org
> Subject: [PATCH v3 3/3] ring: move the atomic load of head above the loop
> X-Mailer: git-send-email 2.7.4
> 
> External Email
> 
> In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
> the do {} while loop as upon failure the old_head will be updated,
> another load is costly and not necessary.
> 
> This helps a little on the latency,about 1~5%.
> 
>  Test result with the patch(two cores):
>  SP/SC bulk enq/dequeue (size: 8): 5.64
>  MP/MC bulk enq/dequeue (size: 8): 9.58
>  SP/SC bulk enq/dequeue (size: 32): 1.98
>  MP/MC bulk enq/dequeue (size: 32): 2.30
> 
> Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>

Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
  

Patch

diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
index 0eae3b3..95cc508 100644
--- a/lib/librte_ring/rte_ring_c11_mem.h
+++ b/lib/librte_ring/rte_ring_c11_mem.h
@@ -61,13 +61,11 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 	unsigned int max = n;
 	int success;
 
+	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->prod.head,
-					__ATOMIC_ACQUIRE);
-
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -93,6 +91,7 @@  __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		if (is_sp)
 			r->prod.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head is updated */
 			success = __atomic_compare_exchange_n(&r->prod.head,
 					old_head, *new_head,
 					0, __ATOMIC_ACQUIRE,
@@ -134,13 +133,11 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 	int success;
 
 	/* move cons.head atomically */
+	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
-		*old_head = __atomic_load_n(&r->cons.head,
-					__ATOMIC_ACQUIRE);
-
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
@@ -165,6 +162,7 @@  __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
 		if (is_sc)
 			r->cons.head = *new_head, success = 1;
 		else
+			/* on failure, *old_head will be updated */
 			success = __atomic_compare_exchange_n(&r->cons.head,
 							old_head, *new_head,
 							0, __ATOMIC_ACQUIRE,