[1/2] net/i40e: desc loading is unnecessarily ordered for aarch64

Message ID 1565693011-33998-2-git-send-email-gavin.hu@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Qi Zhang
Headers
Series i40e neon vPMD optiomization for aarch64 |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-Compile-Testing success Compile Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS

Commit Message

Gavin Hu Aug. 13, 2019, 10:43 a.m. UTC
  For x86, the descriptors needs to be loaded in order, so in between two
descriptors loading, there is a compiler barrier in place.[1]
For aarch64, a patch [2] is in place to survive with discontinuous DD bits,
the barriers can be removed to take full advantage of out-of-order
execution.

50% performance gain in the RFC2544 NDR test was measured on ThunderX2.
12.50% performan gain in the RFC2544 NDR test was measured on Ampere
eMAG80 platform.

[1] http://inbox.dpdk.org/users/039ED4275CED7440929022BC67E7061153D71548@
SHSMSX105.ccr.corp.intel.com/
[2] https://mails.dpdk.org/archives/stable/2017-October/003324.html

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 1 -
 1 file changed, 1 deletion(-)
  

Comments

Honnappa Nagarahalli Aug. 28, 2019, 10:09 p.m. UTC | #1
Thanks Gavin, few comments are inline

> -----Original Message-----
> From: Gavin Hu <gavin.hu@arm.com>
> Sent: Tuesday, August 13, 2019 5:44 AM
> To: dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> pbhagavatula@marvell.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; qi.z.zhang@intel.com;
> bruce.richardson@intel.com; stable@dpdk.org
> Subject: [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for
> aarch64
> 
> For x86, the descriptors needs to be loaded in order, so in between two
> descriptors loading, there is a compiler barrier in place.
IMO, we can skip the above as this change applies to Arm platforms. Instead, capture this in the code in comments to explain why the ordering of the loads is not required. This will help others reading the code. 

[1] For aarch64, a
> patch [2] is in place to survive with discontinuous DD bits, the barriers can be
> removed to take full advantage of out-of-order execution.
> 
> 50% performance gain in the RFC2544 NDR test was measured on ThunderX2.
> 12.50% performan gain in the RFC2544 NDR test was measured on Ampere
> eMAG80 platform.
> 
> [1]
> http://inbox.dpdk.org/users/039ED4275CED7440929022BC67E7061153D71
> 548@
> SHSMSX105.ccr.corp.intel.com/
> [2] https://mails.dpdk.org/archives/stable/2017-October/003324.html
> 
> Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> ---
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index 83572ef..5555e9b 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -285,7 +285,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq,
> struct rte_mbuf **rx_pkts,
>  		/* Read desc statuses backwards to avoid race condition */
>  		/* A.1 load 4 pkts desc */
>  		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
> -		rte_rmb();
> 
>  		/* B.2 copy 2 mbuf point into rx_pkts  */
>  		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> --
> 2.7.4
  
Gavin Hu Aug. 30, 2019, 8:33 a.m. UTC | #2
Hi Honnappa,

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, August 29, 2019 6:10 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> pbhagavatula@marvell.com; qi.z.zhang@intel.com;
> bruce.richardson@intel.com; stable@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for
> aarch64
> 
> Thanks Gavin, few comments are inline
> 
> > -----Original Message-----
> > From: Gavin Hu <gavin.hu@arm.com>
> > Sent: Tuesday, August 13, 2019 5:44 AM
> > To: dev@dpdk.org
> > Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> > pbhagavatula@marvell.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; qi.z.zhang@intel.com;
> > bruce.richardson@intel.com; stable@dpdk.org
> > Subject: [PATCH 1/2] net/i40e: desc loading is unnecessarily ordered for
> > aarch64
> >
> > For x86, the descriptors needs to be loaded in order, so in between two
> > descriptors loading, there is a compiler barrier in place.
> IMO, we can skip the above as this change applies to Arm platforms. Instead,
> capture this in the code in comments to explain why the ordering of the
> loads is not required. This will help others reading the code.

As the line of code was removed, there is no suitable place to add a comment.
Instead adding it in the commit log makes the story complete and easy to understand. 

> [1] For aarch64, a
> > patch [2] is in place to survive with discontinuous DD bits, the barriers can
> be
> > removed to take full advantage of out-of-order execution.
> >
> > 50% performance gain in the RFC2544 NDR test was measured on
> ThunderX2.
> > 12.50% performan gain in the RFC2544 NDR test was measured on
> Ampere
> > eMAG80 platform.
> >
> > [1]
> >
> http://inbox.dpdk.org/users/039ED4275CED7440929022BC67E7061153D71
> > 548@
> > SHSMSX105.ccr.corp.intel.com/
> > [2] https://mails.dpdk.org/archives/stable/2017-October/003324.html
> >
> > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_neon.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > index 83572ef..5555e9b 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > @@ -285,7 +285,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq,
> > struct rte_mbuf **rx_pkts,
> >  		/* Read desc statuses backwards to avoid race condition */
> >  		/* A.1 load 4 pkts desc */
> >  		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
> > -		rte_rmb();
> >
> >  		/* B.2 copy 2 mbuf point into rx_pkts  */
> >  		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> > --
> > 2.7.4
  

Patch

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index 83572ef..5555e9b 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -285,7 +285,6 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load 4 pkts desc */
 		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
-		rte_rmb();
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
 		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);