[2/2] net/i40e: remove compiler barrier for aarch64

Message ID 1565693011-33998-3-git-send-email-gavin.hu@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Qi Zhang
Series i40e neon vPMD optimization for aarch64

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Gavin Hu Aug. 13, 2019, 10:43 a.m. UTC
  As the packet length extraction code has been simplified, the ordering
is no longer necessary.[1]

A 2% performance gain was measured on Marvell ThunderX2.
A 4.3% performance gain was measured on Ampere eMAG80.

[1] http://mails.dpdk.org/archives/dev/2016-April/037529.html

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 ---
 1 file changed, 3 deletions(-)
  

Comments

Honnappa Nagarahalli Aug. 28, 2019, 10:48 p.m. UTC | #1
> 
> As packet length extraction code was simplified,the ordering was not
> necessary any more.[1]
IMO, there is no relationship between the compiler barrier and [1], at least on Arm platforms. I suggest we just say 'there is no reason for the compiler barrier'.
I think this compiler barrier is not required for x86/PPC either.

  
Gavin Hu Aug. 30, 2019, 8:51 a.m. UTC | #2
Hi Honnappa,

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, August 29, 2019 6:49 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>;
> dev@dpdk.org
> Cc: nd <nd@arm.com>; thomas@monjalon.net; jerinj@marvell.com;
> pbhagavatula@marvell.com; qi.z.zhang@intel.com;
> bruce.richardson@intel.com; stable@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 2/2] net/i40e: remove compiler barrier for aarch64
> 
> >
> > As packet length extraction code was simplified,the ordering was not
> > necessary any more.[1]
> IMO, there is no relationship between the compiler barrier and [1] at least
> on Arm platforms. I suggest we just say 'there is no reason for the compiler
> barrier'.
> I think this compiler barrier is not required for x86/PPC as well.

The compiler barrier was originally required for x86, because the two accesses to the desc[] entry had to be ordered.
After [1] was applied, the first access was removed, so there was no longer any reason for the compiler barrier.
The aarch64 code borrowed the barrier but was not updated to match the new code, so the barrier can be removed there as well.
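To make the ordering argument concrete, here is a minimal, hypothetical C sketch. The descriptor struct, the bit positions (DD bit at bit 0, a 14-bit length at bits 38..51 of the second word) and the function names are invented for illustration and are not the i40e driver code. `compiler_barrier()` uses the GCC/Clang empty-asm-with-"memory"-clobber idiom that DPDK's `rte_compiler_barrier()` is built on. The first function reads the descriptor word twice and relies on the barrier to keep the second read after the status check; the second reads it once, so there is nothing left to order.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang compiler barrier: an empty asm statement with a "memory"
 * clobber. It emits no CPU instruction; it only stops the compiler from
 * moving or caching memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical 16-byte RX descriptor seen as two 64-bit words. */
struct rx_desc {
	uint64_t qword0;
	uint64_t qword1;
};

/* Hypothetical field layout, assumed for illustration only. */
#define DESC_DD_BIT (1ULL << 0)   /* "descriptor done" status bit */
#define LEN_SHIFT   38            /* packet length position in qword1 */
#define LEN_MASK    0x3FFFULL     /* 14-bit packet length */

/* Two accesses: check the status, then load the word again for the
 * length. The barrier keeps the compiler from hoisting the second load
 * above the status check or folding it into the first load. */
int read_len_two_loads(const struct rx_desc *d, uint32_t *len)
{
	assert(d && len);
	uint64_t status = d->qword1;
	if (!(status & DESC_DD_BIT))
		return 0;
	compiler_barrier();
	*len = (uint32_t)((d->qword1 >> LEN_SHIFT) & LEN_MASK);
	return 1;
}

/* One access: status and length come from a single snapshot of the
 * word, so there is no second access left to order. */
int read_len_one_load(const struct rx_desc *d, uint32_t *len)
{
	assert(d && len);
	uint64_t qw1 = d->qword1;
	if (!(qw1 & DESC_DD_BIT))
		return 0;
	*len = (uint32_t)((qw1 >> LEN_SHIFT) & LEN_MASK);
	return 1;
}
```

A compiler barrier only constrains the compiler. On x86 the hardware does not reorder loads with other loads, so compiler ordering was sufficient there; on a weakly ordered CPU such as aarch64 a compiler barrier would not by itself order two loads in hardware.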

Hopefully I got the whole story across clearly and completely. 

  

Patch

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index 5555e9b..864eb9a 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -307,9 +307,6 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
 		}
 
-		/* avoid compiler reorder optimization */
-		rte_compiler_barrier();
-
 		/* pkt 3,4 shift the pktlen field to be 16-bit aligned*/
 		uint32x4_t len3 = vshlq_u32(vreinterpretq_u32_u64(descs[3]),
 					    len_shl);
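The context lines show the NEON path shifting the pktlen field in the descriptor vector so that it becomes 16-bit aligned. As a rough, portable illustration of what that per-lane shift achieves, here is a scalar C sketch. The descriptor layout (a 14-bit length at bits 38..51 of the second quadword), the shift amount of 10, and the function name `extract_pktlen` are assumptions made for this example, not details confirmed by the patch; the real code performs the shift with `vshlq_u32()` on a whole descriptor vector at once.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: the 14-bit packet-buffer length sits at
 * bits 38..51 of the descriptor's second 64-bit word, and the lane shift
 * is 10. Both are illustrative, not taken from the patch. */
#define PKTLEN_SHIFT 10
#define PKTLEN_MASK  0x3FFFu

/* Scalar stand-in for shifting the top 32-bit lane of the descriptor
 * (lane 3 when the 128-bit descriptor is viewed as uint32x4_t) and then
 * reading the upper 16 bits of that lane. */
uint16_t extract_pktlen(uint64_t qword1)
{
	/* Lane 3 holds bits 32..63 of qword1, so the length is at lane bits 6..19. */
	uint32_t lane3 = (uint32_t)(qword1 >> 32);

	/* Shift left by PKTLEN_SHIFT: the length moves to lane bits 16..29,
	 * i.e. it now starts on a 16-bit boundary. Bits pushed past bit 31
	 * are discarded, as in a 32-bit lane shift. */
	lane3 <<= PKTLEN_SHIFT;

	/* Take the upper 16-bit half; its bits 14..15 carry neighbouring
	 * descriptor bits (qword1 bits 52..53), so mask them off. */
	return (uint16_t)((lane3 >> 16) & PKTLEN_MASK);
}
```

For example, a word with 1500 in the length field and unrelated bits set above it still yields 1500. Aligning the field to a 16-bit boundary presumably lets a later byte shuffle move it as a whole 16-bit value, though that step is outside the lines shown here.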