[1/2] net/i40e: fix risk in Rx descriptor read in NEON vector path

Message ID 20210906033201.1789796-2-ruifeng.wang@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Qi Zhang
Headers
Series i40e Rx descriptor loads ordering |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Ruifeng Wang Sept. 6, 2021, 3:32 a.m. UTC
  Rx descriptor is 16B/32B in size and consists of multiple words.
The word that includes DD field should be read first. Read result
with DD bit set indicates the rest part in a descriptor is valid.

In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes DD field could be
reordered after read of other words. In this case, some words could
be invalid data.

Read barrier is added after read of qword1 that includes DD field.
And qword0 is reloaded to update vector register. This ensures
what fetched is correct descriptor data.

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
 1 file changed, 8 insertions(+)
  

Comments

Honnappa Nagarahalli Sept. 14, 2021, 6:33 p.m. UTC | #1
<snip>
Similar comments that I have to patch 2/2

> 
> Rx descriptor is 16B/32B in size and consists of multiple words.
> The word that includes DD field should be read first. Read result with DD bit
> set indicates the rest part in a descriptor is valid.
Suggest rewording as follows:
Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the rest of the descriptor words have valid values. Hence, the word containing DD bit must be read first before reading the rest of the descriptor words.

> 
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes DD field could be reordered after read of other words. In
> this case, some words could be invalid data.
"some words could contain invalid data"

> 
> Read barrier is added after read of qword1 that includes DD field.
> And qword0 is reloaded to update vector register. This ensures what fetched
> is correct descriptor data.
"This ensures that the fetched data is correct".

Suggest capturing the performance impact, so it is clearly documented.
> 
> Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
With the above comments,
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

> ---
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index b2683fda60..71191c7cc8 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue
> *__rte_restrict rxq,
>  		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
>  		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> 
> +		/* Use acquire fence to order loads of descriptor qwords */
> +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		/* A.2 reload qword0 to make it ordered after qword1 load */
> +		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3],
> 0);
> +		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2],
> 0);
> +		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1],
> 0);
> +		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> +
>  		/* B.1 load 4 mbuf point */
>  		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
>  		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> --
> 2.25.1
  
Ruifeng Wang Sept. 15, 2021, 8:42 a.m. UTC | #2
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Wednesday, September 15, 2021 2:33 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev@dpdk.org
> Cc: beilei.xing@intel.com; qi.z.zhang@intel.com;
> bruce.richardson@intel.com; jerinj@marvell.com;
> hemant.agrawal@nxp.com; drc@linux.vnet.ibm.com; stable@dpdk.org; nd
> <nd@arm.com>; Ruifeng Wang <Ruifeng.Wang@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON
> vector path
> 
> <snip>
> Similar comments that I have to patch 2/2
> 
> >
> > Rx descriptor is 16B/32B in size and consists of multiple words.
> > The word that includes DD field should be read first. Read result with
> > DD bit set indicates the rest part in a descriptor is valid.
> Suggest rewording as follows:
> Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the rest of
> the descriptor words have valid values. Hence, the word containing DD bit
> must be read first before reading the rest of the descriptor words.
> 
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes DD field could be reordered after read
> > of other words. In this case, some words could be invalid data.
> "some words could contain invalid data"
> 
> >
> > Read barrier is added after read of qword1 that includes DD field.
> > And qword0 is reloaded to update vector register. This ensures what
> > fetched is correct descriptor data.
> "This ensures that the fetched data is correct".
> 
> Suggest capturing the performance impact, so it is clearly documented.

Added performance impact to commit message in v2.
> >
> > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> With the above comments,
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> 

Thanks for your review.
Comments are addressed in v2.
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > index b2683fda60..71191c7cc8 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue
> > *__rte_restrict rxq,
> >  		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
> >  		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> >
> > +		/* Use acquire fence to order loads of descriptor qwords */
> > +		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > +		/* A.2 reload qword0 to make it ordered after qword1 load
> */
> > +		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3],
> > 0);
> > +		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2],
> > 0);
> > +		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1],
> > 0);
> > +		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> > +
> >  		/* B.1 load 4 mbuf point */
> >  		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> >  		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > --
> > 2.25.1
  

Patch

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index b2683fda60..71191c7cc8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -286,6 +286,14 @@  _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
 		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
 
+		/* Use acquire fence to order loads of descriptor qwords */
+		rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+		/* A.2 reload qword0 to make it ordered after qword1 load */
+		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
 		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);