[v2] net/mlx5: fix risk in Rx descriptor read in NEON vector path

Message ID 20230530054804.4101060-1-ruifeng.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Headers
Series [v2] net/mlx5: fix risk in Rx descriptor read in NEON vector path |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/github-robot: build success github build: passed
ci/iol-abi-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS

Commit Message

Ruifeng Wang May 30, 2023, 5:48 a.m. UTC
  In NEON vector PMD, vector load loads two contiguous 8B of
descriptor data into vector register. Given vector load ensures no
16B atomicity, read of the word that includes op_own field could be
reordered after read of other words. In this case, some words could
contain invalid data.

Reloaded qword0 after read barrier to update vector register. This
ensures that the fetched data is correct.

Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.

Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: Rebased and added tags that received.

 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
 1 file changed, 8 insertions(+)
  

Comments

Raslan Darawsheh June 19, 2023, 12:13 p.m. UTC | #1
Hi,

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, May 30, 2023 8:48 AM
> To: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>; Ali Alnubani
> <alialnu@nvidia.com>
> Subject: [PATCH v2] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> 
> In NEON vector PMD, vector load loads two contiguous 8B of
> descriptor data into vector register. Given vector load ensures no
> 16B atomicity, read of the word that includes op_own field could be
> reordered after read of other words. In this case, some words could
> contain invalid data.
> 
> Reloaded qword0 after read barrier to update vector register. This
> ensures that the fetched data is correct.
> 
> Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.
> 
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Tested-by: Ali Alnubani <alialnu@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
> v2: Rebased and added tags that received.
> 

Patch applied to next-net-mlx,

Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 75e8ed7e5a..9079da65de 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -675,6 +675,14 @@  rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		c0 = vld1q_u64((uint64_t *)(p0 + 48));
 		/* Synchronize for loading the rest of blocks. */
 		rte_io_rmb();
+		/* B.0 (CQE 3) reload lower half of the block. */
+		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
+		/* B.0 (CQE 2) reload lower half of the block. */
+		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
+		/* B.0 (CQE 1) reload lower half of the block. */
+		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
+		/* B.0 (CQE 0) reload lower half of the block. */
+		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
 		/* Prefetch next 4 CQEs. */
 		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
 			unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;