[3/3] net/bnxt: fix risk in Rx descriptor read in NEON path

Message ID 20220413103156.3680600-4-ruifeng.wang@arm.com (mailing list archive)
State Accepted, archived
Delegated to: Ajit Khaparde
Headers
Series BNXT changes |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Ruifeng Wang April 13, 2022, 10:31 a.m. UTC
  Rx descriptor contains a valid bit which indicates readiness of the rest
of descriptor words. Hence, the word contains valid bit must be read
prior to other words.

In NEON vector path, two contiguous 8B descriptor are loaded to a single
NEON register. Given vector load ensures no 16B atomicity, read of the
word that includes valid bit could be reordered after read of other words.
In this case, data could be invalid.

Reloaded lower 64b after read barrier. This ensures what fetched is
correct.

Also fixed comments that not pertains to Arm platform architecture.

Fixes: deae85145c64 ("net/bnxt: handle multiple packets per loop in vector Rx")
Cc: lance.richardson@broadcom.com
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
  

Patch

diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
index 779e23ac4f..32f8e59b3a 100644
--- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
+++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
@@ -231,25 +231,38 @@  recv_burst_vec_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		}
 
 		/*
-		 * Load the four current descriptors into SSE registers in
-		 * reverse order to ensure consistent state.
+		 * Load the four current descriptors into NEON registers.
+		 * IO barriers are used to ensure consistent state.
 		 */
 		rxcmp1[3] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 7]);
 		rte_io_rmb();
+		/* Reload lower 64b of descriptors to make it ordered after info3_v. */
+		rxcmp1[3] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 7],
+				vreinterpretq_u64_u32(rxcmp1[3]), 0));
 		rxcmp[3] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 6]);
 
 		rxcmp1[2] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 5]);
 		rte_io_rmb();
+		rxcmp1[2] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 5],
+				vreinterpretq_u64_u32(rxcmp1[2]), 0));
 		rxcmp[2] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 4]);
 
 		t1 = vreinterpretq_u64_u32(vzip2q_u32(rxcmp1[2], rxcmp1[3]));
 
 		rxcmp1[1] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 3]);
 		rte_io_rmb();
+		rxcmp1[1] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 3],
+				vreinterpretq_u64_u32(rxcmp1[1]), 0));
 		rxcmp[1] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 2]);
 
 		rxcmp1[0] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 1]);
 		rte_io_rmb();
+		rxcmp1[0] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 1],
+				vreinterpretq_u64_u32(rxcmp1[0]), 0));
 		rxcmp[0] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 0]);
 
 		t0 = vreinterpretq_u64_u32(vzip2q_u32(rxcmp1[0], rxcmp1[1]));