[v2,2/2] net/mlx5: reduce unnecessary memory access
Commit Message
The MR btree length is constant during Rx replenish.
Move the retrieval of this value out of the loop to reduce data loads.
A slight performance uplift was measured on both N1SDP and x86.
Suggested-by: Slava Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
drivers/net/mlx5/mlx5_rxtx_vec.c | 35 ++++++++++++++++++--------------
1 file changed, 20 insertions(+), 15 deletions(-)
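The change described above amounts to hoisting a loop-invariant test out of the loop (often called loop unswitching). A minimal standalone sketch of the idea follows. It is not the mlx5 code: `struct desc`, `struct queue`, `lookup_lkey()`, `HEADROOM` and both function names are hypothetical stand-ins chosen for illustration, and the lookup is a dummy.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADROOM 128u /* assumed value, for illustration only */

/* Hypothetical stand-ins; these are NOT the mlx5 driver structures. */
struct desc {
	uint64_t addr;
	uint32_t lkey;
};

struct queue {
	uint16_t mr_count;  /* plays the role of the MR btree length */
	uint32_t base_lkey;
};

/* Dummy lookup standing in for a per-buffer memory-region search. */
uint32_t lookup_lkey(const struct queue *q, uintptr_t addr)
{
	return q->base_lkey + (uint32_t)(addr >> 12);
}

/* Before: q->mr_count is loaded and tested on every iteration. */
void replenish_checked_per_iter(const struct queue *q, struct desc *wq,
				void *const *bufs, size_t n)
{
	for (size_t i = 0; i < n; ++i) {
		wq[i].addr = (uint64_t)((uintptr_t)bufs[i] + HEADROOM);
		if (q->mr_count > 1)
			wq[i].lkey = lookup_lkey(q, (uintptr_t)bufs[i]);
	}
}

/* After: the invariant is tested once, each branch has its own loop. */
void replenish_unswitched(const struct queue *q, struct desc *wq,
			  void *const *bufs, size_t n)
{
	if (q->mr_count > 1) {
		for (size_t i = 0; i < n; ++i) {
			wq[i].addr = (uint64_t)((uintptr_t)bufs[i] + HEADROOM);
			wq[i].lkey = lookup_lkey(q, (uintptr_t)bufs[i]);
		}
	} else {
		/* Single MR: the preset lkey is left untouched. */
		for (size_t i = 0; i < n; ++i)
			wq[i].addr = (uint64_t)((uintptr_t)bufs[i] + HEADROOM);
	}
}
```

Both functions produce the same descriptors. The second form costs some code size, but it does not rely on the compiler proving that the stores to `wq` cannot modify `q->mr_count`, which is presumably why the per-iteration load was not optimized away automatically.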
Comments
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Wednesday, July 7, 2021 12:03
> To: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
> honnappa.nagarahalli@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH v2 2/2] net/mlx5: reduce unnecessary memory access
Thank you for the update,
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
@@ -106,22 +106,27 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
 		rxq->stats.rx_nombuf += n;
 		return;
 	}
-	for (i = 0; i < n; ++i) {
-		void *buf_addr;
-
-		/*
-		 * In order to support the mbufs with external attached
-		 * data buffer we should use the buf_addr pointer
-		 * instead of rte_mbuf_buf_addr(). It touches the mbuf
-		 * itself and may impact the performance.
-		 */
-		buf_addr = elts[i]->buf_addr;
-		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
-					      RTE_PKTMBUF_HEADROOM);
-		/* If there's a single MR, no need to replace LKey. */
-		if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
-			     > 1))
+	if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1)) {
+		for (i = 0; i < n; ++i) {
+			/*
+			 * In order to support the mbufs with external attached
+			 * data buffer we should use the buf_addr pointer
+			 * instead of rte_mbuf_buf_addr(). It touches the mbuf
+			 * itself and may impact the performance.
+			 */
+			void *buf_addr = elts[i]->buf_addr;
+
+			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
+						      RTE_PKTMBUF_HEADROOM);
 			wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
+		}
+	} else {
+		for (i = 0; i < n; ++i) {
+			void *buf_addr = elts[i]->buf_addr;
+
+			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
+						      RTE_PKTMBUF_HEADROOM);
+		}
 	}
 	rxq->rq_ci += n;
 	/* Prevent overflowing into consumed mbufs. */
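For readers unfamiliar with the address computation in the hunk above: each descriptor receives the buffer address plus the headroom, stored in big-endian byte order. A rough standalone sketch, assuming nothing from DPDK: `to_be64()` and `wqe_addr_be()` are hypothetical helpers written for illustration, where the driver itself uses `rte_cpu_to_be_64()` and `RTE_PKTMBUF_HEADROOM`.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical portable helper: return a value whose in-memory bytes
 * are the big-endian encoding of v, regardless of host endianness.
 */
uint64_t to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;

	for (int i = 0; i < 8; ++i)
		b[i] = (uint8_t)(v >> (56 - 8 * i)); /* most significant byte first */
	memcpy(&out, b, sizeof(out));
	return out;
}

/*
 * Hypothetical helper: the descriptor address is the start of the data
 * buffer plus the headroom, in big-endian order. Taking the buffer
 * pointer directly works whether the data buffer is embedded in the
 * packet buffer or attached externally.
 */
uint64_t wqe_addr_be(const void *buf_addr, uint16_t headroom)
{
	return to_be64((uint64_t)((uintptr_t)buf_addr + headroom));
}
```

Reading the result back byte by byte, most significant first, yields the original address plus headroom on both little-endian and big-endian hosts.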