net/mlx5: fix sync on Tx doorbell for PPC64


Dekel Peled March 18, 2019, 6:42 a.m.
In file lib/librte_eal/common/include/arch/ppc_64/rte_atomic.h:
rte_mb() is defined as asm "sync".
rte_wmb() is defined as asm "lwsync".

mlx5_tx_dbrec_cond_wmb() uses rte_wmb() to ensure ordering between
DB record and BF copy.
The P9 processor does not have a strongly-ordered memory model, so
this write barrier is not strict enough there and rte_mb() has to be
used instead.
On x86, which has a strongly-ordered memory model, using rte_mb()
instead of rte_wmb() causes a performance hit of up to ~10%.

This patch adds mlx5_arch_specific_mb(), defined as rte_mb() for PPC64
and as rte_wmb() for all other processors.
mlx5_tx_dbrec_cond_wmb() now uses mlx5_arch_specific_mb() to guarantee
that the data is valid on any processor architecture.
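
The selection described above can be sketched outside DPDK as follows.
This is only an illustration under stated assumptions: every demo_*
name is hypothetical, the C11 fences are stand-ins for rte_mb() and
rte_wmb() rather than DPDK's actual definitions, and the two plain
variables merely play the roles of the DB record and the BF register.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_mb() and rte_wmb(); not DPDK's code. */
#define demo_mb()  atomic_thread_fence(memory_order_seq_cst)
#define demo_wmb() atomic_thread_fence(memory_order_release)

/* Same shape as the patch: full barrier on PPC64, write barrier elsewhere. */
#if defined(__PPC64__)
#define demo_arch_specific_mb() demo_mb()
#else
#define demo_arch_specific_mb() demo_wmb()
#endif

/* Plain memory playing the roles of the DB record and the BF register. */
static volatile uint32_t demo_db_record;
static volatile uint64_t demo_bf_reg;

/* Update the DB record, issue the barrier, then perform the BF copy. */
static void
demo_ring_doorbell(uint32_t wqe_ci, uint64_t wqe_head)
{
	demo_db_record = wqe_ci;
	/* Ensure ordering between DB record and BF copy. */
	demo_arch_specific_mb();
	demo_bf_reg = wqe_head;
}
```

Keeping the choice in a single macro leaves the call site identical on
every architecture, so only PPC64 builds pay for the stronger barrier.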

Original work by Yongseok Koh.

Fixes: 6cb559d67b83 ("net/mlx5: add vectorized Rx/Tx burst for x86")

Signed-off-by: Dekel Peled <>
---
 drivers/net/mlx5/mlx5_rxtx.h  | 2 +-
 drivers/net/mlx5/mlx5_utils.h | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)


diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 53115dd..df51589 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -707,7 +707,7 @@  uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr,
 	*txq->qp_db = rte_cpu_to_be_32(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
-	rte_wmb();
+	mlx5_arch_specific_mb();
 	mlx5_uar_write64_relaxed(*src, dst, txq->uar_lock);
 	if (cond)
diff --git a/drivers/net/mlx5/mlx5_utils.h b/drivers/net/mlx5/mlx5_utils.h
index 97092c7..6742271 100644
--- a/drivers/net/mlx5/mlx5_utils.h
+++ b/drivers/net/mlx5/mlx5_utils.h
@@ -25,6 +25,15 @@ 
 #define bool _Bool
 #endif
 
+/*
+ * Define strict memory-barrier for PPC64.
+ */
+#if defined(__PPC64__)
+#define mlx5_arch_specific_mb() rte_mb()
+#else
+#define mlx5_arch_specific_mb() rte_wmb()
+#endif
+
 /* Bit-field manipulation. */
 #define BITFIELD_DECLARE(bf, type, size) \
 	type bf[(((size_t)(size) / (sizeof(type) * CHAR_BIT)) + \