[dpdk-dev] net/mlx5: remove excessive data prefetch

Message ID 20180312170545.16165-1-yskoh@mellanox.com (mailing list archive)
State Accepted, archived
Delegated to: Shahaf Shuler

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Yongseok Koh March 12, 2018, 5:05 p.m. UTC
  In Enhanced Multi-Packet Send (eMPW), the entire packet data is prefetched
into the LLC if it isn't inlined. Even though this helps reduce jitter when
the HW fetches data by DMA, it can thrash the LLC, evicting precious data.
And if the queue size is large and there are many queues, the prefetch might
not be effective. Also, if the application runs on a node remote from the
PCIe link, it may not help and can even degrade performance.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
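
For context, the prefetch this patch removes walked every cache line of the
packet and pushed it into the LLC with rte_prefetch2(). A sketch of that
removed loop, reconstructed from the diff below (buf and length come from
the surrounding eMPW burst loop):

	uintptr_t addr = rte_pktmbuf_mtod(buf, uintptr_t);
	unsigned int n;

	/* Prefetch each cache line of the packet into the LLC (L3) so the
	 * NIC's later DMA read hits cache; the cost grows linearly with
	 * packet length. */
	for (n = 0; n * RTE_CACHE_LINE_SIZE < length; n++)
		rte_prefetch2((void *)(addr + n * RTE_CACHE_LINE_SIZE));

For a 1500-byte frame with 64-byte cache lines, that is 24 prefetches per
packet, which is where the LLC pressure described above comes from.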
  

Comments

Adrien Mazarguil March 13, 2018, 12:44 p.m. UTC | #1
On Mon, Mar 12, 2018 at 10:05:45AM -0700, Yongseok Koh wrote:
> In Enhanced Multi-Packet Send (eMPW), the entire packet data is prefetched
> into the LLC if it isn't inlined. Even though this helps reduce jitter when
> the HW fetches data by DMA, it can thrash the LLC, evicting precious data.
> And if the queue size is large and there are many queues, the prefetch
> might not be effective. Also, if the application runs on a node remote
> from the PCIe link, it may not help and can even degrade performance.
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
  
Shahaf Shuler April 3, 2018, 4:58 a.m. UTC | #2
Tuesday, March 13, 2018 2:45 PM, Adrien Mazarguil:
> Subject: Re: [dpdk-dev] [PATCH] net/mlx5: remove excessive data prefetch
> 
> On Mon, Mar 12, 2018 at 10:05:45AM -0700, Yongseok Koh wrote:
> > In Enhanced Multi-Packet Send (eMPW), the entire packet data is
> > prefetched into the LLC if it isn't inlined. Even though this helps
> > reduce jitter when the HW fetches data by DMA, it can thrash the LLC,
> > evicting precious data.
> > And if the queue size is large and there are many queues, the prefetch
> > might not be effective. Also, if the application runs on a node remote
> > from the PCIe link, it may not help and can even degrade performance.
> >
> > Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> 
> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Applied to next-net-mlx, thanks. 

> 
> --
> Adrien Mazarguil
> 6WIND
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 049f7e6c1..c2060b734 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1320,7 +1320,6 @@ txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 	do {
 		struct rte_mbuf *buf = *(pkts++);
 		uintptr_t addr;
-		unsigned int n;
 		unsigned int do_inline = 0; /* Whether inline is possible. */
 		uint32_t length;
 		uint8_t cs_flags;
@@ -1440,11 +1439,8 @@ txq_burst_empw(struct mlx5_txq_data *txq, struct rte_mbuf **pkts,
 					((uintptr_t)mpw.data.raw +
 					 inl_pad);
 			(*txq->elts)[elts_head++ & elts_m] = buf;
-			addr = rte_pktmbuf_mtod(buf, uintptr_t);
-			for (n = 0; n * RTE_CACHE_LINE_SIZE < length; n++)
-				rte_prefetch2((void *)(addr +
-						n * RTE_CACHE_LINE_SIZE));
-			addr = rte_cpu_to_be_64(addr);
+			addr = rte_cpu_to_be_64(rte_pktmbuf_mtod(buf,
+								 uintptr_t));
 			*dseg = (rte_v128u32_t) {
 				rte_cpu_to_be_32(length),
 				mlx5_tx_mb2mr(txq, buf),
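
The "node remote from the PCIe link" remark in the commit message is about
NUMA locality: prefetching into an LLC that the NIC's DMA reads will never
hit is pure overhead. As a minimal sketch of how an application can detect
that situation with the standard rte_eth_dev_socket_id()/rte_socket_id()
API (the helper name below is illustrative, not part of this patch):

	#include <rte_ethdev.h>
	#include <rte_lcore.h>

	/* Return nonzero if the calling lcore and the port's PCIe device sit
	 * on different NUMA nodes, i.e. the case where CPU-side prefetching
	 * helps least. */
	static int
	port_is_remote(uint16_t port_id)
	{
		int dev_socket = rte_eth_dev_socket_id(port_id);
		int cpu_socket = (int)rte_socket_id();

		return dev_socket >= 0 && dev_socket != cpu_socket;
	}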