[dpdk-dev,2/2] mbuf: make sure userdata is initialized

Message ID 1436485068-30609-3-git-send-email-stephen@networkplumber.org (mailing list archive)
State Changes Requested, archived

Commit Message

Stephen Hemminger July 9, 2015, 11:37 p.m. UTC
  From: Stephen Hemminger <shemming@brocade.com>

For applications that use m->userdata, the initialization can
be a significant (10%) performance penalty.

Rather than taking the cache penalty of initializing userdata
in the receive handling, do it in the place where the mbuf is
already cache hot and being set up.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/cxgbe/sge.c               | 1 +
 drivers/net/e1000/em_rxtx.c           | 2 ++
 drivers/net/e1000/igb_rxtx.c          | 2 ++
 drivers/net/enic/enic_main.c          | 2 ++
 drivers/net/fm10k/fm10k.h             | 1 +
 drivers/net/i40e/i40e_rxtx.c          | 4 ++++
 drivers/net/ixgbe/ixgbe_rxtx.c        | 4 ++++
 drivers/net/ixgbe/ixgbe_rxtx_vec.c    | 1 +
 drivers/net/virtio/virtio_rxtx.c      | 3 +++
 drivers/net/vmxnet3/vmxnet3_rxtx.c    | 1 +
 drivers/net/xenvirt/rte_eth_xenvirt.c | 1 +
 lib/librte_mbuf/rte_mbuf.h            | 1 +
 12 files changed, 23 insertions(+)
  

Comments

Bruce Richardson July 10, 2015, 9:32 a.m. UTC | #1
On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
> 
> For applications that use m->userdata, the initialization can
> be a significant (10%) performance penalty.
> 
> Rather than taking the cache penalty of initializing userdata
> in the receive handling, do it in the place where the mbuf is
> already cache hot and being set up.

Should the management of the userdata field not be the responsibility of the
app itself, rather than having the PMD manage it? If the PMD does manage the
userdata field, I would suggest taking the approach of having the field cleared
by the mbuf library on free, rather than on RX.
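
Something along these lines, sketched against the current free path in
rte_mbuf.h (untested, for illustration only):

	static inline void
	rte_pktmbuf_free_seg(struct rte_mbuf *m)
	{
		m = __rte_pktmbuf_prefree_seg(m);
		if (likely(m != NULL)) {
			/* proposed: clear app metadata here, where the
			 * mbuf header is being written anyway */
			m->userdata = NULL;
			__rte_mbuf_raw_free(m);
		}
	}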

> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  drivers/net/cxgbe/sge.c               | 1 +
>  drivers/net/e1000/em_rxtx.c           | 2 ++
>  drivers/net/e1000/igb_rxtx.c          | 2 ++
>  drivers/net/enic/enic_main.c          | 2 ++
>  drivers/net/fm10k/fm10k.h             | 1 +
>  drivers/net/i40e/i40e_rxtx.c          | 4 ++++
>  drivers/net/ixgbe/ixgbe_rxtx.c        | 4 ++++
>  drivers/net/ixgbe/ixgbe_rxtx_vec.c    | 1 +
>  drivers/net/virtio/virtio_rxtx.c      | 3 +++
>  drivers/net/vmxnet3/vmxnet3_rxtx.c    | 1 +
>  drivers/net/xenvirt/rte_eth_xenvirt.c | 1 +
>  lib/librte_mbuf/rte_mbuf.h            | 1 +
>  12 files changed, 23 insertions(+)
> 
<snip>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index 912d3b4..d73d8dc 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> @@ -742,6 +742,7 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq)
>  
>  	mb_def.nb_segs = 1;
>  	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
> +	mb_def.userdata = NULL;
>  	mb_def.port = rxq->port_id;
>  	rte_mbuf_refcnt_set(&mb_def, 1);
>  

This won't actually work for the vector PMD. The userdata field is not covered
by the "rearm_data" part of the mbuf.

/Bruce
  
Stephen Hemminger July 10, 2015, 3:43 p.m. UTC | #2
On Fri, 10 Jul 2015 10:32:17 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
> > From: Stephen Hemminger <shemming@brocade.com>
> > 
> > For applications that use m->userdata, the initialization can
> > be a significant (10%) performance penalty.
> > 
> > Rather than taking the cache penalty of initializing userdata
> > in the receive handling, do it in the place where the mbuf is
> > already cache hot and being set up.
> 
> Should the management of the userdata field not be the responsibility of the
> app itself, rather than having the PMD manage it? If the PMD does manage the
> userdata field, I would suggest taking the approach of having the field cleared
> by the mbuf library on free, rather than on RX.

The problem with that is that when the application
gets the mbuf, touching the userdata field causes false cache
sharing and we see a significant performance drop. Internally the
userdata field is only used for a few special cases, and 0/NULL
is used to indicate that no metadata is present.
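
For context, the convention is roughly this (struct and helper names
hypothetical, app-defined):

	/* app-side convention: userdata == NULL means no metadata */
	struct pkt_meta *meta = m->userdata;

	if (meta == NULL)
		meta = pkt_meta_attach(m);	/* hypothetical helper */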
  
Olivier Matz July 15, 2015, 10:10 a.m. UTC | #3
On 07/10/2015 05:43 PM, Stephen Hemminger wrote:
> On Fri, 10 Jul 2015 10:32:17 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
>>> From: Stephen Hemminger <shemming@brocade.com>
>>>
>>> For applications that use m->userdata, the initialization can
>>> be a significant (10%) performance penalty.
>>>
>>> Rather than taking the cache penalty of initializing userdata
>>> in the receive handling, do it in the place where the mbuf is
>>> already cache hot and being set up.
>>
>> Should the management of the userdata field not be the responsibility of the
>> app itself, rather than having the PMD manage it? If the PMD does manage the
>> userdata field, I would suggest taking the approach of having the field cleared
>> by the mbuf library on free, rather than on RX.
>
> The problem with that is that when the application
> gets the mbuf, touching the userdata field causes false cache
> sharing and we see a significant performance drop. Internally the
> userdata field is only used for a few special cases, and 0/NULL
> is used to indicate that no metadata is present.
>

I agree with Bruce: the userdata field is in the second cache line,
and we should avoid touching it in the rx path. Resetting it in
mbuf_free() may not be the solution either: in the tx path, an mbuf is
freed only when its hw ring entry is needed for a new mbuf, and by
then it may not be in the cache anymore.
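
I.e. (sketch of a typical tx cleanup, for illustration):

	/* the mbuf freed here was handed to the hw long ago and is
	 * likely cache-cold by the time its ring slot is reused */
	if (txe->mbuf != NULL)
		rte_pktmbuf_free_seg(txe->mbuf);
	txe->mbuf = tx_pkt;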

I don't understand why doing it in the application induces a performance
drop compared to doing it in the PMD. What do you mean by false cache
sharing?
  
Olivier Matz Feb. 3, 2016, 5:23 p.m. UTC | #4
Hi,

On 07/10/2015 05:43 PM, Stephen Hemminger wrote:
> On Fri, 10 Jul 2015 10:32:17 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
>>> From: Stephen Hemminger <shemming@brocade.com>
>>>
>>> For applications that use m->userdata, the initialization can
>>> be a significant (10%) performance penalty.
>>>
>>> Rather than taking the cache penalty of initializing userdata
>>> in the receive handling, do it in the place where the mbuf is
>>> already cache hot and being set up.
>>
>> Should the management of the userdata field not be the responsibility of the
>> app itself, rather than having the PMD manage it? If the PMD does manage the
>> userdata field, I would suggest taking the approach of having the field cleared
>> by the mbuf library on free, rather than on RX.
>
> The problem with that is that when the application
> gets the mbuf, touching the userdata field causes false cache
> sharing and we see a significant performance drop. Internally the
> userdata field is only used for a few special cases, and 0/NULL
> is used to indicate that no metadata is present.

Agree with Bruce. The management of this field is the responsibility
of the application.

Also, this field is located in the second cache line, and we should
avoid touching it in the rx functions. We had the same discussion for
the mbuf's next field in this thread:
http://dpdk.org/ml/archives/dev/2015-June/019182.html

A pure-app solution would be to set it to NULL in:
- your mempool object init function (see the sketch after this list)
- your mbuf_free() function, before calling rte_pktmbuf_free()
- your tx code, before passing the mbuf to the driver
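
A minimal sketch of the first item, assuming a standard obj_init
callback passed to rte_mempool_create():

	static void
	app_pktmbuf_init(struct rte_mempool *mp, void *opaque,
			 void *obj, unsigned idx)
	{
		/* standard mbuf initialization, then clear app metadata */
		rte_pktmbuf_init(mp, opaque, obj, idx);
		((struct rte_mbuf *)obj)->userdata = NULL;
	}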

But a better solution in dpdk is probably what Bruce suggests.

Regards,
Olivier
  

Patch

diff --git a/drivers/net/cxgbe/sge.c b/drivers/net/cxgbe/sge.c
index 359296e..c51ed39 100644
--- a/drivers/net/cxgbe/sge.c
+++ b/drivers/net/cxgbe/sge.c
@@ -406,6 +406,7 @@  static unsigned int refill_fl_usembufs(struct adapter *adap, struct sge_fl *q,
 		}
 
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		mbuf->userdata = NULL;
 		mbuf->next = NULL;
 
 		mapping = (dma_addr_t)(mbuf->buf_physaddr + mbuf->data_off);
diff --git a/drivers/net/e1000/em_rxtx.c b/drivers/net/e1000/em_rxtx.c
index 38f5c3b..a2fd522 100644
--- a/drivers/net/e1000/em_rxtx.c
+++ b/drivers/net/e1000/em_rxtx.c
@@ -790,6 +790,7 @@  eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 				rxq->crc_len);
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
+		rxm->userdata = NULL;
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
 		rxm->pkt_len = pkt_len;
@@ -959,6 +960,7 @@  eth_em_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		data_len = rte_le_to_cpu_16(rxd.length);
 		rxm->data_len = data_len;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 
 		/*
 		 * If this is the first buffer of the received packet,
diff --git a/drivers/net/e1000/igb_rxtx.c b/drivers/net/e1000/igb_rxtx.c
index 43d6703..9fc0690 100644
--- a/drivers/net/e1000/igb_rxtx.c
+++ b/drivers/net/e1000/igb_rxtx.c
@@ -774,6 +774,7 @@  eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_len = (uint16_t) (rte_le_to_cpu_16(rxd.wb.upper.length) -
 				      rxq->crc_len);
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
@@ -948,6 +949,7 @@  eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
 		rxm->data_len = data_len;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 
 		/*
 		 * If this is the first buffer of the received packet,
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 15313c2..8cf5948 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -342,6 +342,7 @@  static int enic_rq_alloc_buf(struct vnic_rq *rq)
 		}
 
 		hdr_mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		hdr_mbuf->userdata = NULL;
 
 		hdr_mbuf->nb_segs = 2;
 		hdr_mbuf->port = enic->port_id;
@@ -363,6 +364,7 @@  static int enic_rq_alloc_buf(struct vnic_rq *rq)
 	}
 
 	mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+	mbuf->userdata = NULL;
 	mbuf->next = NULL;
 
 	dma_addr = (dma_addr_t)
diff --git a/drivers/net/fm10k/fm10k.h b/drivers/net/fm10k/fm10k.h
index c089882..c9d9147 100644
--- a/drivers/net/fm10k/fm10k.h
+++ b/drivers/net/fm10k/fm10k.h
@@ -257,6 +257,7 @@  fm10k_pktmbuf_reset(struct rte_mbuf *mb, uint8_t in_port)
 	mb->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb->buf_addr +
 			RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
 			- (char *)mb->buf_addr);
+	mb->userdata = NULL;
 	mb->port = in_port;
 }
 
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 88b015d..177b823 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -804,6 +804,7 @@  i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
 		rte_mbuf_refcnt_set(mb, 1);
 		mb->next = NULL;
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
+		mb->userdata = NULL;
 		mb->nb_segs = 1;
 		mb->port = rxq->port_id;
 		dma_addr = rte_cpu_to_le_64(\
@@ -961,6 +962,7 @@  i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 				I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
 
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
@@ -1069,6 +1071,7 @@  i40e_recv_scattered_pkts(void *rx_queue,
 					I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
 		rxm->data_len = rx_packet_len;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 
 		/**
 		 * If this is the first buffer of the received packet, set the
@@ -2436,6 +2439,7 @@  i40e_alloc_rx_queue_mbufs(struct i40e_rx_queue *rxq)
 		rte_mbuf_refcnt_set(mbuf, 1);
 		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		mbuf->userdata = NULL;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index b1db57f..558c039 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1054,6 +1054,7 @@  ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
 
 		rte_mbuf_refcnt_set(mb, 1);
 		mb->data_off = RTE_PKTMBUF_HEADROOM;
+		mb->userdata = NULL;
 
 		/* populate the descriptors */
 		dma_addr = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
@@ -1321,6 +1322,7 @@  ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
 		pkt_len = (uint16_t) (rte_le_to_cpu_16(rxd.wb.upper.length) -
 				      rxq->crc_len);
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
@@ -1596,6 +1598,7 @@  next_desc:
 			rxe->mbuf = nmb;
 
 			rxm->data_off = RTE_PKTMBUF_HEADROOM;
+			rxm->userdata = NULL;
 			rxdp->read.hdr_addr = dma;
 			rxdp->read.pkt_addr = dma;
 		} else
@@ -3462,6 +3465,7 @@  ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
 		rte_mbuf_refcnt_set(mbuf, 1);
 		mbuf->next = NULL;
 		mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+		mbuf->userdata = NULL;
 		mbuf->nb_segs = 1;
 		mbuf->port = rxq->port_id;
 
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index 912d3b4..d73d8dc 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -742,6 +742,7 @@  ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq)
 
 	mb_def.nb_segs = 1;
 	mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+	mb_def.userdata = NULL;
 	mb_def.port = rxq->port_id;
 	rte_mbuf_refcnt_set(&mb_def, 1);
 
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 091c7fb..b6966cc 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -519,6 +519,7 @@  virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 
 		rxm->port = rxvq->port_id;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 
 		rxm->nb_segs = 1;
 		rxm->next = NULL;
@@ -635,6 +636,7 @@  virtio_recv_mergeable_pkts(void *rx_queue,
 			seg_num = 1;
 
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rxm->nb_segs = seg_num;
 		rxm->next = NULL;
 		rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
@@ -673,6 +675,7 @@  virtio_recv_mergeable_pkts(void *rx_queue,
 				rxm = rcv_pkts[extra_idx];
 
 				rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
+				rxm->userdata = NULL;
 				rxm->next = NULL;
 				rxm->pkt_len = (uint32_t)(len[extra_idx]);
 				rxm->data_len = (uint16_t)(len[extra_idx]);
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index c0cbacb..fe6560b 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -638,6 +638,7 @@  vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		rxm->pkt_len = (uint16_t)rcd->len;
 		rxm->data_len = (uint16_t)rcd->len;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rxm->ol_flags = 0;
 		rxm->vlan_tci = 0;
 
diff --git a/drivers/net/xenvirt/rte_eth_xenvirt.c b/drivers/net/xenvirt/rte_eth_xenvirt.c
index 73e8bce..4000a31 100644
--- a/drivers/net/xenvirt/rte_eth_xenvirt.c
+++ b/drivers/net/xenvirt/rte_eth_xenvirt.c
@@ -111,6 +111,7 @@  eth_xenvirt_rx(void *q, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		PMD_RX_LOG(DEBUG, "packet len:%d\n", len[i]);
 		rxm->next = NULL;
 		rxm->data_off = RTE_PKTMBUF_HEADROOM;
+		rxm->userdata = NULL;
 		rxm->data_len = (uint16_t)(len[i] - sizeof(struct virtio_net_hdr));
 		rxm->nb_segs = 1;
 		rxm->port = pi->port_id;
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index f0a543b..302a9c1 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -794,6 +794,7 @@  static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 	m->vlan_tci_outer = 0;
 	m->nb_segs = 1;
 	m->port = 0xff;
+	m->userdata = NULL;
 
 	m->ol_flags = 0;
 	m->packet_type = 0;