[dpdk-dev,2/2] mbuf: make sure userdata is initialized
Commit Message
From: Stephen Hemminger <shemming@brocade.com>
For applications that use m->userdata, the initialization can
be a significant (10%) performance penalty.
Rather than taking the cache penalty of initializing userdata
in the receive handling, do it in the place where mbuf is
already cache hot and being setup.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/cxgbe/sge.c | 1 +
drivers/net/e1000/em_rxtx.c | 2 ++
drivers/net/e1000/igb_rxtx.c | 2 ++
drivers/net/enic/enic_main.c | 2 ++
drivers/net/fm10k/fm10k.h | 1 +
drivers/net/i40e/i40e_rxtx.c | 4 ++++
drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++++
drivers/net/ixgbe/ixgbe_rxtx_vec.c | 1 +
drivers/net/virtio/virtio_rxtx.c | 3 +++
drivers/net/vmxnet3/vmxnet3_rxtx.c | 1 +
drivers/net/xenvirt/rte_eth_xenvirt.c | 1 +
lib/librte_mbuf/rte_mbuf.h | 1 +
12 files changed, 23 insertions(+)
Comments
On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
> From: Stephen Hemminger <shemming@brocade.com>
>
> For applications that use m->userdata, the initialization can
> be a significant (10%) performance penalty.
>
> Rather than taking the cache penalty of initializing userdata
> in the receive handling, do it in the place where mbuf is
> already cache hot and being setup.
Should the management of the userdata field not be the responsibility of the
app itself, rather than having the PMD manage it? If the PMD does manage the
userdata field, I would suggest taking the approach of having the field cleared
by the mbuf library on free, rather than on RX.
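Bruce's suggestion can be sketched with a toy model: clear the app-private field at the point the mbuf's refcount drops to zero and it returns to the pool, so the RX hot path never has to touch it. This is a standalone sketch with invented names (`toy_mbuf`, `toy_mbuf_free`), not the rte_mbuf implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an mbuf -- field names echo rte_mbuf, but this is a
 * self-contained sketch, not DPDK code. */
struct toy_mbuf {
	void *userdata;  /* app-private metadata (second cache line in DPDK) */
	int refcnt;
};

/* Hypothetical free path: clear userdata only when the mbuf actually
 * goes back to the pool, instead of on every RX. */
static void toy_mbuf_free(struct toy_mbuf *m)
{
	if (--m->refcnt == 0) {
		m->userdata = NULL;  /* cleared on free, per Bruce's suggestion */
		/* ...return m to its mempool here... */
	}
}
```

The trade-off Olivier raises below still applies: at free time the mbuf may already be cache-cold, so the write can miss.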
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> drivers/net/cxgbe/sge.c | 1 +
> drivers/net/e1000/em_rxtx.c | 2 ++
> drivers/net/e1000/igb_rxtx.c | 2 ++
> drivers/net/enic/enic_main.c | 2 ++
> drivers/net/fm10k/fm10k.h | 1 +
> drivers/net/i40e/i40e_rxtx.c | 4 ++++
> drivers/net/ixgbe/ixgbe_rxtx.c | 4 ++++
> drivers/net/ixgbe/ixgbe_rxtx_vec.c | 1 +
> drivers/net/virtio/virtio_rxtx.c | 3 +++
> drivers/net/vmxnet3/vmxnet3_rxtx.c | 1 +
> drivers/net/xenvirt/rte_eth_xenvirt.c | 1 +
> lib/librte_mbuf/rte_mbuf.h | 1 +
> 12 files changed, 23 insertions(+)
>
<snip>
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> index 912d3b4..d73d8dc 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> @@ -742,6 +742,7 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq)
>
> mb_def.nb_segs = 1;
> mb_def.data_off = RTE_PKTMBUF_HEADROOM;
> + mb_def.userdata = NULL;
> mb_def.port = rxq->port_id;
> rte_mbuf_refcnt_set(&mb_def, 1);
>
This won't actually work for the vector PMD. The userdata field is not covered
by the "rearm_data" part of the mbuf.
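The point about rearm_data can be shown with a simplified layout: the vector RX path rearms a descriptor by blitting one 8-byte template window in the first cache line, so a template value for a second-cache-line field such as userdata is simply never copied per packet. The struct and function names here are invented for illustration; the real rte_mbuf layout differs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified mbuf: the first 8 bytes stand in for the "rearm_data"
 * window; userdata sits outside it (second cache line in DPDK). */
struct toy_mbuf {
	/* -- rearm window (8 bytes) -- */
	uint16_t data_off;
	uint16_t refcnt;
	uint8_t  nb_segs;
	uint8_t  port;
	uint16_t pad;
	/* -- outside the window -- */
	void *userdata;
};

/* Vector-style rearm: copy only the 8-byte template window. */
static void rearm_from_template(struct toy_mbuf *m, const struct toy_mbuf *def)
{
	memcpy(m, def, 8);  /* userdata, beyond the window, is untouched */
}
```

So setting mb_def.userdata = NULL in ixgbe_rxq_vec_setup() changes the template but has no effect on the mbufs handed to the application.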
/Bruce
On Fri, 10 Jul 2015 10:32:17 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
> > From: Stephen Hemminger <shemming@brocade.com>
> >
> > For applications that use m->userdata, the initialization can
> > be a significant (10%) performance penalty.
> >
> > Rather than taking the cache penalty of initializing userdata
> > in the receive handling, do it in the place where mbuf is
> > already cache hot and being setup.
>
> Should the management of the userdata field not be the responsibility of the
> app itself, rather than having the PMD manage it? If the PMD does manage the
> userdata field, I would suggest taking the approach of having the field cleared
> by the mbuf library on free, rather than on RX.
The problem with that is that when the application
gets the mbuf, touching the userdata field causes false cache
sharing and we see a significant performance drop. Internally the
userdata field is only used for a few special cases, and 0/NULL
is used to indicate that no metadata is present.
On 07/10/2015 05:43 PM, Stephen Hemminger wrote:
> On Fri, 10 Jul 2015 10:32:17 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
>>> From: Stephen Hemminger <shemming@brocade.com>
>>>
>>> For applications that use m->userdata, the initialization can
>>> be a significant (10%) performance penalty.
>>>
>>> Rather than taking the cache penalty of initializing userdata
>>> in the receive handling, do it in the place where mbuf is
>>> already cache hot and being setup.
>>
>> Should the management of the userdata field not be the responsibility of the
>> app itself, rather than having the PMD manage it? If the PMD does manage the
>> userdata field, I would suggest taking the approach of having the field cleared
>> by the mbuf library on free, rather than on RX.
>
> The problem with that is that when the application
> gets the mbuf, touching the userdata field causes false cache
> sharing and we see a significant performance drop. Internally the
> userdata field is only used for a few special cases, and 0/NULL
> is used to indicate that no metadata is present.
>
I agree with Bruce: the userdata field is in the second cache line,
and we should avoid touching it in the rx path. Resetting it in
mbuf_free() may not be the solution either: in the tx path, mbufs are
freed when we want to place a new mbuf on their hw ring entry, and by
then the freed mbuf may not be in the cache anymore.
I don't understand why doing it in the application induces a performance
drop compared to the PMD. What do you mean by false cache sharing?
Hi,
On 07/10/2015 05:43 PM, Stephen Hemminger wrote:
> On Fri, 10 Jul 2015 10:32:17 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
>> On Thu, Jul 09, 2015 at 04:37:48PM -0700, Stephen Hemminger wrote:
>>> From: Stephen Hemminger <shemming@brocade.com>
>>>
>>> For applications that use m->userdata, the initialization can
>>> be a significant (10%) performance penalty.
>>>
>>> Rather than taking the cache penalty of initializing userdata
>>> in the receive handling, do it in the place where mbuf is
>>> already cache hot and being setup.
>>
>> Should the management of the userdata field not be the responsibility of the
>> app itself, rather than having the PMD manage it? If the PMD does manage the
>> userdata field, I would suggest taking the approach of having the field cleared
>> by the mbuf library on free, rather than on RX.
>
> The problem with that is that when the application
> gets the mbuf, touching the userdata field causes false cache
> sharing and we see a significant performance drop. Internally the
> userdata field is only used for a few special cases, and 0/NULL
> is used to indicate that no metadata is present.
Agree with Bruce. The management of this field is the responsibility
of the application.
Also, this field is located in the second cache line, and we should
avoid touching it in the rx functions. We had the same discussion for
the next field in this thread:
http://dpdk.org/ml/archives/dev/2015-June/019182.html
A pure-app solution would be to set it to NULL in:
- your mempool object init function
- in your mbuf_free() function, before calling rte_pktmbuf_free()
- before passing the mbuf to the tx driver
But a better solution in dpdk is probably what Bruce suggests.
Regards,
Olivier
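The first of Olivier's pure-app options — clearing the field once in the mempool object init function — can be sketched as below. The callback shape mirrors the per-object init hook that rte_mempool_create() accepts, but the names (`toy_mbuf`, `app_obj_init`) are invented for this standalone sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for an mbuf with an app-private field. */
struct toy_mbuf {
	void *userdata;
};

/* Hypothetical per-object init callback, run once per mbuf at pool
 * creation time. A real app would first chain the stock initializer
 * (rte_pktmbuf_init) and then clear its private field. */
static void app_obj_init(void *pool, void *opaque, void *obj, unsigned idx)
{
	(void)pool; (void)opaque; (void)idx;
	struct toy_mbuf *m = obj;
	m->userdata = NULL;  /* cleared once here, never in the RX hot path */
}
```

Since mbufs come out of the pool with userdata already NULL, the app only has to restore the invariant before returning an mbuf it has tagged.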
@@ -406,6 +406,7 @@ static unsigned int refill_fl_usembufs(struct adapter *adap, struct sge_fl *q,
}
mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+ mbuf->userdata = NULL;
mbuf->next = NULL;
mapping = (dma_addr_t)(mbuf->buf_physaddr + mbuf->data_off);
@@ -790,6 +790,7 @@ eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
rxq->crc_len);
rxm->data_off = RTE_PKTMBUF_HEADROOM;
rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
+ rxm->userdata = NULL;
rxm->nb_segs = 1;
rxm->next = NULL;
rxm->pkt_len = pkt_len;
@@ -959,6 +960,7 @@ eth_em_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
data_len = rte_le_to_cpu_16(rxd.length);
rxm->data_len = data_len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
/*
* If this is the first buffer of the received packet,
@@ -774,6 +774,7 @@ eth_igb_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
pkt_len = (uint16_t) (rte_le_to_cpu_16(rxd.wb.upper.length) -
rxq->crc_len);
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
rxm->nb_segs = 1;
rxm->next = NULL;
@@ -948,6 +949,7 @@ eth_igb_recv_scattered_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
data_len = rte_le_to_cpu_16(rxd.wb.upper.length);
rxm->data_len = data_len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
/*
* If this is the first buffer of the received packet,
@@ -342,6 +342,7 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
}
hdr_mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+ hdr_mbuf->userdata = NULL;
hdr_mbuf->nb_segs = 2;
hdr_mbuf->port = enic->port_id;
@@ -363,6 +364,7 @@ static int enic_rq_alloc_buf(struct vnic_rq *rq)
}
mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+ mbuf->userdata = NULL;
mbuf->next = NULL;
dma_addr = (dma_addr_t)
@@ -257,6 +257,7 @@ fm10k_pktmbuf_reset(struct rte_mbuf *mb, uint8_t in_port)
mb->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb->buf_addr +
RTE_PKTMBUF_HEADROOM, FM10K_RX_DATABUF_ALIGN)
- (char *)mb->buf_addr);
+ mb->userdata = NULL;
mb->port = in_port;
}
@@ -804,6 +804,7 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
rte_mbuf_refcnt_set(mb, 1);
mb->next = NULL;
mb->data_off = RTE_PKTMBUF_HEADROOM;
+ mb->userdata = NULL;
mb->nb_segs = 1;
mb->port = rxq->port_id;
dma_addr = rte_cpu_to_le_64(\
@@ -961,6 +962,7 @@ i40e_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
I40E_RXD_QW1_LENGTH_PBUF_SHIFT) - rxq->crc_len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rte_prefetch0(RTE_PTR_ADD(rxm->buf_addr, RTE_PKTMBUF_HEADROOM));
rxm->nb_segs = 1;
rxm->next = NULL;
@@ -1069,6 +1071,7 @@ i40e_recv_scattered_pkts(void *rx_queue,
I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
rxm->data_len = rx_packet_len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
/**
* If this is the first buffer of the received packet, set the
@@ -2436,6 +2439,7 @@ i40e_alloc_rx_queue_mbufs(struct i40e_rx_queue *rxq)
rte_mbuf_refcnt_set(mbuf, 1);
mbuf->next = NULL;
mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+ mbuf->userdata = NULL;
mbuf->nb_segs = 1;
mbuf->port = rxq->port_id;
@@ -1054,6 +1054,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, bool reset_mbuf)
rte_mbuf_refcnt_set(mb, 1);
mb->data_off = RTE_PKTMBUF_HEADROOM;
+ mb->userdata = NULL;
/* populate the descriptors */
dma_addr = rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb));
@@ -1321,6 +1322,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
pkt_len = (uint16_t) (rte_le_to_cpu_16(rxd.wb.upper.length) -
rxq->crc_len);
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rte_packet_prefetch((char *)rxm->buf_addr + rxm->data_off);
rxm->nb_segs = 1;
rxm->next = NULL;
@@ -1596,6 +1598,7 @@ next_desc:
rxe->mbuf = nmb;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rxdp->read.hdr_addr = dma;
rxdp->read.pkt_addr = dma;
} else
@@ -3462,6 +3465,7 @@ ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq)
rte_mbuf_refcnt_set(mbuf, 1);
mbuf->next = NULL;
mbuf->data_off = RTE_PKTMBUF_HEADROOM;
+ mbuf->userdata = NULL;
mbuf->nb_segs = 1;
mbuf->port = rxq->port_id;
@@ -742,6 +742,7 @@ ixgbe_rxq_vec_setup(struct ixgbe_rx_queue *rxq)
mb_def.nb_segs = 1;
mb_def.data_off = RTE_PKTMBUF_HEADROOM;
+ mb_def.userdata = NULL;
mb_def.port = rxq->port_id;
rte_mbuf_refcnt_set(&mb_def, 1);
@@ -519,6 +519,7 @@ virtio_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->port = rxvq->port_id;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rxm->nb_segs = 1;
rxm->next = NULL;
@@ -635,6 +636,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
seg_num = 1;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rxm->nb_segs = seg_num;
rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[0] - hdr_size);
@@ -673,6 +675,7 @@ virtio_recv_mergeable_pkts(void *rx_queue,
rxm = rcv_pkts[extra_idx];
rxm->data_off = RTE_PKTMBUF_HEADROOM - hdr_size;
+ rxm->userdata = NULL;
rxm->next = NULL;
rxm->pkt_len = (uint32_t)(len[extra_idx]);
rxm->data_len = (uint16_t)(len[extra_idx]);
@@ -638,6 +638,7 @@ vmxnet3_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
rxm->pkt_len = (uint16_t)rcd->len;
rxm->data_len = (uint16_t)rcd->len;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rxm->ol_flags = 0;
rxm->vlan_tci = 0;
@@ -111,6 +111,7 @@ eth_xenvirt_rx(void *q, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
PMD_RX_LOG(DEBUG, "packet len:%d\n", len[i]);
rxm->next = NULL;
rxm->data_off = RTE_PKTMBUF_HEADROOM;
+ rxm->userdata = NULL;
rxm->data_len = (uint16_t)(len[i] - sizeof(struct virtio_net_hdr));
rxm->nb_segs = 1;
rxm->port = pi->port_id;
@@ -794,6 +794,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
m->vlan_tci_outer = 0;
m->nb_segs = 1;
m->port = 0xff;
+ m->userdata = NULL;
m->ol_flags = 0;
m->packet_type = 0;