[dpdk-dev,4/5] vhost: do not use rte_memcpy for virtio_hdr copy

Message ID 1449122773-25510-5-git-send-email-yuanhan.liu@linux.intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Yuanhan Liu Dec. 3, 2015, 6:06 a.m. UTC
  First of all, rte_memcpy() is mostly useful for coping big packets
by leveraging hardware advanced instructions like AVX. But for virtio
net hdr, which is 12 bytes at most, invoking rte_memcpy() will not
introduce any performance boost.

And, to my suprise, rte_memcpy() is huge. Since rte_memcpy() is
inlined, it takes more space every time we call it at a different
place.  Replacing the two rte_memcpy() with directly copy saves
nearly 12K bytes of code size!

    # without this patch
    $ size /path/to/vhost_rxtx.o
       text    data     bss     dec     hex filename
      36171       0       0   36171    8d4b /path/to/vhost_rxtx.o

    # with this patch
    $ size /path/to/vhost_rxtx.o
       text    data     bss     dec     hex filename
      24179       0       0   24179    5e73 /path/to/vhost_rxtx.o

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
 lib/librte_vhost/vhost_rxtx.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)
  

Comments

Huawei Xie Jan. 27, 2016, 2:46 a.m. UTC | #1
On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> +	} else {
> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> +	}

Thanks!
We might simplify this further. Just reset the first two fields flags
and gso_type.
  
Yuanhan Liu Jan. 27, 2016, 3:22 a.m. UTC | #2
On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote:
> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> > +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> > +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> > +	} else {
> > +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> > +	}
> 
> Thanks!
> We might simplify this further. Just reset the first two fields flags
> and gso_type.

What's this "simplification" for? Don't even to say that we will add
TSO support, which modifies few more files, such as csum_start: reseting
the first two fields only is wrong here.

	--yliu
  
Huawei Xie Jan. 27, 2016, 5:56 a.m. UTC | #3
On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote:
>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>> +	} else {
>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>> +	}
>> Thanks!
>> We might simplify this further. Just reset the first two fields flags
>> and gso_type.
> What's this "simplification" for? Don't even to say that we will add
> TSO support, which modifies few more files, such as csum_start: reseting
> the first two fields only is wrong here.

I know TSO before commenting, but at least in this implementation and
this specific patch, i guess zeroing two fields are enough.

What is wrong resetting only two fields?

>
> 	--yliu
>
  
Yuanhan Liu Jan. 27, 2016, 6:02 a.m. UTC | #4
On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote:
> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote:
> >> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>> +	} else {
> >>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>> +	}
> >> Thanks!
> >> We might simplify this further. Just reset the first two fields flags
> >> and gso_type.
> > What's this "simplification" for? Don't even to say that we will add
> > TSO support, which modifies few more files, such as csum_start: reseting
> > the first two fields only is wrong here.
> 
> I know TSO before commenting, but at least in this implementation and
> this specific patch, i guess zeroing two fields are enough.
> 
> What is wrong resetting only two fields?

I then have to ask "What is the benifit of resetting only two fields"?
If doing so, we have to change it back for TSO. My proposal requires no
extra change when adding TSO support.

	--yliu
  
Huawei Xie Jan. 27, 2016, 6:16 a.m. UTC | #5
On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote:
>> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
>>> On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote:
>>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
>>>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
>>>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
>>>>> +	} else {
>>>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
>>>>> +	}
>>>> Thanks!
>>>> We might simplify this further. Just reset the first two fields flags
>>>> and gso_type.
>>> What's this "simplification" for? Don't even to say that we will add
>>> TSO support, which modifies few more files, such as csum_start: reseting
>>> the first two fields only is wrong here.
>> I know TSO before commenting, but at least in this implementation and
>> this specific patch, i guess zeroing two fields are enough.
>>
>> What is wrong resetting only two fields?
> I then have to ask "What is the benifit of resetting only two fields"?
> If doing so, we have to change it back for TSO. My proposal requires no
> extra change when adding TSO support.

? Benefit is we save four unnecessary stores.

>
> 	--yliu
>
  
Yuanhan Liu Jan. 27, 2016, 6:35 a.m. UTC | #6
On Wed, Jan 27, 2016 at 06:16:37AM +0000, Xie, Huawei wrote:
> On 1/27/2016 2:02 PM, Yuanhan Liu wrote:
> > On Wed, Jan 27, 2016 at 05:56:56AM +0000, Xie, Huawei wrote:
> >> On 1/27/2016 11:22 AM, Yuanhan Liu wrote:
> >>> On Wed, Jan 27, 2016 at 02:46:39AM +0000, Xie, Huawei wrote:
> >>>> On 12/3/2015 2:03 PM, Yuanhan Liu wrote:
> >>>>> +	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
> >>>>> +		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
> >>>>> +	} else {
> >>>>> +		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
> >>>>> +	}
> >>>> Thanks!
> >>>> We might simplify this further. Just reset the first two fields flags
> >>>> and gso_type.
> >>> What's this "simplification" for? Don't even to say that we will add
> >>> TSO support, which modifies few more files, such as csum_start: reseting
> >>> the first two fields only is wrong here.
> >> I know TSO before commenting, but at least in this implementation and
> >> this specific patch, i guess zeroing two fields are enough.
> >>
> >> What is wrong resetting only two fields?
> > I then have to ask "What is the benifit of resetting only two fields"?
> > If doing so, we have to change it back for TSO. My proposal requires no
> > extra change when adding TSO support.
> 
> ? Benefit is we save four unnecessary stores.

Hmm..., the hdr size is 12 bytes at most. I mean, does it really matter,
coping 3 bytes, or coping 12 bytes in a row?

	--yliu
  

Patch

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 7464b6b..1e0a24e 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -70,14 +70,17 @@  copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t cpy_len;
 	struct vring_desc *desc;
 	uint64_t desc_addr;
-	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
+	struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0};
 
 	desc = &vq->desc[desc_idx];
 	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
-	rte_memcpy((void *)(uintptr_t)desc_addr,
-			(const void *)&virtio_hdr, vq->vhost_hlen);
+	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
+		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
+	} else {
+		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
+	}
 	PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
 
 	desc_offset = vq->vhost_hlen;
@@ -340,7 +343,7 @@  copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			    uint16_t res_start_idx, uint16_t res_end_idx,
 			    struct rte_mbuf *pkt)
 {
-	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
+	struct virtio_net_hdr_mrg_rxbuf hdr = {{0, 0, 0, 0, 0, 0}, 0};
 	uint32_t vec_idx = 0;
 	uint16_t cur_idx = res_start_idx;
 	uint64_t desc_addr;
@@ -361,13 +364,16 @@  copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
-	virtio_hdr.num_buffers = res_end_idx - res_start_idx;
+	hdr.num_buffers = res_end_idx - res_start_idx;
 
 	LOG_DEBUG(VHOST_DATA, "(%"PRIu64") RX: Num merge buffers %d\n",
 		dev->device_fh, virtio_hdr.num_buffers);
 
-	rte_memcpy((void *)(uintptr_t)desc_addr,
-		(const void *)&virtio_hdr, vq->vhost_hlen);
+	if (vq->vhost_hlen == sizeof(struct virtio_net_hdr_mrg_rxbuf)) {
+		*(struct virtio_net_hdr_mrg_rxbuf *)(uintptr_t)desc_addr = hdr;
+	} else {
+		*(struct virtio_net_hdr *)(uintptr_t)desc_addr = hdr.hdr;
+	}
 	PRINT_PACKET(dev, (uintptr_t)desc_addr, vq->vhost_hlen, 0);
 
 	desc_avail  = vq->buf_vec[vec_idx].buf_len - vq->vhost_hlen;