[dpdk-dev,v2,4/7] vhost: add dequeue zero copy
Commit Message
See the bottom.
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yuanhan Liu
Sent: Friday, September 23, 2016 5:13 AM
To: dev@dpdk.org
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
Subject: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
The basic idea of dequeue zero copy is, instead of copying data from the desc buf, here we let the mbuf reference the desc buf addr directly.
Doing so, however, has one major issue: we can't update the used ring at the end of rte_vhost_dequeue_burst. Because we don't do the copy here, an update of the used ring would let the driver to reclaim the desc buf. As a result, DPDK might reference a stale memory region.
To update the used ring properly, this patch does several tricks:
- when mbuf references a desc buf, refcnt is added by 1.
This is to pin lock the mbuf, so that a mbuf free from the DPDK
won't actually free it, instead, refcnt is subtracted by 1.
- We chain all those mbuf together (by tailq)
And we check it every time on the rte_vhost_dequeue_burst entrance,
to see if the mbuf is freed (when refcnt equals to 1). If that
happens, it means we are the last user of this mbuf and we are
safe to update the used ring.
- "struct zcopy_mbuf" is introduced, to associate an mbuf with the
right desc idx.
Dequeue zero copy is introduced for performance reason, and some rough tests show about 50% perfomance boost for packet size 1500B. For small packets, (e.g. 64B), it actually slows a bit down (well, it could up to 15%). That is expected because this patch introduces some extra works, and it outweighs the benefit from saving few bytes copy.
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
---
v2: - use unlikely/likely for dequeue_zero_copy check, as it's not enabled
by default, as well as it has some limitations in vm2nic case.
- handle the case that a desc buf might across 2 host phys pages
- reset nr_zmbuf to 0 at set_vring_num
- set the zmbuf_size to vq->size, but not the double of it.
---
lib/librte_vhost/vhost.c | 2 +
lib/librte_vhost/vhost.h | 22 +++++-
lib/librte_vhost/vhost_user.c | 42 +++++++++- lib/librte_vhost/virtio_net.c | 173 +++++++++++++++++++++++++++++++++++++-----
4 files changed, 219 insertions(+), 20 deletions(-)
+ if (unlikely(dev->dequeue_zero_copy)) {
+ struct zcopy_mbuf *zmbuf;
+
+ zmbuf = get_zmbuf(vq);
+ if (!zmbuf)
+ return -1;
+ zmbuf->mbuf = m;
+ zmbuf->desc_idx = desc_idx;
+
+ /*
+ * Pin lock the mbuf; we will check later to see whether
+ * the mbuf is freed (when we are the last user) or not.
+ * If that's the case, we then could update the used ring
+ * safely.
+ */
+ rte_mbuf_refcnt_update(m, 1);
+
+ vq->nr_zmbuf += 1;
+ TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next);
+ }
+
hdr = (struct virtio_net_hdr *)((uintptr_t)desc_addr);
rte_prefetch0(hdr);
this function copy_desc_to_mbuf has changed on the dpdk-next-virtio repo. Based on current dpdk-next-virtio repo, the commit ID is as below:
commit b4f7b43cd9d3b6413f41221051d03a23bc5f5fbe
Author: Zhiyong Yang <zhiyong.yang@intel.com>
Date: Thu Sep 29 20:35:49 2016 +0800
Then you will find the parameter "struct vhost_virtqueue *vq" is removed, so if apply your patch on that commit ID, the build will fail, since no vq definition but we used it in the function.
Could you check? Thx.
== Build lib/librte_table
/home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c: In function 'copy_desc_to_mbuf':
/home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: error: 'vq' undeclared (first use in this function)
zmbuf = get_zmbuf(vq);
^
/home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: note: each undeclared identifier is reported only once for each function it appears in
/home/qxu10/dpdk-zero/mk/internal/rte.compile-pre.mk:138: recipe for target 'virtio_net.o' failed
make[5]: *** [virtio_net.o] Error 1
/home/qxu10/dpdk-zero/mk/rte.subdir.mk:61: recipe for target 'librte_vhost' failed
make[4]: *** [librte_vhost] Error 2
make[4]: *** Waiting for unfinished jobs....
Comments
On Thu, Oct 06, 2016 at 02:37:27PM +0000, Xu, Qian Q wrote:
> this function copy_desc_to_mbuf has changed on the dpdk-next-virtio repo. Based on current dpdk-next-virtio repo, the commit ID is as below:
> commit b4f7b43cd9d3b6413f41221051d03a23bc5f5fbe
> Author: Zhiyong Yang <zhiyong.yang@intel.com>
> Date: Thu Sep 29 20:35:49 2016 +0800
>
> Then you will find the parameter "struct vhost_virtqueue *vq" is removed, so if apply your patch on that commit ID, the build will fail, since no vq definition but we used it in the function.
> Could you check? Thx.
I knew that: a rebase is needed, and I have done the rebase (locally);
just haven't sent it out yet.
--yliu
>
> == Build lib/librte_table
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c: In function 'copy_desc_to_mbuf':
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: error: 'vq' undeclared (first use in this function)
> zmbuf = get_zmbuf(vq);
> ^
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: note: each undeclared identifier is reported only once for each function it appears in
> /home/qxu10/dpdk-zero/mk/internal/rte.compile-pre.mk:138: recipe for target 'virtio_net.o' failed
> make[5]: *** [virtio_net.o] Error 1
> /home/qxu10/dpdk-zero/mk/rte.subdir.mk:61: recipe for target 'librte_vhost' failed
> make[4]: *** [librte_vhost] Error 2
> make[4]: *** Waiting for unfinished jobs....
>
Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is broken. When we run --vdev 'eth_vhost0,xxxx', then it will report the error that the driver is not supported. V16.07 is OK, but I haven't got time to do git bisect.
-----Original Message-----
From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
Sent: Sunday, October 9, 2016 3:03 AM
To: Xu, Qian Q <qian.q.xu@intel.com>
Cc: dev@dpdk.org; Maxime Coquelin <maxime.coquelin@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
On Thu, Oct 06, 2016 at 02:37:27PM +0000, Xu, Qian Q wrote:
> this function copy_desc_to_mbuf has changed on the dpdk-next-virtio repo. Based on current dpdk-next-virtio repo, the commit ID is as below:
> commit b4f7b43cd9d3b6413f41221051d03a23bc5f5fbe
> Author: Zhiyong Yang <zhiyong.yang@intel.com>
> Date: Thu Sep 29 20:35:49 2016 +0800
>
> Then you will find the parameter "struct vhost_virtqueue *vq" is removed, so if apply your patch on that commit ID, the build will fail, since no vq definition but we used it in the function.
> Could you check? Thx.
I knew that: a rebase is needed, and I have done the rebase (locally); just haven't sent it out yet.
--yliu
>
> == Build lib/librte_table
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c: In function 'copy_desc_to_mbuf':
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: error: 'vq' undeclared (first use in this function)
> zmbuf = get_zmbuf(vq);
> ^
> /home/qxu10/dpdk-zero/lib/librte_vhost/virtio_net.c:745:21: note: each
> undeclared identifier is reported only once for each function it
> appears in
> /home/qxu10/dpdk-zero/mk/internal/rte.compile-pre.mk:138: recipe for
> target 'virtio_net.o' failed
> make[5]: *** [virtio_net.o] Error 1
> /home/qxu10/dpdk-zero/mk/rte.subdir.mk:61: recipe for target
> 'librte_vhost' failed
> make[4]: *** [librte_vhost] Error 2
> make[4]: *** Waiting for unfinished jobs....
>
Hi Xu,
On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is broken. When we run --vdev 'eth_vhost0,xxxx', then it will report the error that the driver is not supported. V16.07 is OK, but I haven't got time to do git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.
Maxime
Oh, thx for the info, it's better to have some documentation update in R16.11 release notes.
-----Original Message-----
From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
Sent: Monday, October 10, 2016 11:14 AM
To: Xu, Qian Q <qian.q.xu@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
Hi Xu,
On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is broken. When we run --vdev 'eth_vhost0,xxxx', then it will report the error that the driver is not supported. V16.07 is OK, but I haven't got time to do git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.
Maxime
I'm a little concerned if it's a correct way to change the name from release to release, some users may use eth_vhost for the driver, and found it was not working. Do we need also make the consistency b/w releases?
What's the name difference b/w eth and net? Other PMDs are not virtual devices. Virtio_user is also a virtual device, do we need to change the name,too?
-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Xu, Qian Q
Sent: Monday, October 10, 2016 11:23 AM
To: Maxime Coquelin <maxime.coquelin@redhat.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
Oh, thx for the info, it's better to have some documentation update in R16.11 release notes.
-----Original Message-----
From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
Sent: Monday, October 10, 2016 11:14 AM
To: Xu, Qian Q <qian.q.xu@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
Hi Xu,
On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is broken. When we run --vdev 'eth_vhost0,xxxx', then it will report the error that the driver is not supported. V16.07 is OK, but I haven't got time to do git bisect.
Name has been chenged to net_vhost0 for consistency with other PMDs.
Maxime
On 10/10/2016 12:22 PM, Xu, Qian Q wrote:
> Oh, thx for the info, it's better to have some documentation update in R16.11 release notes.
I'm not the author of this change, just faced the same situation as
yours a a user.
But I agree that if documentation is not aligned, it should be updated.
Maxime
>
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, October 10, 2016 11:14 AM
> To: Xu, Qian Q <qian.q.xu@intel.com>; Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 4/7] vhost: add dequeue zero copy
>
> Hi Xu,
>
> On 10/10/2016 12:12 PM, Xu, Qian Q wrote:
>> Good to know. I will try v3. BTW, on the master branch, seems vhost PMD is broken. When we run --vdev 'eth_vhost0,xxxx', then it will report the error that the driver is not supported. V16.07 is OK, but I haven't got time to do git bisect.
> Name has been chenged to net_vhost0 for consistency with other PMDs.
>
> Maxime
>
@@ -141,6 +141,8 @@ init_vring_queue(struct vhost_virtqueue *vq, int qp_idx)
/* always set the default vq pair to enabled */
if (qp_idx == 0)
vq->enabled = 1;
+
+ TAILQ_INIT(&vq->zmbuf_list);
}
static void
@@ -36,6 +36,7 @@
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
+#include <sys/queue.h>
#include <unistd.h>
#include <linux/vhost.h>
@@ -61,6 +62,19 @@ struct buf_vector {
uint32_t desc_idx;
};
+/*
+ * A structure to hold some fields needed in zero copy code path,
+ * mainly for associating an mbuf with the right desc_idx.
+ */
+struct zcopy_mbuf {
+ struct rte_mbuf *mbuf;
+ uint32_t desc_idx;
+ uint16_t in_use;
+
+ TAILQ_ENTRY(zcopy_mbuf) next;
+};
+TAILQ_HEAD(zcopy_mbuf_list, zcopy_mbuf);
+
/**
* Structure contains variables relevant to RX/TX virtqueues.
*/
@@ -85,6 +99,12 @@ struct vhost_virtqueue {
/* Physical address of used ring, for logging */
uint64_t log_guest_addr;
+
+ uint16_t nr_zmbuf;
+ uint16_t zmbuf_size;
+ uint16_t last_zmbuf_idx;
+ struct zcopy_mbuf *zmbufs;
+ struct zcopy_mbuf_list zmbuf_list;
} __rte_cache_aligned;
/* Old kernels have no such macro defined */ @@ -135,6 +155,7 @@ struct virtio_net {
/* to tell if we need broadcast rarp packet */
rte_atomic16_t broadcast_rarp;
uint32_t virt_qp_nb;
+ int dequeue_zero_copy;
struct vhost_virtqueue *virtqueue[VHOST_MAX_QUEUE_PAIRS * 2];
#define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
char ifname[IF_NAME_SZ];
@@ -146,7 +167,6 @@ struct virtio_net {
uint32_t nr_guest_pages;
uint32_t max_guest_pages;
struct guest_page *guest_pages;
-
} __rte_cache_aligned;
/**
@@ -180,7 +180,23 @@ static int
vhost_user_set_vring_num(struct virtio_net *dev,
struct vhost_vring_state *state)
{
- dev->virtqueue[state->index]->size = state->num;
+ struct vhost_virtqueue *vq = dev->virtqueue[state->index];
+
+ vq->size = state->num;
+
+ if (dev->dequeue_zero_copy) {
+ vq->nr_zmbuf = 0;
+ vq->last_zmbuf_idx = 0;
+ vq->zmbuf_size = vq->size;
+ vq->zmbufs = rte_zmalloc(NULL, vq->zmbuf_size *
+ sizeof(struct zcopy_mbuf), 0);
+ if (vq->zmbufs == NULL) {
+ RTE_LOG(WARNING, VHOST_CONFIG,
+ "failed to allocate mem for zero copy; "
+ "zero copy is force disabled\n");
+ dev->dequeue_zero_copy = 0;
+ }
+ }
return 0;
}
@@ -662,11 +678,32 @@ vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
vq->kickfd = file.fd;
if (virtio_is_ready(dev) && !(dev->flags & VIRTIO_DEV_RUNNING)) {
+ if (dev->dequeue_zero_copy) {
+ RTE_LOG(INFO, VHOST_CONFIG,
+ "Tx zero copy is enabled\n");
+ }
+
if (notify_ops->new_device(dev->vid) == 0)
dev->flags |= VIRTIO_DEV_RUNNING;
}
}
+static void
+free_zmbufs(struct vhost_virtqueue *vq) {
+ struct zcopy_mbuf *zmbuf, *next;
+
+ for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list);
+ zmbuf != NULL; zmbuf = next) {
+ next = TAILQ_NEXT(zmbuf, next);
+
+ rte_pktmbuf_free(zmbuf->mbuf);
+ TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
+ }
+
+ rte_free(vq->zmbufs);
+}
+
/*
* when virtio is stopped, qemu will send us the GET_VRING_BASE message.
*/
@@ -695,6 +732,9 @@ vhost_user_get_vring_base(struct virtio_net *dev,
dev->virtqueue[state->index]->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
+ if (dev->dequeue_zero_copy)
+ free_zmbufs(dev->virtqueue[state->index]);
+
return 0;
}
@@ -678,6 +678,43 @@ make_rarp_packet(struct rte_mbuf *rarp_mbuf, const struct ether_addr *mac)
return 0;
}
+static inline struct zcopy_mbuf *__attribute__((always_inline))
+get_zmbuf(struct vhost_virtqueue *vq) {
+ uint16_t i;
+ uint16_t last;
+ int tries = 0;
+
+ /* search [last_zmbuf_idx, zmbuf_size) */
+ i = vq->last_zmbuf_idx;
+ last = vq->zmbuf_size;
+
+again:
+ for (; i < last; i++) {
+ if (vq->zmbufs[i].in_use == 0) {
+ vq->last_zmbuf_idx = i + 1;
+ vq->zmbufs[i].in_use = 1;
+ return &vq->zmbufs[i];
+ }
+ }
+
+ tries++;
+ if (tries == 1) {
+ /* search [0, last_zmbuf_idx) */
+ i = 0;
+ last = vq->last_zmbuf_idx;
+ goto again;
+ }
+
+ return NULL;
+}
+
+static inline void __attribute__((always_inline)) put_zmbuf(struct
+zcopy_mbuf *zmbuf) {
+ zmbuf->in_use = 0;
+}
+
static inline int __attribute__((always_inline)) copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
struct rte_mbuf *m, uint16_t desc_idx, @@ -701,6 +738,27 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
if (unlikely(!desc_addr))
return -1;