From patchwork Tue Sep 15 13:35:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxime Coquelin X-Patchwork-Id: 77740 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 6497DA04C7; Tue, 15 Sep 2020 15:36:50 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D30231C11A; Tue, 15 Sep 2020 15:36:19 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by dpdk.org (Postfix) with ESMTP id E0A551BFC5 for ; Tue, 15 Sep 2020 15:36:17 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600176977; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=u6pOEInt3nZcA7UFdWJ3wDgmHtwbp6FZHS+ODaA4kgI=; b=YOxlUwedkfXUxx608Rj1L5grUcmJpakIIJoQY3ie2RquOVCuwt6zbWtkyNMSJvzn4hMPKw mXt/aVB/ie36tonFlTFOqNwGuN8iuEdGg4AufNFyv9VMl5pwjRUS/OPlyhAtwTiBMvZYX9 7Yiyde34j275zDNTKuNzOeQIa7Yam+Y= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-269-9v4SsAkMNH2fCuvMd5jc8g-1; Tue, 15 Sep 2020 09:36:15 -0400 X-MC-Unique: 9v4SsAkMNH2fCuvMd5jc8g-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4C5DE81CBEB; Tue, 15 Sep 2020 13:36:14 +0000 (UTC) Received: from localhost.localdomain (unknown [10.36.110.16]) by smtp.corp.redhat.com (Postfix) with ESMTP id F0B057513D; Tue, 15 Sep 2020 13:36:12 +0000 (UTC) From: Maxime Coquelin To: dev@dpdk.org, chenbo.xia@intel.com, ian.stokes@intel.com Cc: Maxime Coquelin Date: Tue, 15 Sep 2020 15:35:58 +0200 Message-Id: <20200915133558.210639-5-maxime.coquelin@redhat.com> In-Reply-To: <20200915133558.210639-1-maxime.coquelin@redhat.com> References: <20200915133558.210639-1-maxime.coquelin@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=maxime.coquelin@redhat.com X-Mimecast-Spam-Score: 0.002 X-Mimecast-Originator: redhat.com Subject: [dpdk-dev] [PATCH 4/4] vhost: remove dequeue zero-copy support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Dequeue zero-copy removal was announced in DPDK v20.08. This feature brings constraints which makes the maintenance of the Vhost library difficult. Its limitations makes it also diffcult to use by the applications (Tx vring starvation). Removing it makes it easier to add new features, and also remove some code in the hot path, which should bring a performance improvement for the standard path. Signed-off-by: Maxime Coquelin --- doc/guides/prog_guide/vhost_lib.rst | 52 +---- lib/librte_vhost/rte_vhost.h | 2 +- lib/librte_vhost/socket.c | 47 ---- lib/librte_vhost/vhost.c | 14 -- lib/librte_vhost/vhost.h | 7 - lib/librte_vhost/vhost_user.c | 79 +------ lib/librte_vhost/virtio_net.c | 326 +++------------------------- 7 files changed, 33 insertions(+), 494 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index b892eec67a..ba4c62aeb8 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -51,50 +51,6 @@ The following is an overview of some key Vhost API functions: This reconnect option is enabled by default. However, it can be turned off by setting this flag. - - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` - - Dequeue zero copy will be enabled when this flag is set. It is disabled by - default. - - There are some truths (including limitations) you might want to know while - setting this flag: - - * zero copy is not good for small packets (typically for packet size below - 512). - - * zero copy is really good for VM2VM case. For iperf between two VMs, the - boost could be above 70% (when TSO is enabled). - - * For zero copy in VM2NIC case, guest Tx used vring may be starved if the - PMD driver consume the mbuf but not release them timely. - - For example, i40e driver has an optimization to maximum NIC pipeline which - postpones returning transmitted mbuf until only tx_free_threshold free - descs left. The virtio TX used ring will be starved if the formula - (num_i40e_tx_desc - num_virtio_tx_desc > tx_free_threshold) is true, since - i40e will not return back mbuf. - - A performance tip for tuning zero copy in VM2NIC case is to adjust the - frequency of mbuf free (i.e. adjust tx_free_threshold of i40e driver) to - balance consumer and producer. - - * Guest memory should be backended with huge pages to achieve better - performance. Using 1G page size is the best. - - When dequeue zero copy is enabled, the guest phys address and host phys - address mapping has to be established. Using non-huge pages means far - more page segments. To make it simple, DPDK vhost does a linear search - of those segments, thus the fewer the segments, the quicker we will get - the mapping. NOTE: we may speed it by using tree searching in future. - - * zero copy can not work when using vfio-pci with iommu mode currently, this - is because we don't setup iommu dma mapping for guest memory. If you have - to use vfio-pci driver, please insert vfio-pci kernel module in noiommu - mode. - - * The consumer of zero copy mbufs should consume these mbufs as soon as - possible, otherwise it may block the operations in vhost. - - ``RTE_VHOST_USER_IOMMU_SUPPORT`` IOMMU support will be enabled when this flag is set. It is disabled by @@ -362,16 +318,16 @@ Guest memory requirement * Memory pre-allocation - For non-zerocopy non-async data path, guest memory pre-allocation is not a + For non-async data path, guest memory pre-allocation is not a must. This can help save of memory. If users really want the guest memory to be pre-allocated (e.g., for performance reason), we can add option ``-mem-prealloc`` when starting QEMU. Or, we can lock all memory at vhost side which will force memory to be allocated when mmap at vhost side; option --mlockall in ovs-dpdk is an example in hand. - For async and zerocopy data path, we force the VM memory to be - pre-allocated at vhost lib when mapping the guest memory; and also we need - to lock the memory to prevent pages being swapped out to disk. + For async data path, we force the VM memory to be pre-allocated at vhost + lib when mapping the guest memory; and also we need to lock the memory to + prevent pages being swapped out to disk. * Memory sharing diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index a94c84134d..46019df6fe 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -28,7 +28,7 @@ extern "C" { #define RTE_VHOST_USER_CLIENT (1ULL << 0) #define RTE_VHOST_USER_NO_RECONNECT (1ULL << 1) -#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2) +#define RTE_VHOST_USER_RESERVED_1 (1ULL << 2) #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3) #define RTE_VHOST_USER_POSTCOPY_SUPPORT (1ULL << 4) /* support mbuf with external buffer attached */ diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 73e1dca95e..0169d36481 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -37,7 +37,6 @@ struct vhost_user_socket { struct sockaddr_un un; bool is_server; bool reconnect; - bool dequeue_zero_copy; bool iommu_support; bool use_builtin_virtio_net; bool extbuf; @@ -229,9 +228,6 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); - if (vsocket->dequeue_zero_copy) - vhost_enable_dequeue_zero_copy(vid); - if (vsocket->extbuf) vhost_enable_extbuf(vid); @@ -878,18 +874,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags) goto out_free; } vsocket->vdpa_dev = NULL; - vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; - - if (vsocket->dequeue_zero_copy && - (flags & RTE_VHOST_USER_IOMMU_SUPPORT)) { - VHOST_LOG_CONFIG(ERR, - "error: enabling dequeue zero copy and IOMMU features " - "simultaneously is not supported\n"); - goto out_mutex; - } - vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; if (vsocket->async_copy && @@ -918,39 +904,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES; vsocket->protocol_features = VHOST_USER_PROTOCOL_FEATURES; - /* - * Dequeue zero copy can't assure descriptors returned in order. - * Also, it requires that the guest memory is populated, which is - * not compatible with postcopy. - */ - if (vsocket->dequeue_zero_copy) { - if (vsocket->extbuf) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with external buffers\n"); - ret = -1; - goto out_mutex; - } - if (vsocket->linearbuf) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with linear buffers\n"); - ret = -1; - goto out_mutex; - } - if ((flags & RTE_VHOST_USER_CLIENT) != 0) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with vhost client mode\n"); - ret = -1; - goto out_mutex; - } - vsocket->supported_features &= ~(1ULL << VIRTIO_F_IN_ORDER); - vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER); - - VHOST_LOG_CONFIG(INFO, - "Dequeue zero copy requested, disabling postcopy support\n"); - vsocket->protocol_features &= - ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT); - } - if (vsocket->async_copy) { vsocket->supported_features &= ~(1ULL << VHOST_F_LOG_ALL); vsocket->features &= ~(1ULL << VHOST_F_LOG_ALL); diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 8f20a0818f..c7cd34e42b 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -539,8 +539,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) vhost_user_iotlb_init(dev, vring_idx); /* Backends are set to -1 indicating an inactive device. */ vq->backend = -1; - - TAILQ_INIT(&vq->zmbuf_list); } static void @@ -704,18 +702,6 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) dev->ifname[sizeof(dev->ifname) - 1] = '\0'; } -void -vhost_enable_dequeue_zero_copy(int vid) -{ - struct virtio_net *dev = get_device(vid); - - if (dev == NULL) - return; - - dev->dequeue_zero_copy = 1; - VHOST_LOG_CONFIG(INFO, "dequeue zero copy is enabled\n"); -} - void vhost_set_builtin_virtio_net(int vid, bool enable) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 632f66d532..321cf6e99c 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -379,7 +379,6 @@ struct virtio_net { /* to tell if we need broadcast rarp packet */ int16_t broadcast_rarp; uint32_t nr_vring; - int dequeue_zero_copy; int async_copy; int extbuf; int linearbuf; @@ -897,10 +896,4 @@ mbuf_is_consumed(struct rte_mbuf *m) return true; } -static __rte_always_inline void -put_zmbuf(struct zcopy_mbuf *zmbuf) -{ - zmbuf->in_use = 0; -} - #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index c3c924faec..8af5ea0a98 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -134,29 +134,6 @@ get_blk_size(int fd) return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize; } -/* - * Reclaim all the outstanding zmbufs for a virtqueue. - */ -static void -drain_zmbuf_list(struct vhost_virtqueue *vq) -{ - struct zcopy_mbuf *zmbuf, *next; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - while (!mbuf_is_consumed(zmbuf->mbuf)) - usleep(1000); - - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } -} - static void free_mem_region(struct virtio_net *dev) { @@ -167,14 +144,6 @@ free_mem_region(struct virtio_net *dev) if (!dev || !dev->mem) return; - if (dev->dequeue_zero_copy) { - for (i = 0; i < dev->nr_vring; i++) { - vq = dev->virtqueue[i]; - if (vq) - drain_zmbuf_list(vq); - } - } - for (i = 0; i < dev->mem->nregions; i++) { reg = &dev->mem->regions[i]; if (reg->host_user_addr) { @@ -454,23 +423,6 @@ vhost_user_set_vring_num(struct virtio_net **pdev, return RTE_VHOST_MSG_RESULT_ERR; } - if (dev->dequeue_zero_copy) { - vq->nr_zmbuf = 0; - vq->last_zmbuf_idx = 0; - vq->zmbuf_size = vq->size; - if (vq->zmbufs) - rte_free(vq->zmbufs); - vq->zmbufs = rte_zmalloc(NULL, vq->zmbuf_size * - sizeof(struct zcopy_mbuf), 0); - if (vq->zmbufs == NULL) { - VHOST_LOG_CONFIG(WARNING, - "failed to allocate mem for zero copy; " - "zero copy is force disabled\n"); - dev->dequeue_zero_copy = 0; - } - TAILQ_INIT(&vq->zmbuf_list); - } - if (vq_is_packed(dev)) { if (vq->shadow_used_packed) rte_free(vq->shadow_used_packed); @@ -524,7 +476,6 @@ numa_realloc(struct virtio_net *dev, int index) int oldnode, newnode; struct virtio_net *old_dev; struct vhost_virtqueue *old_vq, *vq; - struct zcopy_mbuf *new_zmbuf; struct vring_used_elem *new_shadow_used_split; struct vring_used_elem_packed *new_shadow_used_packed; struct batch_copy_elem *new_batch_copy_elems; @@ -555,16 +506,6 @@ numa_realloc(struct virtio_net *dev, int index) return dev; memcpy(vq, old_vq, sizeof(*vq)); - TAILQ_INIT(&vq->zmbuf_list); - - if (dev->dequeue_zero_copy) { - new_zmbuf = rte_malloc_socket(NULL, vq->zmbuf_size * - sizeof(struct zcopy_mbuf), 0, newnode); - if (new_zmbuf) { - rte_free(vq->zmbufs); - vq->zmbufs = new_zmbuf; - } - } if (vq_is_packed(dev)) { new_shadow_used_packed = rte_malloc_socket(NULL, @@ -1179,8 +1120,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, goto err_mmap; } - populate = (dev->dequeue_zero_copy || dev->async_copy) ? - MAP_POPULATE : 0; + populate = dev->async_copy ? MAP_POPULATE : 0; mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED | populate, fd, 0); @@ -1195,7 +1135,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset; - if (dev->dequeue_zero_copy || dev->async_copy) + if (dev->async_copy) if (add_guest_pages(dev, reg, alignment) < 0) { VHOST_LOG_CONFIG(ERR, "adding guest pages to region %u failed.\n", @@ -1933,15 +1873,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg, return RTE_VHOST_MSG_RESULT_OK; } -static void -free_zmbufs(struct vhost_virtqueue *vq) -{ - drain_zmbuf_list(vq); - - rte_free(vq->zmbufs); - vq->zmbufs = NULL; -} - /* * when virtio is stopped, qemu will send us the GET_VRING_BASE message. */ @@ -1996,8 +1927,6 @@ vhost_user_get_vring_base(struct virtio_net **pdev, vq->signalled_used_valid = false; - if (dev->dequeue_zero_copy) - free_zmbufs(vq); if (vq_is_packed(dev)) { rte_free(vq->shadow_used_packed); vq->shadow_used_packed = NULL; @@ -2051,10 +1980,6 @@ vhost_user_set_vring_enable(struct virtio_net **pdev, } } - /* On disable, rings have to be stopped being processed. */ - if (!enable && dev->dequeue_zero_copy) - drain_zmbuf_list(dev->virtqueue[index]); - dev->virtqueue[index]->enabled = enable; return RTE_VHOST_MSG_RESULT_OK; diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index bd9303c8a9..0a0bea1a5a 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1946,7 +1946,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, struct rte_mempool *mbuf_pool) { uint32_t buf_avail, buf_offset; - uint64_t buf_addr, buf_iova, buf_len; + uint64_t buf_addr, buf_len; uint32_t mbuf_avail, mbuf_offset; uint32_t cpy_len; struct rte_mbuf *cur = m, *prev = m; @@ -1958,7 +1958,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, int error = 0; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) { @@ -1988,14 +1987,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, buf_offset = dev->vhost_hlen - buf_len; vec_idx++; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_avail = buf_len - buf_offset; } else if (buf_len == dev->vhost_hlen) { if (unlikely(++vec_idx >= nr_vec)) goto out; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_offset = 0; @@ -2012,48 +2009,23 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, mbuf_offset = 0; mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; while (1) { - uint64_t hpa; - cpy_len = RTE_MIN(buf_avail, mbuf_avail); - /* - * A desc buf might across two host physical pages that are - * not continuous. In such case (gpa_to_hpa returns 0), data - * will be copied even though zero copy is enabled. - */ - if (unlikely(dev->dequeue_zero_copy && (hpa = gpa_to_hpa(dev, - buf_iova + buf_offset, cpy_len)))) { - cur->data_len = cpy_len; - cur->data_off = 0; - cur->buf_addr = - (void *)(uintptr_t)(buf_addr + buf_offset); - cur->buf_iova = hpa; - - /* - * In zero copy mode, one mbuf can only reference data - * for one or partial of one desc buff. - */ - mbuf_avail = cpy_len; - } else { - if (likely(cpy_len > MAX_BATCH_LEN || - vq->batch_copy_nb_elems >= vq->size || - (hdr && cur == m))) { - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, - mbuf_offset), - (void *)((uintptr_t)(buf_addr + - buf_offset)), - cpy_len); - } else { - batch_copy[vq->batch_copy_nb_elems].dst = - rte_pktmbuf_mtod_offset(cur, void *, - mbuf_offset); - batch_copy[vq->batch_copy_nb_elems].src = + if (likely(cpy_len > MAX_BATCH_LEN || + vq->batch_copy_nb_elems >= vq->size || + (hdr && cur == m))) { + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, + mbuf_offset), (void *)((uintptr_t)(buf_addr + - buf_offset)); - batch_copy[vq->batch_copy_nb_elems].len = - cpy_len; - vq->batch_copy_nb_elems++; - } + buf_offset)), cpy_len); + } else { + batch_copy[vq->batch_copy_nb_elems].dst = + rte_pktmbuf_mtod_offset(cur, void *, + mbuf_offset); + batch_copy[vq->batch_copy_nb_elems].src = + (void *)((uintptr_t)(buf_addr + buf_offset)); + batch_copy[vq->batch_copy_nb_elems].len = cpy_len; + vq->batch_copy_nb_elems++; } mbuf_avail -= cpy_len; @@ -2067,7 +2039,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, break; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_offset = 0; @@ -2089,8 +2060,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, error = -1; goto out; } - if (unlikely(dev->dequeue_zero_copy)) - rte_mbuf_refcnt_update(cur, 1); prev->next = cur; prev->data_len = mbuf_offset; @@ -2114,37 +2083,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, return error; } -static __rte_always_inline struct zcopy_mbuf * -get_zmbuf(struct vhost_virtqueue *vq) -{ - uint16_t i; - uint16_t last; - int tries = 0; - - /* search [last_zmbuf_idx, zmbuf_size) */ - i = vq->last_zmbuf_idx; - last = vq->zmbuf_size; - -again: - for (; i < last; i++) { - if (vq->zmbufs[i].in_use == 0) { - vq->last_zmbuf_idx = i + 1; - vq->zmbufs[i].in_use = 1; - return &vq->zmbufs[i]; - } - } - - tries++; - if (tries == 1) { - /* search [0, last_zmbuf_idx) */ - i = 0; - last = vq->last_zmbuf_idx; - goto again; - } - - return NULL; -} - static void virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque) { @@ -2244,30 +2182,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, uint16_t dropped = 0; static bool allocerr_warned; - if (unlikely(dev->dequeue_zero_copy)) { - struct zcopy_mbuf *zmbuf, *next; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - if (mbuf_is_consumed(zmbuf->mbuf)) { - update_shadow_used_ring_split(vq, - zmbuf->desc_idx, 0); - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } - } - - if (likely(vq->shadow_used_idx)) { - flush_shadow_used_ring_split(dev, vq); - vhost_vring_call_split(dev, vq); - } - } - /* * The ordering between avail index and * desc reads needs to be enforced. @@ -2300,8 +2214,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, VHOST_ACCESS_RO) < 0)) break; - if (likely(dev->dequeue_zero_copy == 0)) - update_shadow_used_ring_split(vq, head_idx, 0); + update_shadow_used_ring_split(vq, head_idx, 0); pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len); if (unlikely(pkts[i] == NULL)) { @@ -2335,42 +2248,16 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, i++; break; } - - if (unlikely(dev->dequeue_zero_copy)) { - struct zcopy_mbuf *zmbuf; - - zmbuf = get_zmbuf(vq); - if (!zmbuf) { - rte_pktmbuf_free(pkts[i]); - dropped += 1; - i++; - break; - } - zmbuf->mbuf = pkts[i]; - zmbuf->desc_idx = head_idx; - - /* - * Pin lock the mbuf; we will check later to see - * whether the mbuf is freed (when we are the last - * user) or not. If that's the case, we then could - * update the used ring safely. - */ - rte_mbuf_refcnt_update(pkts[i], 1); - - vq->nr_zmbuf += 1; - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); - } } + vq->last_avail_idx += i; - if (likely(dev->dequeue_zero_copy == 0)) { - do_data_copy_dequeue(vq); - if (unlikely(i < count)) - vq->shadow_used_idx = i; - if (likely(vq->shadow_used_idx)) { - flush_shadow_used_ring_split(dev, vq); - vhost_vring_call_split(dev, vq); - } + do_data_copy_dequeue(vq); + if (unlikely(i < count)) + vq->shadow_used_idx = i; + if (likely(vq->shadow_used_idx)) { + flush_shadow_used_ring_split(dev, vq); + vhost_vring_call_split(dev, vq); } return (i - dropped); @@ -2570,162 +2457,6 @@ virtio_dev_tx_single_packed(struct virtio_net *dev, return ret; } -static __rte_always_inline int -virtio_dev_tx_batch_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) -{ - struct zcopy_mbuf *zmbufs[PACKED_BATCH_SIZE]; - uintptr_t desc_addrs[PACKED_BATCH_SIZE]; - uint16_t ids[PACKED_BATCH_SIZE]; - uint16_t i; - - uint16_t avail_idx = vq->last_avail_idx; - - if (vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, pkts, - avail_idx, desc_addrs, ids)) - return -1; - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - zmbufs[i] = get_zmbuf(vq); - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { - if (!zmbufs[i]) - goto free_pkt; - } - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { - zmbufs[i]->mbuf = pkts[i]; - zmbufs[i]->desc_idx = ids[i]; - zmbufs[i]->desc_count = 1; - } - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - rte_mbuf_refcnt_update(pkts[i], 1); - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbufs[i], next); - - vq->nr_zmbuf += PACKED_BATCH_SIZE; - vq_inc_last_avail_packed(vq, PACKED_BATCH_SIZE); - - return 0; - -free_pkt: - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - rte_pktmbuf_free(pkts[i]); - - return -1; -} - -static __rte_always_inline int -virtio_dev_tx_single_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) -{ - uint16_t buf_id, desc_count; - struct zcopy_mbuf *zmbuf; - - if (vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id, - &desc_count)) - return -1; - - zmbuf = get_zmbuf(vq); - if (!zmbuf) { - rte_pktmbuf_free(*pkts); - return -1; - } - zmbuf->mbuf = *pkts; - zmbuf->desc_idx = buf_id; - zmbuf->desc_count = desc_count; - - rte_mbuf_refcnt_update(*pkts, 1); - - vq->nr_zmbuf += 1; - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); - - vq_inc_last_avail_packed(vq, desc_count); - return 0; -} - -static __rte_always_inline void -free_zmbuf(struct vhost_virtqueue *vq) -{ - struct zcopy_mbuf *next = NULL; - struct zcopy_mbuf *zmbuf; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - uint16_t last_used_idx = vq->last_used_idx; - - if (mbuf_is_consumed(zmbuf->mbuf)) { - uint16_t flags; - flags = vq->desc_packed[last_used_idx].flags; - if (vq->used_wrap_counter) { - flags |= VRING_DESC_F_USED; - flags |= VRING_DESC_F_AVAIL; - } else { - flags &= ~VRING_DESC_F_USED; - flags &= ~VRING_DESC_F_AVAIL; - } - - vq->desc_packed[last_used_idx].id = zmbuf->desc_idx; - vq->desc_packed[last_used_idx].len = 0; - - rte_smp_wmb(); - vq->desc_packed[last_used_idx].flags = flags; - - vq_inc_last_used_packed(vq, zmbuf->desc_count); - - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } - } -} - -static __rte_noinline uint16_t -virtio_dev_tx_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *__rte_restrict vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **__rte_restrict pkts, - uint32_t count) -{ - uint32_t pkt_idx = 0; - uint32_t remained = count; - - free_zmbuf(vq); - - do { - if (remained >= PACKED_BATCH_SIZE) { - if (!virtio_dev_tx_batch_packed_zmbuf(dev, vq, - mbuf_pool, &pkts[pkt_idx])) { - pkt_idx += PACKED_BATCH_SIZE; - remained -= PACKED_BATCH_SIZE; - continue; - } - } - - if (virtio_dev_tx_single_packed_zmbuf(dev, vq, mbuf_pool, - &pkts[pkt_idx])) - break; - pkt_idx++; - remained--; - - } while (remained); - - if (pkt_idx) - vhost_vring_call_packed(dev, vq); - - return pkt_idx; -} - static __rte_noinline uint16_t virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *__rte_restrict vq, @@ -2841,14 +2572,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, count -= 1; } - if (vq_is_packed(dev)) { - if (unlikely(dev->dequeue_zero_copy)) - count = virtio_dev_tx_packed_zmbuf(dev, vq, mbuf_pool, - pkts, count); - else - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, - count); - } else + if (vq_is_packed(dev)) + count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count); + else count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); out: