From patchwork Mon Sep 28 09:17:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxime Coquelin X-Patchwork-Id: 78989 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 6BA48A04AF; Mon, 28 Sep 2020 11:19:00 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 9F2971D52D; Mon, 28 Sep 2020 11:17:37 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by dpdk.org (Postfix) with ESMTP id D64F91D156 for ; Mon, 28 Sep 2020 11:17:30 +0200 (CEST) Dkim-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1601284649; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0X2xsO1HjNOcdrGZ3XwQ6RrFZNDaPXRrUS8zaJTsnek=; b=EzhELRDhm3Rkz8grX0+vVjvrC2xLyxHLMjvqlsfH3RANhC2DCukZuNWjczCdajIaL6/MMI ISWyKiNvnPGpU1z5qIbq5j2q+HgDdkwSpCTynIvv1T/EQIuCCAxto0uKNJzrULMixIy7ba M1+zzpLBAG6bpEB7+6tI5Gc2WquWeuw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-252-S6PiiTrQMh2b1up8JiKFfw-1; Mon, 28 Sep 2020 05:17:27 -0400 X-MC-Unique: S6PiiTrQMh2b1up8JiKFfw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 552771DDE9; Mon, 28 Sep 2020 09:17:26 +0000 (UTC) Received: from localhost.localdomain (unknown [10.36.110.36]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0B4D078822; Mon, 28 Sep 2020 09:17:24 +0000 (UTC) From: Maxime Coquelin To: dev@dpdk.org, chenbo.xia@intel.com, ian.stokes@intel.com Cc: Maxime Coquelin Date: Mon, 28 Sep 2020 11:17:12 +0200 Message-Id: <20200928091712.7946-5-maxime.coquelin@redhat.com> In-Reply-To: <20200928091712.7946-1-maxime.coquelin@redhat.com> References: <20200928091712.7946-1-maxime.coquelin@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=maxime.coquelin@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [dpdk-dev] [PATCH v2 4/4] vhost: remove dequeue zero-copy support X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Dequeue zero-copy removal was announced in DPDK v20.08. This feature brings constraints which makes the maintenance of the Vhost library difficult. Its limitations makes it also difficult to use by the applications (Tx vring starvation). Removing it makes it easier to add new features, and also remove some code in the hot path, which should bring a performance improvement for the standard path. Signed-off-by: Maxime Coquelin Reviewed-by: Chenbo Xia --- doc/guides/prog_guide/vhost_lib.rst | 52 +---- lib/librte_vhost/rte_vhost.h | 2 +- lib/librte_vhost/socket.c | 47 ---- lib/librte_vhost/vhost.c | 14 -- lib/librte_vhost/vhost.h | 28 --- lib/librte_vhost/vhost_user.c | 80 +------ lib/librte_vhost/virtio_net.c | 326 +++------------------------- 7 files changed, 33 insertions(+), 516 deletions(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index b892eec67a..ba4c62aeb8 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -51,50 +51,6 @@ The following is an overview of some key Vhost API functions: This reconnect option is enabled by default. However, it can be turned off by setting this flag. - - ``RTE_VHOST_USER_DEQUEUE_ZERO_COPY`` - - Dequeue zero copy will be enabled when this flag is set. It is disabled by - default. - - There are some truths (including limitations) you might want to know while - setting this flag: - - * zero copy is not good for small packets (typically for packet size below - 512). - - * zero copy is really good for VM2VM case. For iperf between two VMs, the - boost could be above 70% (when TSO is enabled). - - * For zero copy in VM2NIC case, guest Tx used vring may be starved if the - PMD driver consume the mbuf but not release them timely. - - For example, i40e driver has an optimization to maximum NIC pipeline which - postpones returning transmitted mbuf until only tx_free_threshold free - descs left. The virtio TX used ring will be starved if the formula - (num_i40e_tx_desc - num_virtio_tx_desc > tx_free_threshold) is true, since - i40e will not return back mbuf. - - A performance tip for tuning zero copy in VM2NIC case is to adjust the - frequency of mbuf free (i.e. adjust tx_free_threshold of i40e driver) to - balance consumer and producer. - - * Guest memory should be backended with huge pages to achieve better - performance. Using 1G page size is the best. - - When dequeue zero copy is enabled, the guest phys address and host phys - address mapping has to be established. Using non-huge pages means far - more page segments. To make it simple, DPDK vhost does a linear search - of those segments, thus the fewer the segments, the quicker we will get - the mapping. NOTE: we may speed it by using tree searching in future. - - * zero copy can not work when using vfio-pci with iommu mode currently, this - is because we don't setup iommu dma mapping for guest memory. If you have - to use vfio-pci driver, please insert vfio-pci kernel module in noiommu - mode. - - * The consumer of zero copy mbufs should consume these mbufs as soon as - possible, otherwise it may block the operations in vhost. - - ``RTE_VHOST_USER_IOMMU_SUPPORT`` IOMMU support will be enabled when this flag is set. It is disabled by @@ -362,16 +318,16 @@ Guest memory requirement * Memory pre-allocation - For non-zerocopy non-async data path, guest memory pre-allocation is not a + For non-async data path, guest memory pre-allocation is not a must. This can help save of memory. If users really want the guest memory to be pre-allocated (e.g., for performance reason), we can add option ``-mem-prealloc`` when starting QEMU. Or, we can lock all memory at vhost side which will force memory to be allocated when mmap at vhost side; option --mlockall in ovs-dpdk is an example in hand. - For async and zerocopy data path, we force the VM memory to be - pre-allocated at vhost lib when mapping the guest memory; and also we need - to lock the memory to prevent pages being swapped out to disk. + For async data path, we force the VM memory to be pre-allocated at vhost + lib when mapping the guest memory; and also we need to lock the memory to + prevent pages being swapped out to disk. * Memory sharing diff --git a/lib/librte_vhost/rte_vhost.h b/lib/librte_vhost/rte_vhost.h index a94c84134d..46019df6fe 100644 --- a/lib/librte_vhost/rte_vhost.h +++ b/lib/librte_vhost/rte_vhost.h @@ -28,7 +28,7 @@ extern "C" { #define RTE_VHOST_USER_CLIENT (1ULL << 0) #define RTE_VHOST_USER_NO_RECONNECT (1ULL << 1) -#define RTE_VHOST_USER_DEQUEUE_ZERO_COPY (1ULL << 2) +#define RTE_VHOST_USER_RESERVED_1 (1ULL << 2) #define RTE_VHOST_USER_IOMMU_SUPPORT (1ULL << 3) #define RTE_VHOST_USER_POSTCOPY_SUPPORT (1ULL << 4) /* support mbuf with external buffer attached */ diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 73e1dca95e..0169d36481 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -37,7 +37,6 @@ struct vhost_user_socket { struct sockaddr_un un; bool is_server; bool reconnect; - bool dequeue_zero_copy; bool iommu_support; bool use_builtin_virtio_net; bool extbuf; @@ -229,9 +228,6 @@ vhost_user_add_connection(int fd, struct vhost_user_socket *vsocket) vhost_attach_vdpa_device(vid, vsocket->vdpa_dev); - if (vsocket->dequeue_zero_copy) - vhost_enable_dequeue_zero_copy(vid); - if (vsocket->extbuf) vhost_enable_extbuf(vid); @@ -878,18 +874,8 @@ rte_vhost_driver_register(const char *path, uint64_t flags) goto out_free; } vsocket->vdpa_dev = NULL; - vsocket->dequeue_zero_copy = flags & RTE_VHOST_USER_DEQUEUE_ZERO_COPY; vsocket->extbuf = flags & RTE_VHOST_USER_EXTBUF_SUPPORT; vsocket->linearbuf = flags & RTE_VHOST_USER_LINEARBUF_SUPPORT; - - if (vsocket->dequeue_zero_copy && - (flags & RTE_VHOST_USER_IOMMU_SUPPORT)) { - VHOST_LOG_CONFIG(ERR, - "error: enabling dequeue zero copy and IOMMU features " - "simultaneously is not supported\n"); - goto out_mutex; - } - vsocket->async_copy = flags & RTE_VHOST_USER_ASYNC_COPY; if (vsocket->async_copy && @@ -918,39 +904,6 @@ rte_vhost_driver_register(const char *path, uint64_t flags) vsocket->features = VIRTIO_NET_SUPPORTED_FEATURES; vsocket->protocol_features = VHOST_USER_PROTOCOL_FEATURES; - /* - * Dequeue zero copy can't assure descriptors returned in order. - * Also, it requires that the guest memory is populated, which is - * not compatible with postcopy. - */ - if (vsocket->dequeue_zero_copy) { - if (vsocket->extbuf) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with external buffers\n"); - ret = -1; - goto out_mutex; - } - if (vsocket->linearbuf) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with linear buffers\n"); - ret = -1; - goto out_mutex; - } - if ((flags & RTE_VHOST_USER_CLIENT) != 0) { - VHOST_LOG_CONFIG(ERR, - "error: zero copy is incompatible with vhost client mode\n"); - ret = -1; - goto out_mutex; - } - vsocket->supported_features &= ~(1ULL << VIRTIO_F_IN_ORDER); - vsocket->features &= ~(1ULL << VIRTIO_F_IN_ORDER); - - VHOST_LOG_CONFIG(INFO, - "Dequeue zero copy requested, disabling postcopy support\n"); - vsocket->protocol_features &= - ~(1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT); - } - if (vsocket->async_copy) { vsocket->supported_features &= ~(1ULL << VHOST_F_LOG_ALL); vsocket->features &= ~(1ULL << VHOST_F_LOG_ALL); diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 8f20a0818f..c7cd34e42b 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -539,8 +539,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx) vhost_user_iotlb_init(dev, vring_idx); /* Backends are set to -1 indicating an inactive device. */ vq->backend = -1; - - TAILQ_INIT(&vq->zmbuf_list); } static void @@ -704,18 +702,6 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) dev->ifname[sizeof(dev->ifname) - 1] = '\0'; } -void -vhost_enable_dequeue_zero_copy(int vid) -{ - struct virtio_net *dev = get_device(vid); - - if (dev == NULL) - return; - - dev->dequeue_zero_copy = 1; - VHOST_LOG_CONFIG(INFO, "dequeue zero copy is enabled\n"); -} - void vhost_set_builtin_virtio_net(int vid, bool enable) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 73a1ed8895..20ccdc9bdc 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -94,20 +94,6 @@ struct buf_vector { uint32_t desc_idx; }; -/* - * A structure to hold some fields needed in zero copy code path, - * mainly for associating an mbuf with the right desc_idx. - */ -struct zcopy_mbuf { - struct rte_mbuf *mbuf; - uint32_t desc_idx; - uint16_t desc_count; - uint16_t in_use; - - TAILQ_ENTRY(zcopy_mbuf) next; -}; -TAILQ_HEAD(zcopy_mbuf_list, zcopy_mbuf); - /* * Structure contains the info for each batched memory copy. */ @@ -185,12 +171,6 @@ struct vhost_virtqueue { struct rte_vhost_resubmit_info *resubmit_inflight; uint64_t global_counter; - uint16_t nr_zmbuf; - uint16_t zmbuf_size; - uint16_t last_zmbuf_idx; - struct zcopy_mbuf *zmbufs; - struct zcopy_mbuf_list zmbuf_list; - union { struct vring_used_elem *shadow_used_split; struct vring_used_elem_packed *shadow_used_packed; @@ -380,7 +360,6 @@ struct virtio_net { /* to tell if we need broadcast rarp packet */ int16_t broadcast_rarp; uint32_t nr_vring; - int dequeue_zero_copy; int async_copy; int extbuf; int linearbuf; @@ -718,7 +697,6 @@ int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx); void vhost_attach_vdpa_device(int vid, struct rte_vdpa_device *dev); void vhost_set_ifname(int, const char *if_name, unsigned int if_len); -void vhost_enable_dequeue_zero_copy(int vid); void vhost_set_builtin_virtio_net(int vid, bool enable); void vhost_enable_extbuf(int vid); void vhost_enable_linearbuf(int vid); @@ -898,10 +876,4 @@ mbuf_is_consumed(struct rte_mbuf *m) return true; } -static __rte_always_inline void -put_zmbuf(struct zcopy_mbuf *zmbuf) -{ - zmbuf->in_use = 0; -} - #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 501218e192..3196391c32 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -134,47 +134,15 @@ get_blk_size(int fd) return ret == -1 ? (uint64_t)-1 : (uint64_t)stat.st_blksize; } -/* - * Reclaim all the outstanding zmbufs for a virtqueue. - */ -static void -drain_zmbuf_list(struct vhost_virtqueue *vq) -{ - struct zcopy_mbuf *zmbuf, *next; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - while (!mbuf_is_consumed(zmbuf->mbuf)) - usleep(1000); - - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } -} - static void free_mem_region(struct virtio_net *dev) { uint32_t i; struct rte_vhost_mem_region *reg; - struct vhost_virtqueue *vq; if (!dev || !dev->mem) return; - if (dev->dequeue_zero_copy) { - for (i = 0; i < dev->nr_vring; i++) { - vq = dev->virtqueue[i]; - if (vq) - drain_zmbuf_list(vq); - } - } - for (i = 0; i < dev->mem->nregions; i++) { reg = &dev->mem->regions[i]; if (reg->host_user_addr) { @@ -454,23 +422,6 @@ vhost_user_set_vring_num(struct virtio_net **pdev, return RTE_VHOST_MSG_RESULT_ERR; } - if (dev->dequeue_zero_copy) { - vq->nr_zmbuf = 0; - vq->last_zmbuf_idx = 0; - vq->zmbuf_size = vq->size; - if (vq->zmbufs) - rte_free(vq->zmbufs); - vq->zmbufs = rte_zmalloc(NULL, vq->zmbuf_size * - sizeof(struct zcopy_mbuf), 0); - if (vq->zmbufs == NULL) { - VHOST_LOG_CONFIG(WARNING, - "failed to allocate mem for zero copy; " - "zero copy is force disabled\n"); - dev->dequeue_zero_copy = 0; - } - TAILQ_INIT(&vq->zmbuf_list); - } - if (vq_is_packed(dev)) { if (vq->shadow_used_packed) rte_free(vq->shadow_used_packed); @@ -524,7 +475,6 @@ numa_realloc(struct virtio_net *dev, int index) int oldnode, newnode; struct virtio_net *old_dev; struct vhost_virtqueue *old_vq, *vq; - struct zcopy_mbuf *new_zmbuf; struct vring_used_elem *new_shadow_used_split; struct vring_used_elem_packed *new_shadow_used_packed; struct batch_copy_elem *new_batch_copy_elems; @@ -555,16 +505,6 @@ numa_realloc(struct virtio_net *dev, int index) return dev; memcpy(vq, old_vq, sizeof(*vq)); - TAILQ_INIT(&vq->zmbuf_list); - - if (dev->dequeue_zero_copy) { - new_zmbuf = rte_malloc_socket(NULL, vq->zmbuf_size * - sizeof(struct zcopy_mbuf), 0, newnode); - if (new_zmbuf) { - rte_free(vq->zmbufs); - vq->zmbufs = new_zmbuf; - } - } if (vq_is_packed(dev)) { new_shadow_used_packed = rte_malloc_socket(NULL, @@ -1179,8 +1119,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, goto err_mmap; } - populate = (dev->dequeue_zero_copy || dev->async_copy) ? - MAP_POPULATE : 0; + populate = dev->async_copy ? MAP_POPULATE : 0; mmap_addr = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED | populate, fd, 0); @@ -1195,7 +1134,7 @@ vhost_user_set_mem_table(struct virtio_net **pdev, struct VhostUserMsg *msg, reg->host_user_addr = (uint64_t)(uintptr_t)mmap_addr + mmap_offset; - if (dev->dequeue_zero_copy || dev->async_copy) + if (dev->async_copy) if (add_guest_pages(dev, reg, alignment) < 0) { VHOST_LOG_CONFIG(ERR, "adding guest pages to region %u failed.\n", @@ -1933,15 +1872,6 @@ vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *msg, return RTE_VHOST_MSG_RESULT_OK; } -static void -free_zmbufs(struct vhost_virtqueue *vq) -{ - drain_zmbuf_list(vq); - - rte_free(vq->zmbufs); - vq->zmbufs = NULL; -} - /* * when virtio is stopped, qemu will send us the GET_VRING_BASE message. */ @@ -1996,8 +1926,6 @@ vhost_user_get_vring_base(struct virtio_net **pdev, vq->signalled_used_valid = false; - if (dev->dequeue_zero_copy) - free_zmbufs(vq); if (vq_is_packed(dev)) { rte_free(vq->shadow_used_packed); vq->shadow_used_packed = NULL; @@ -2051,10 +1979,6 @@ vhost_user_set_vring_enable(struct virtio_net **pdev, } } - /* On disable, rings have to be stopped being processed. */ - if (!enable && dev->dequeue_zero_copy) - drain_zmbuf_list(dev->virtqueue[index]); - dev->virtqueue[index]->enabled = enable; return RTE_VHOST_MSG_RESULT_OK; diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index bd9303c8a9..0a0bea1a5a 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -1946,7 +1946,7 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, struct rte_mbuf *m, struct rte_mempool *mbuf_pool) { uint32_t buf_avail, buf_offset; - uint64_t buf_addr, buf_iova, buf_len; + uint64_t buf_addr, buf_len; uint32_t mbuf_avail, mbuf_offset; uint32_t cpy_len; struct rte_mbuf *cur = m, *prev = m; @@ -1958,7 +1958,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, int error = 0; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; if (unlikely(buf_len < dev->vhost_hlen && nr_vec <= 1)) { @@ -1988,14 +1987,12 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, buf_offset = dev->vhost_hlen - buf_len; vec_idx++; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_avail = buf_len - buf_offset; } else if (buf_len == dev->vhost_hlen) { if (unlikely(++vec_idx >= nr_vec)) goto out; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_offset = 0; @@ -2012,48 +2009,23 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, mbuf_offset = 0; mbuf_avail = m->buf_len - RTE_PKTMBUF_HEADROOM; while (1) { - uint64_t hpa; - cpy_len = RTE_MIN(buf_avail, mbuf_avail); - /* - * A desc buf might across two host physical pages that are - * not continuous. In such case (gpa_to_hpa returns 0), data - * will be copied even though zero copy is enabled. - */ - if (unlikely(dev->dequeue_zero_copy && (hpa = gpa_to_hpa(dev, - buf_iova + buf_offset, cpy_len)))) { - cur->data_len = cpy_len; - cur->data_off = 0; - cur->buf_addr = - (void *)(uintptr_t)(buf_addr + buf_offset); - cur->buf_iova = hpa; - - /* - * In zero copy mode, one mbuf can only reference data - * for one or partial of one desc buff. - */ - mbuf_avail = cpy_len; - } else { - if (likely(cpy_len > MAX_BATCH_LEN || - vq->batch_copy_nb_elems >= vq->size || - (hdr && cur == m))) { - rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, - mbuf_offset), - (void *)((uintptr_t)(buf_addr + - buf_offset)), - cpy_len); - } else { - batch_copy[vq->batch_copy_nb_elems].dst = - rte_pktmbuf_mtod_offset(cur, void *, - mbuf_offset); - batch_copy[vq->batch_copy_nb_elems].src = + if (likely(cpy_len > MAX_BATCH_LEN || + vq->batch_copy_nb_elems >= vq->size || + (hdr && cur == m))) { + rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, + mbuf_offset), (void *)((uintptr_t)(buf_addr + - buf_offset)); - batch_copy[vq->batch_copy_nb_elems].len = - cpy_len; - vq->batch_copy_nb_elems++; - } + buf_offset)), cpy_len); + } else { + batch_copy[vq->batch_copy_nb_elems].dst = + rte_pktmbuf_mtod_offset(cur, void *, + mbuf_offset); + batch_copy[vq->batch_copy_nb_elems].src = + (void *)((uintptr_t)(buf_addr + buf_offset)); + batch_copy[vq->batch_copy_nb_elems].len = cpy_len; + vq->batch_copy_nb_elems++; } mbuf_avail -= cpy_len; @@ -2067,7 +2039,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, break; buf_addr = buf_vec[vec_idx].buf_addr; - buf_iova = buf_vec[vec_idx].buf_iova; buf_len = buf_vec[vec_idx].buf_len; buf_offset = 0; @@ -2089,8 +2060,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, error = -1; goto out; } - if (unlikely(dev->dequeue_zero_copy)) - rte_mbuf_refcnt_update(cur, 1); prev->next = cur; prev->data_len = mbuf_offset; @@ -2114,37 +2083,6 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq, return error; } -static __rte_always_inline struct zcopy_mbuf * -get_zmbuf(struct vhost_virtqueue *vq) -{ - uint16_t i; - uint16_t last; - int tries = 0; - - /* search [last_zmbuf_idx, zmbuf_size) */ - i = vq->last_zmbuf_idx; - last = vq->zmbuf_size; - -again: - for (; i < last; i++) { - if (vq->zmbufs[i].in_use == 0) { - vq->last_zmbuf_idx = i + 1; - vq->zmbufs[i].in_use = 1; - return &vq->zmbufs[i]; - } - } - - tries++; - if (tries == 1) { - /* search [0, last_zmbuf_idx) */ - i = 0; - last = vq->last_zmbuf_idx; - goto again; - } - - return NULL; -} - static void virtio_dev_extbuf_free(void *addr __rte_unused, void *opaque) { @@ -2244,30 +2182,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, uint16_t dropped = 0; static bool allocerr_warned; - if (unlikely(dev->dequeue_zero_copy)) { - struct zcopy_mbuf *zmbuf, *next; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - if (mbuf_is_consumed(zmbuf->mbuf)) { - update_shadow_used_ring_split(vq, - zmbuf->desc_idx, 0); - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } - } - - if (likely(vq->shadow_used_idx)) { - flush_shadow_used_ring_split(dev, vq); - vhost_vring_call_split(dev, vq); - } - } - /* * The ordering between avail index and * desc reads needs to be enforced. @@ -2300,8 +2214,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, VHOST_ACCESS_RO) < 0)) break; - if (likely(dev->dequeue_zero_copy == 0)) - update_shadow_used_ring_split(vq, head_idx, 0); + update_shadow_used_ring_split(vq, head_idx, 0); pkts[i] = virtio_dev_pktmbuf_alloc(dev, mbuf_pool, buf_len); if (unlikely(pkts[i] == NULL)) { @@ -2335,42 +2248,16 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq, i++; break; } - - if (unlikely(dev->dequeue_zero_copy)) { - struct zcopy_mbuf *zmbuf; - - zmbuf = get_zmbuf(vq); - if (!zmbuf) { - rte_pktmbuf_free(pkts[i]); - dropped += 1; - i++; - break; - } - zmbuf->mbuf = pkts[i]; - zmbuf->desc_idx = head_idx; - - /* - * Pin lock the mbuf; we will check later to see - * whether the mbuf is freed (when we are the last - * user) or not. If that's the case, we then could - * update the used ring safely. - */ - rte_mbuf_refcnt_update(pkts[i], 1); - - vq->nr_zmbuf += 1; - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); - } } + vq->last_avail_idx += i; - if (likely(dev->dequeue_zero_copy == 0)) { - do_data_copy_dequeue(vq); - if (unlikely(i < count)) - vq->shadow_used_idx = i; - if (likely(vq->shadow_used_idx)) { - flush_shadow_used_ring_split(dev, vq); - vhost_vring_call_split(dev, vq); - } + do_data_copy_dequeue(vq); + if (unlikely(i < count)) + vq->shadow_used_idx = i; + if (likely(vq->shadow_used_idx)) { + flush_shadow_used_ring_split(dev, vq); + vhost_vring_call_split(dev, vq); } return (i - dropped); @@ -2570,162 +2457,6 @@ virtio_dev_tx_single_packed(struct virtio_net *dev, return ret; } -static __rte_always_inline int -virtio_dev_tx_batch_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) -{ - struct zcopy_mbuf *zmbufs[PACKED_BATCH_SIZE]; - uintptr_t desc_addrs[PACKED_BATCH_SIZE]; - uint16_t ids[PACKED_BATCH_SIZE]; - uint16_t i; - - uint16_t avail_idx = vq->last_avail_idx; - - if (vhost_reserve_avail_batch_packed(dev, vq, mbuf_pool, pkts, - avail_idx, desc_addrs, ids)) - return -1; - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - zmbufs[i] = get_zmbuf(vq); - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { - if (!zmbufs[i]) - goto free_pkt; - } - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) { - zmbufs[i]->mbuf = pkts[i]; - zmbufs[i]->desc_idx = ids[i]; - zmbufs[i]->desc_count = 1; - } - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - rte_mbuf_refcnt_update(pkts[i], 1); - - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbufs[i], next); - - vq->nr_zmbuf += PACKED_BATCH_SIZE; - vq_inc_last_avail_packed(vq, PACKED_BATCH_SIZE); - - return 0; - -free_pkt: - vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) - rte_pktmbuf_free(pkts[i]); - - return -1; -} - -static __rte_always_inline int -virtio_dev_tx_single_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **pkts) -{ - uint16_t buf_id, desc_count; - struct zcopy_mbuf *zmbuf; - - if (vhost_dequeue_single_packed(dev, vq, mbuf_pool, pkts, &buf_id, - &desc_count)) - return -1; - - zmbuf = get_zmbuf(vq); - if (!zmbuf) { - rte_pktmbuf_free(*pkts); - return -1; - } - zmbuf->mbuf = *pkts; - zmbuf->desc_idx = buf_id; - zmbuf->desc_count = desc_count; - - rte_mbuf_refcnt_update(*pkts, 1); - - vq->nr_zmbuf += 1; - TAILQ_INSERT_TAIL(&vq->zmbuf_list, zmbuf, next); - - vq_inc_last_avail_packed(vq, desc_count); - return 0; -} - -static __rte_always_inline void -free_zmbuf(struct vhost_virtqueue *vq) -{ - struct zcopy_mbuf *next = NULL; - struct zcopy_mbuf *zmbuf; - - for (zmbuf = TAILQ_FIRST(&vq->zmbuf_list); - zmbuf != NULL; zmbuf = next) { - next = TAILQ_NEXT(zmbuf, next); - - uint16_t last_used_idx = vq->last_used_idx; - - if (mbuf_is_consumed(zmbuf->mbuf)) { - uint16_t flags; - flags = vq->desc_packed[last_used_idx].flags; - if (vq->used_wrap_counter) { - flags |= VRING_DESC_F_USED; - flags |= VRING_DESC_F_AVAIL; - } else { - flags &= ~VRING_DESC_F_USED; - flags &= ~VRING_DESC_F_AVAIL; - } - - vq->desc_packed[last_used_idx].id = zmbuf->desc_idx; - vq->desc_packed[last_used_idx].len = 0; - - rte_smp_wmb(); - vq->desc_packed[last_used_idx].flags = flags; - - vq_inc_last_used_packed(vq, zmbuf->desc_count); - - TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next); - restore_mbuf(zmbuf->mbuf); - rte_pktmbuf_free(zmbuf->mbuf); - put_zmbuf(zmbuf); - vq->nr_zmbuf -= 1; - } - } -} - -static __rte_noinline uint16_t -virtio_dev_tx_packed_zmbuf(struct virtio_net *dev, - struct vhost_virtqueue *__rte_restrict vq, - struct rte_mempool *mbuf_pool, - struct rte_mbuf **__rte_restrict pkts, - uint32_t count) -{ - uint32_t pkt_idx = 0; - uint32_t remained = count; - - free_zmbuf(vq); - - do { - if (remained >= PACKED_BATCH_SIZE) { - if (!virtio_dev_tx_batch_packed_zmbuf(dev, vq, - mbuf_pool, &pkts[pkt_idx])) { - pkt_idx += PACKED_BATCH_SIZE; - remained -= PACKED_BATCH_SIZE; - continue; - } - } - - if (virtio_dev_tx_single_packed_zmbuf(dev, vq, mbuf_pool, - &pkts[pkt_idx])) - break; - pkt_idx++; - remained--; - - } while (remained); - - if (pkt_idx) - vhost_vring_call_packed(dev, vq); - - return pkt_idx; -} - static __rte_noinline uint16_t virtio_dev_tx_packed(struct virtio_net *dev, struct vhost_virtqueue *__rte_restrict vq, @@ -2841,14 +2572,9 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id, count -= 1; } - if (vq_is_packed(dev)) { - if (unlikely(dev->dequeue_zero_copy)) - count = virtio_dev_tx_packed_zmbuf(dev, vq, mbuf_pool, - pkts, count); - else - count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, - count); - } else + if (vq_is_packed(dev)) + count = virtio_dev_tx_packed(dev, vq, mbuf_pool, pkts, count); + else count = virtio_dev_tx_split(dev, vq, mbuf_pool, pkts, count); out: