From patchwork Mon Apr 30 15:59:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Maxime Coquelin X-Patchwork-Id: 39181 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D5BFDE5D; Mon, 30 Apr 2018 18:00:12 +0200 (CEST) Received: from mx1.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by dpdk.org (Postfix) with ESMTP id 35DC3DED; Mon, 30 Apr 2018 18:00:11 +0200 (CEST) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AB4F1122660; Mon, 30 Apr 2018 16:00:10 +0000 (UTC) Received: from localhost.localdomain (ovpn-112-35.ams2.redhat.com [10.36.112.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id 00934202327B; Mon, 30 Apr 2018 16:00:08 +0000 (UTC) From: Maxime Coquelin To: dev@dpdk.org, jianfeng.tan@intel.com, tiwei.bie@intel.com, mst@redhat.com Cc: stable@dpdk.org, Maxime Coquelin Date: Mon, 30 Apr 2018 17:59:54 +0200 Message-Id: <20180430155954.9939-1-maxime.coquelin@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Mon, 30 Apr 2018 16:00:10 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Mon, 30 Apr 2018 16:00:10 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'maxime.coquelin@redhat.com' RCPT:'' Subject: [dpdk-dev] [PATCH] vhost: improve dirty pages logging performance X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch caches all dirty pages logging until the used ring index is updated. These dirty pages won't be accessed by the guest as long as the host doesn't give them back to it by updating the index. The goal of this optimization is to fix a performance regression introduced when the vhost library started to use atomic operations to set bits in the shared dirty log map. While the fix was valid as previous implementation wasn't safe against concurent accesses, contention was induced. With this patch, during migration, we have: 1. Less atomic operations as only a single atomic OR operation per 32 pages. 2. Less atomic operations as during a burst, the same page will be marked dirty only once. 3. Less write memory barriers. Fixes: 897f13a1f726 ("vhost: make page logging atomic") Cc: stable@dpdk.org Suggested-by: Michael S. Tsirkin Signed-off-by: Maxime Coquelin --- Hi, This series was tested with migrating a guest while running PVP benchmark at 1Mpps with both ovs-dpdk and testpmd as vswitch. With this patch we recover the packet drops regressions seen since the use of atomic operations to log dirty pages. Some numbers: A) PVP Live migration using testpmd (Single queue pair) ------------------------------------------------------- Without patch: =========================Stream Rate: 1Mpps============== No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 134 18966 16 11628963.0 1 1Mpps 125 18790 16 8436300.0 2 1Mpps 122 19171 15 13881342.0 3 1Mpps 132 18913 15 12079492.0 <------------------------Summary------------------------> Max 1Mpps 134 19171 16 13881342 Min 1Mpps 122 18790 15 8436300 Mean 1Mpps 128 18960 15 11506524 Median 1Mpps 128 18939 15 11854227 Stdev 0 5.68 158.81 0.58 2266371.52 ========================================================= With patch: =========================Stream Rate: 1Mpps============== No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 119 13521 15 478174.0 1 1Mpps 116 14084 14 452018.0 2 1Mpps 122 13908 14 464486.0 3 1Mpps 129 13787 16 478234.0 <------------------------Summary------------------------> Max 1Mpps 129 14084 16 478234 Min 1Mpps 116 13521 14 452018 Mean 1Mpps 121 13825 14 468228 Median 1Mpps 120 13847 14 471330 Stdev 0 5.57 236.52 0.96 12593.78 ========================================================= B) OVS-DPDK migration with 2 queue pairs ---------------------------------------- Without patch: =======================Stream Rate: 1Mpps================ No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 146 20270 116 15937394.0 1 1Mpps 150 25617 16 11120370.0 2 1Mpps 138 40983 115 24779971.0 3 1Mpps 161 20435 17 15363519.0 <------------------------Summary------------------------> Max 1Mpps 161 40983 116 24779971 Min 1Mpps 138 20270 16 11120370 Mean 1Mpps 148 26826 66 16800313 Median 1Mpps 148 23026 66 15650456 Stdev 0 9.57 9758.9 57.16 5737179.93 ========================================================= With patch: =======================Stream Rate: 1Mpps================ No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 155 18915 17 481330.0 1 1Mpps 156 21097 18 370556.0 2 1Mpps 156 49610 15 369291.0 3 1Mpps 144 31124 15 361914.0 <------------------------Summary------------------------> Max 1Mpps 156 49610 18 481330 Min 1Mpps 144 18915 15 361914 Mean 1Mpps 152 30186 16 395772 Median 1Mpps 155 26110 16 369923 Stdev 0 5.85 13997.82 1.5 57165.33 ========================================================= C) OVS-DPDK migration with single queue pair -------------------------------------------- Without patch: =======================Stream Rate: 1Mpps================ No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 129 17411 15 11105414.0 1 1Mpps 130 16544 15 8028438.0 2 1Mpps 132 15202 15 7491584.0 3 1Mpps 133 18100 15 8385047.0 <------------------------Summary------------------------> Max 1Mpps 133 18100 15 11105414 Min 1Mpps 129 15202 15 7491584 Mean 1Mpps 131 16814 15 8752620 Median 1Mpps 131 16977 15 8206742 Stdev 0 1.83 1249.22 0.0 1610941.84 ========================================================= With patch: =======================Stream Rate: 1Mpps================ No Stream_Rate Downtime Totaltime Ping_Loss trex_Loss 0 1Mpps 117 12966 14 275873.0 1 1Mpps 127 13398 14 460323.0 2 1Mpps 119 13216 14 280831.0 3 1Mpps 131 13544 15 465951.0 <------------------------Summary------------------------> Max 1Mpps 131 13544 15 465951 Min 1Mpps 117 12966 14 275873 Mean 1Mpps 123 13281 14 370744 Median 1Mpps 123 13307 14 370577 Stdev 0 6.61 249.2 0.5 106729.6 ========================================================= lib/librte_vhost/vhost.h | 96 +++++++++++++++++++++++++++++++++++++++++++ lib/librte_vhost/virtio_net.c | 29 ++++++++----- 2 files changed, 115 insertions(+), 10 deletions(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 891978131..0d9129504 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -36,6 +36,8 @@ #define BUF_VECTOR_MAX 256 +#define VHOST_LOG_CACHE_NR 32 + /** * Structure contains buffer address, length and descriptor index * from vring to do scatter RX. @@ -69,6 +71,14 @@ struct batch_copy_elem { uint64_t log_addr; }; +/* + * Structure that contains the info for batched dirty logging. + */ +struct log_cache_entry { + uint32_t offset; + uint32_t val; +}; + /** * Structure contains variables relevant to RX/TX virtqueues. */ @@ -112,6 +122,9 @@ struct vhost_virtqueue { struct batch_copy_elem *batch_copy_elems; uint16_t batch_copy_nb_elems; + struct log_cache_entry log_cache[VHOST_LOG_CACHE_NR]; + uint16_t log_cache_nb_elem; + rte_rwlock_t iotlb_lock; rte_rwlock_t iotlb_pending_lock; struct rte_mempool *iotlb_pool; @@ -340,6 +353,89 @@ vhost_log_write(struct virtio_net *dev, uint64_t addr, uint64_t len) } } +static __rte_always_inline void +vhost_log_cache_sync(struct virtio_net *dev, struct vhost_virtqueue *vq) +{ + uint32_t *log_base; + int i; + + if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) || + !dev->log_base)) + return; + + log_base = (uint32_t *)(uintptr_t)dev->log_base; + + /* To make sure guest memory updates are committed before logging */ + rte_smp_wmb(); + + for (i = 0; i < vq->log_cache_nb_elem; i++) { + struct log_cache_entry *elem = vq->log_cache + i; + + __sync_fetch_and_or(log_base + elem->offset, elem->val); + } + + vq->log_cache_nb_elem = 0; +} + +static __rte_always_inline void +vhost_log_cache_page(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t page) +{ + uint32_t bit_nr = page % 32; + uint32_t offset = page / 32; + int i; + + for (i = 0; i < vq->log_cache_nb_elem; i++) { + struct log_cache_entry *elem = vq->log_cache + i; + + if (elem->offset == offset) { + elem->val |= (1U << bit_nr); + return; + } + } + + if (unlikely(i >= VHOST_LOG_CACHE_NR)) { + /* + * No more room for a new log cache entry, + * so write the dirty log map directly. + */ + rte_smp_wmb(); + vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page); + + return; + } + + vq->log_cache[i].offset = offset; + vq->log_cache[i].val = (1U << bit_nr); +} + +static __rte_always_inline void +vhost_log_cache_write(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t addr, uint64_t len) +{ + uint64_t page; + + if (likely(((dev->features & (1ULL << VHOST_F_LOG_ALL)) == 0) || + !dev->log_base || !len)) + return; + + if (unlikely(dev->log_size <= ((addr + len - 1) / VHOST_LOG_PAGE / 8))) + return; + + page = addr / VHOST_LOG_PAGE; + while (page * VHOST_LOG_PAGE < addr + len) { + vhost_log_cache_page(dev, vq, page); + page += 1; + } +} + +static __rte_always_inline void +vhost_log_cache_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t offset, uint64_t len) +{ + vhost_log_cache_write(dev, vq, vq->log_guest_addr + offset, len); +} + static __rte_always_inline void vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq, uint64_t offset, uint64_t len) diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5fdd4172b..ac42cdbd0 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -78,7 +78,7 @@ do_flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq, rte_memcpy(&vq->used->ring[to], &vq->shadow_used_ring[from], size * sizeof(struct vring_used_elem)); - vhost_log_used_vring(dev, vq, + vhost_log_cache_used_vring(dev, vq, offsetof(struct vring_used, ring[to]), size * sizeof(struct vring_used_elem)); } @@ -106,6 +106,8 @@ flush_shadow_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq) rte_smp_wmb(); + vhost_log_cache_sync(dev, vq); + *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx; vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx), sizeof(vq->used->idx)); @@ -130,7 +132,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq) for (i = 0; i < count; i++) { rte_memcpy(elem[i].dst, elem[i].src, elem[i].len); - vhost_log_write(dev, elem[i].log_addr, elem[i].len); + vhost_log_cache_write(dev, vq, elem[i].log_addr, elem[i].len); PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0); } } @@ -251,7 +253,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, virtio_enqueue_offload(m, (struct virtio_net_hdr *)(uintptr_t)desc_addr); PRINT_PACKET(dev, (uintptr_t)desc_addr, dev->vhost_hlen, 0); - vhost_log_write(dev, desc_gaddr, dev->vhost_hlen); + vhost_log_cache_write(dev, vq, desc_gaddr, dev->vhost_hlen); } else { struct virtio_net_hdr vnet_hdr; uint64_t remain = dev->vhost_hlen; @@ -274,7 +276,7 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, (void *)(uintptr_t)src, len); PRINT_PACKET(dev, (uintptr_t)dst, (uint32_t)len, 0); - vhost_log_write(dev, guest_addr, len); + vhost_log_cache_write(dev, vq, guest_addr, len); remain -= len; guest_addr += len; dst += len; @@ -355,7 +357,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, desc_offset)), rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), cpy_len); - vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len); + vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset, + cpy_len); PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), cpy_len, 0); } else { @@ -444,7 +447,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, vq->used->ring[used_idx].id = desc_indexes[i]; vq->used->ring[used_idx].len = pkts[i]->pkt_len + dev->vhost_hlen; - vhost_log_used_vring(dev, vq, + vhost_log_cache_used_vring(dev, vq, offsetof(struct vring_used, ring[used_idx]), sizeof(vq->used->ring[used_idx])); } @@ -504,6 +507,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id, rte_smp_wmb(); + vhost_log_cache_sync(dev, vq); + *(volatile uint16_t *)&vq->used->idx += count; vq->last_used_idx += count; vhost_log_used_vring(dev, vq, @@ -767,7 +772,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, PRINT_PACKET(dev, (uintptr_t)dst, (uint32_t)len, 0); - vhost_log_write(dev, guest_addr, len); + vhost_log_cache_write(dev, vq, + guest_addr, len); remain -= len; guest_addr += len; @@ -776,7 +782,7 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, } else { PRINT_PACKET(dev, (uintptr_t)hdr_addr, dev->vhost_hlen, 0); - vhost_log_write(dev, hdr_phys_addr, + vhost_log_cache_write(dev, vq, hdr_phys_addr, dev->vhost_hlen); } @@ -790,7 +796,8 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq, desc_offset)), rte_pktmbuf_mtod_offset(m, void *, mbuf_offset), cpy_len); - vhost_log_write(dev, desc_gaddr + desc_offset, cpy_len); + vhost_log_cache_write(dev, vq, desc_gaddr + desc_offset, + cpy_len); PRINT_PACKET(dev, (uintptr_t)(desc_addr + desc_offset), cpy_len, 0); } else { @@ -1320,7 +1327,7 @@ update_used_ring(struct virtio_net *dev, struct vhost_virtqueue *vq, { vq->used->ring[used_idx].id = desc_idx; vq->used->ring[used_idx].len = 0; - vhost_log_used_vring(dev, vq, + vhost_log_cache_used_vring(dev, vq, offsetof(struct vring_used, ring[used_idx]), sizeof(vq->used->ring[used_idx])); } @@ -1335,6 +1342,8 @@ update_used_idx(struct virtio_net *dev, struct vhost_virtqueue *vq, rte_smp_wmb(); rte_smp_rmb(); + vhost_log_cache_sync(dev, vq); + vq->used->idx += count; vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx), sizeof(vq->used->idx));