From patchwork Fri Dec 14 21:16:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48933 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id AFE191B996; Fri, 14 Dec 2018 22:26:37 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 3FCBF1B933 for ; Fri, 14 Dec 2018 22:26:34 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:33 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922610" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:32 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:03 +0800 Message-Id: <20181214211612.167681-2-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 01/10] vhost: remove unused internal API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" vhost_detach_vdpa_device() is internally defined but not used, remove it in this patch. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- lib/librte_vhost/vhost.c | 13 ------------- lib/librte_vhost/vhost.h | 1 - 2 files changed, 14 deletions(-) diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 70ac6bc9c..b32babee4 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -400,19 +400,6 @@ vhost_attach_vdpa_device(int vid, int did) dev->vdpa_dev_id = did; } -void -vhost_detach_vdpa_device(int vid) -{ - struct virtio_net *dev = get_device(vid); - - if (dev == NULL) - return; - - vhost_user_host_notifier_ctrl(vid, false); - - dev->vdpa_dev_id = -1; -} - void vhost_set_ifname(int vid, const char *if_name, unsigned int if_len) { diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 552b9298d..d5bab4803 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -629,7 +629,6 @@ void free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq); int alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx); void vhost_attach_vdpa_device(int vid, int did); -void vhost_detach_vdpa_device(int vid); void vhost_set_ifname(int, const char *if_name, unsigned int if_len); void vhost_enable_dequeue_zero_copy(int vid); From patchwork Fri Dec 14 21:16:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48934 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 823671B9A5; Fri, 14 Dec 2018 22:26:39 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id CD0961B957 for ; Fri, 14 Dec 2018 22:26:35 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:35 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922621" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:34 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:04 +0800 Message-Id: <20181214211612.167681-3-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 02/10] vhost: provide helper for host notifier ctrl X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" VDPA driver can decide if it needs to enable/disable the host notifier mapping, so exposing a API can allow flexibility. A later patch will base on this. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- drivers/net/ifc/ifcvf_vdpa.c | 3 +++ lib/librte_vhost/rte_vdpa.h | 18 ++++++++++++++++++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vhost_user.c | 7 +------ 4 files changed, 23 insertions(+), 6 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 97a57f182..e844109f3 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -556,6 +556,9 @@ ifcvf_dev_config(int vid) rte_atomic32_set(&internal->dev_attached, 1); update_datapath(internal); + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); + return 0; } diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index a418da47c..fff657391 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -11,6 +11,8 @@ * Device specific vhost lib */ +#include + #include #include "rte_vhost.h" @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did); */ int __rte_experimental rte_vdpa_get_device_num(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable/Disable host notifier mapping for a vdpa port. + * + * @param vid + * vhost device id + * @enable + * true for host notifier map, false for host notifier unmap + * @return + * 0 on success, -1 on failure + */ +int __rte_experimental +rte_vhost_host_notifier_ctrl(int vid, bool enable); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index ae39b6e21..22302e972 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -83,4 +83,5 @@ EXPERIMENTAL { rte_vhost_crypto_finalize_requests; rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; + rte_vhost_host_notifier_ctrl; }; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 557213491..8fec773d5 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -2049,11 +2049,6 @@ vhost_user_msg_handler(int vid, int fd) if (vdpa_dev->ops->dev_conf) vdpa_dev->ops->dev_conf(dev->vid); dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED; - if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) { - RTE_LOG(INFO, VHOST_CONFIG, - "(%d) software relay is used for vDPA, performance may be low.\n", - dev->vid); - } } return 0; @@ -2148,7 +2143,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, return process_slave_message_reply(dev, &msg); } -int vhost_user_host_notifier_ctrl(int vid, bool enable) +int rte_vhost_host_notifier_ctrl(int vid, bool enable) { struct virtio_net *dev; struct rte_vdpa_device *vdpa_dev; From patchwork Fri Dec 14 21:16:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48935 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E90881B9B3; Fri, 14 Dec 2018 22:26:40 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 485F01B959 for ; Fri, 14 Dec 2018 22:26:37 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:37 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922639" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:35 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:05 +0800 Message-Id: <20181214211612.167681-4-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 03/10] vhost: provide helpers for virtio ring relay X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch provides two helpers for vdpa device driver to perform a relay between the guest virtio ring and a mediate virtio ring. The available ring relay will synchronize the available entries, and helps to do desc validity checking. The used ring relay will synchronize the used entries from mediate ring to guest ring, and helps to do dirty page logging for live migration. The next patch will leverage these two helpers. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- lib/librte_vhost/rte_vdpa.h | 39 +++++++ lib/librte_vhost/rte_vhost_version.map | 2 + lib/librte_vhost/vdpa.c | 194 +++++++++++++++++++++++++++++++++ lib/librte_vhost/vhost.h | 40 +++++++ lib/librte_vhost/virtio_net.c | 39 ------- 5 files changed, 275 insertions(+), 39 deletions(-) diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index fff657391..02b8d14ed 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -173,4 +173,43 @@ rte_vdpa_get_device_num(void); */ int __rte_experimental rte_vhost_host_notifier_ctrl(int vid, bool enable); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the available ring from guest to mediate ring, help to + * check desc validity to protect against malicious guest driver. + * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param vring_m + * mediate virtio ring pointer + * @return + * number of synced available entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the used ring from mediate ring to guest, log dirty + * page for each writeable buffer, caller should handle the used + * ring logging before device stop. + * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param vring_m + * mediate virtio ring pointer + * @return + * number of synced used entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 22302e972..dd3b4c1cb 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -84,4 +84,6 @@ EXPERIMENTAL { rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; rte_vhost_host_notifier_ctrl; + rte_vdpa_relay_vring_avail; + rte_vdpa_relay_vring_used; }; diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c index e7d849ee0..dcf6c3b8e 100644 --- a/lib/librte_vhost/vdpa.c +++ b/lib/librte_vhost/vdpa.c @@ -122,3 +122,197 @@ rte_vdpa_get_device_num(void) { return vdpa_device_num; } + +static bool +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_iova, uint64_t desc_len, uint8_t perm) +{ + uint64_t desc_addr, desc_chunck_len; + + while (desc_len) { + desc_chunck_len = desc_len; + desc_addr = vhost_iova_to_vva(dev, vq, + desc_iova, + &desc_chunck_len, + perm); + + if (!desc_addr) + return true; + + desc_len -= desc_chunck_len; + desc_iova += desc_chunck_len; + } + + return false; +} + +int +rte_vdpa_relay_vring_avail(int vid, uint16_t qid, void *vring_m) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vring_desc desc; + struct vhost_virtqueue *vq; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + struct vring *s_vring; + uint64_t dlen; + int ret; + uint8_t perm; + + if (!dev || !vring_m) + return -1; + + if (qid >= dev->nr_vring) + return -1; + + if (vq_is_packed(dev)) + return -1; + + s_vring = (struct vring *)vring_m; + vq = dev->virtqueue[qid]; + idx = vq->avail->idx; + idx_m = s_vring->avail->idx; + ret = (uint16_t)(idx - idx_m); + + while (idx_m != idx) { + /* avail entry copy */ + desc_id = vq->avail->ring[idx_m & (vq->size - 1)]; + s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id; + desc_ring = vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, + vq->desc[desc_id].addr, &dlen, + VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[idx].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[idx].addr, + vq->desc[idx].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* check if the buf addr is within the guest memory */ + do { + desc = desc_ring[desc_id]; + perm = desc.flags & VRING_DESC_F_WRITE ? + VHOST_ACCESS_WO : VHOST_ACCESS_RO; + if (invalid_desc_check(dev, vq, desc.addr, desc.len, + perm)) { + if (unlikely(idesc)) + free_ind_table(idesc); + return -1; + } + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx_m++; + } + + rte_smp_wmb(); + s_vring->avail->idx = idx; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vhost_avail_event(vq) = idx; + + return ret; +} + +int +rte_vdpa_relay_vring_used(int vid, uint16_t qid, void *vring_m) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vhost_virtqueue *vq; + struct vring_desc desc; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + struct vring *s_vring; + uint64_t dlen; + int ret; + + if (!dev || !vring_m) + return -1; + + if (qid >= dev->nr_vring) + return -1; + + if (vq_is_packed(dev)) + return -1; + + s_vring = (struct vring *)vring_m; + vq = dev->virtqueue[qid]; + idx = vq->used->idx; + idx_m = s_vring->used->idx; + ret = (uint16_t)(idx_m - idx); + + while (idx != idx_m) { + /* copy used entry, used ring logging is not covered here */ + vq->used->ring[idx & (vq->size - 1)] = + s_vring->used->ring[idx & (vq->size - 1)]; + + desc_id = vq->used->ring[idx & (vq->size - 1)].id; + desc_ring = vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, + vq->desc[desc_id].addr, &dlen, + VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[idx].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[idx].addr, + vq->desc[idx].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* dirty page logging for DMA writeable buffer */ + do { + desc = desc_ring[desc_id]; + if (desc.flags & VRING_DESC_F_WRITE) + vhost_log_write(dev, desc.addr, desc.len); + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx++; + } + + rte_smp_wmb(); + vq->used->idx = idx_m; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vring_used_event(s_vring) = idx_m; + + return ret; +} diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index d5bab4803..3b3265c4b 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -18,6 +18,7 @@ #include #include #include +#include #include "rte_vhost.h" #include "rte_vdpa.h" @@ -754,4 +755,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq) eventfd_write(vq->callfd, (eventfd_t)1); } +static __rte_always_inline void * +alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_addr, uint64_t desc_len) +{ + void *idesc; + uint64_t src, dst; + uint64_t len, remain = desc_len; + + idesc = rte_malloc(__func__, desc_len, 0); + if (unlikely(!idesc)) + return 0; + + dst = (uint64_t)(uintptr_t)idesc; + + while (remain) { + len = remain; + src = vhost_iova_to_vva(dev, vq, desc_addr, &len, + VHOST_ACCESS_RO); + if (unlikely(!src || !len)) { + rte_free(idesc); + return 0; + } + + rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len); + + remain -= len; + dst += len; + desc_addr += len; + } + + return idesc; +} + +static __rte_always_inline void +free_ind_table(void *idesc) +{ + rte_free(idesc); +} + #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5e1a1a727..8c657a101 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring; } -static __rte_always_inline void * -alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, - uint64_t desc_addr, uint64_t desc_len) -{ - void *idesc; - uint64_t src, dst; - uint64_t len, remain = desc_len; - - idesc = rte_malloc(__func__, desc_len, 0); - if (unlikely(!idesc)) - return 0; - - dst = (uint64_t)(uintptr_t)idesc; - - while (remain) { - len = remain; - src = vhost_iova_to_vva(dev, vq, desc_addr, &len, - VHOST_ACCESS_RO); - if (unlikely(!src || !len)) { - rte_free(idesc); - return 0; - } - - rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len); - - remain -= len; - dst += len; - desc_addr += len; - } - - return idesc; -} - -static __rte_always_inline void -free_ind_table(void *idesc) -{ - rte_free(idesc); -} - static __rte_always_inline void do_flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq, From patchwork Fri Dec 14 21:16:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48936 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 437261B9BE; Fri, 14 Dec 2018 22:26:43 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id D2B451B99F for ; Fri, 14 Dec 2018 22:26:38 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:38 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922654" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:37 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:06 +0800 Message-Id: <20181214211612.167681-5-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 04/10] net/ifc: dump debug message for error X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Driver probe may fail for different causes, debug message is helpful for debugging issue. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index e844109f3..aacd5f9bf 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -22,7 +22,7 @@ #define DRV_LOG(level, fmt, args...) \ rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \ - "%s(): " fmt "\n", __func__, ##args) + "IFCVF %s(): " fmt "\n", __func__, ##args) #ifndef PAGE_SIZE #define PAGE_SIZE 4096 @@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->pdev = pci_dev; rte_spinlock_init(&internal->lock); - if (ifcvf_vfio_setup(internal) < 0) - return -1; - if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) - return -1; + if (ifcvf_vfio_setup(internal) < 0) { + DRV_LOG(ERR, "failed to setup device %s", pci_dev->name); + goto error; + } + + if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) { + DRV_LOG(ERR, "failed to init device %s", pci_dev->name); + goto error; + } internal->max_queues = IFCVF_MAX_QUEUES; features = ifcvf_get_features(&internal->hw); @@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); - if (internal->did < 0) + if (internal->did < 0) { + DRV_LOG(ERR, "failed to register device %s", pci_dev->name); goto error; + } rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Fri Dec 14 21:16:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48937 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B57DC1B9D5; Fri, 14 Dec 2018 22:26:45 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 91BAA1B965; Fri, 14 Dec 2018 22:26:40 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922663" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:38 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang , stable@dpdk.org Date: Sat, 15 Dec 2018 05:16:07 +0800 Message-Id: <20181214211612.167681-6-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 05/10] net/ifc: store only registered device instance X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If driver fails to register ifc VF device into vhost lib, then this device should not be stored. Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") cc: stable@dpdk.org Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- drivers/net/ifc/ifcvf_vdpa.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index aacd5f9bf..6fcd50b73 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; - pthread_mutex_lock(&internal_list_lock); - TAILQ_INSERT_TAIL(&internal_list, list, next); - pthread_mutex_unlock(&internal_list_lock); - internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { @@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, goto error; } + pthread_mutex_lock(&internal_list_lock); + TAILQ_INSERT_TAIL(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Fri Dec 14 21:16:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48938 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 134111B9E0; Fri, 14 Dec 2018 22:26:48 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 08C781B965 for ; Fri, 14 Dec 2018 22:26:41 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:41 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922667" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:40 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:08 +0800 Message-Id: <20181214211612.167681-7-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 06/10] net/ifc: detect if VDPA mode is specified X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If user wants the VF to be used in VDPA (vhost data path acceleration) mode, then the user can add a "vdpa=1" parameter for the device. So if driver doesn't not find this option, it should quit and let the bus continue the probe. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- drivers/net/ifc/Makefile | 1 + drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+) diff --git a/drivers/net/ifc/Makefile b/drivers/net/ifc/Makefile index 39b36ae5d..7755a87eb 100644 --- a/drivers/net/ifc/Makefile +++ b/drivers/net/ifc/Makefile @@ -10,6 +10,7 @@ LIB = librte_pmd_ifc.a LDLIBS += -lpthread LDLIBS += -lrte_eal -lrte_pci -lrte_vhost -lrte_bus_pci +LDLIBS += -lrte_kvargs CFLAGS += -O3 CFLAGS += $(WERROR_FLAGS) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 6fcd50b73..c0e50354a 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "base/ifcvf.h" @@ -28,6 +30,13 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_VDPA_MODE "vdpa" + +static const char * const ifcvf_valid_arguments[] = { + IFCVF_VDPA_MODE, + NULL +}; + static int ifcvf_vdpa_logtype; struct ifcvf_internal { @@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = { .get_notify_area = ifcvf_get_notify_area, }; +static inline int +open_int(const char *key __rte_unused, const char *value, void *extra_args) +{ + uint16_t *n = extra_args; + + if (value == NULL || extra_args == NULL) + return -EINVAL; + + *n = (uint16_t)strtoul(value, NULL, 0); + if (*n == USHRT_MAX && errno == ERANGE) + return -1; + + return 0; +} + static int ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) @@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, uint64_t features; struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; + int vdpa_mode = 0; + struct rte_kvargs *kvlist = NULL; + int ret = 0; if (rte_eal_process_type() != RTE_PROC_PRIMARY) return 0; + kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, + ifcvf_valid_arguments); + if (kvlist == NULL) + return 1; + + /* probe only when vdpa mode is specified */ + if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) { + rte_kvargs_free(kvlist); + return 1; + } + + ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int, + &vdpa_mode); + if (ret < 0 || vdpa_mode == 0) { + rte_kvargs_free(kvlist); + return 1; + } + list = rte_zmalloc("ifcvf", sizeof(*list), 0); if (list == NULL) goto error; @@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, rte_atomic32_set(&internal->started, 1); update_datapath(internal); + rte_kvargs_free(kvlist); return 0; error: + rte_kvargs_free(kvlist); rte_free(list); rte_free(internal); return -1; From patchwork Fri Dec 14 21:16:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48939 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 978BB1B9D0; Fri, 14 Dec 2018 22:26:50 +0100 (CET) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 1CED41B9E2 for ; Fri, 14 Dec 2018 22:26:47 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922679" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:41 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:09 +0800 Message-Id: <20181214211612.167681-8-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 07/10] net/ifc: add devarg for LM mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch series enables a new method for live migration, i.e. software assisted live migration. This patch provides a device argument for user to choose the methold. When "swlm=1", driver/device will do live migration with a relay thread dealing with dirty page logging. Without this parameter, device will do dirty page logging and there's no relay thread consuming CPU resource. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index c0e50354a..395c5112f 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -31,9 +32,11 @@ #endif #define IFCVF_VDPA_MODE "vdpa" +#define IFCVF_SW_FALLBACK_LM "swlm" static const char * const ifcvf_valid_arguments[] = { IFCVF_VDPA_MODE, + IFCVF_SW_FALLBACK_LM, NULL }; @@ -56,6 +59,7 @@ struct ifcvf_internal { rte_atomic32_t dev_attached; rte_atomic32_t running; rte_spinlock_t lock; + bool sw_lm; }; struct internal_list { @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; int vdpa_mode = 0; + int sw_fallback_lm = 0; struct rte_kvargs *kvlist = NULL; int ret = 0; @@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; + if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) { + ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM, + &open_int, &sw_fallback_lm); + if (ret < 0) + goto error; + } + internal->sw_lm = sw_fallback_lm; + internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { From patchwork Fri Dec 14 21:16:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48940 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 5D8941B9F0; Fri, 14 Dec 2018 22:26:53 +0100 (CET) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id B4E6B1B9C8 for ; Fri, 14 Dec 2018 22:26:48 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922686" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:45 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:10 +0800 Message-Id: <20181214211612.167681-9-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 08/10] net/ifc: use lib API for used ring logging X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Vhost lib has already provided a helper for used ring logging, driver could use it to reduce code. Signed-off-by: Xiao Wang Reviewed-by: Maxime Coquelin --- drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++------------------- 1 file changed, 8 insertions(+), 19 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 395c5112f..f181c5a6e 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -31,6 +31,9 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) + #define IFCVF_VDPA_MODE "vdpa" #define IFCVF_SW_FALLBACK_LM "swlm" @@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal) return ifcvf_start_hw(&internal->hw); } -static void -ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf) -{ - uint32_t i, size; - uint64_t pfn; - - pfn = hw->vring[queue].used / PAGE_SIZE; - size = hw->vring[queue].size * sizeof(struct vring_used_elem) + - sizeof(uint16_t) * 3; - - for (i = 0; i <= size / PAGE_SIZE; i++) - __sync_fetch_and_or_8(&log_buf[(pfn + i) / 8], - 1 << ((pfn + i) % 8)); -} - static void vdpa_ifcvf_stop(struct ifcvf_internal *internal) { @@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) int vid; uint64_t features; uint64_t log_base, log_size; - uint8_t *log_buf; + uint64_t len; vid = internal->vid; ifcvf_stop_hw(hw); @@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) * IFCVF marks dirty memory pages for only packet buffer, * SW helps to mark the used ring as dirty after device stops. */ - log_buf = (uint8_t *)(uintptr_t)log_base; - for (i = 0; i < hw->nr_vring; i++) - ifcvf_used_ring_log(hw, i, log_buf); + for (i = 0; i < hw->nr_vring; i++) { + len = IFCVF_USED_RING_LEN(hw->vring[i].size); + rte_vhost_log_used_vring(vid, i, 0, len); + } } } From patchwork Fri Dec 14 21:16:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48941 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4692A1B9F9; Fri, 14 Dec 2018 22:26:55 +0100 (CET) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 2D1CE1B9E9 for ; Fri, 14 Dec 2018 22:26:49 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:48 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922694" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:46 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:11 +0800 Message-Id: <20181214211612.167681-10-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 09/10] net/ifc: support SW assisted VDPA live migration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" In SW assisted live migration mode, driver will stop the device and setup a mediate virtio ring to relay the communication between the virtio driver and the VDPA device. This data path intervention will allow SW to help on guest dirty page logging for live migration. This SW fallback is event driven relay thread, so when the network throughput is low, this SW fallback will take little CPU resource, but when the throughput goes up, the relay thread's CPU usage will goes up accordinly. User needs to take all the factors including CPU usage, guest perf degradation, etc. into consideration when selecting the live migration support mode. Signed-off-by: Xiao Wang --- drivers/net/ifc/base/ifcvf.h | 1 + drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 344 insertions(+), 3 deletions(-) diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h index c15c69107..e8a30d2c6 100644 --- a/drivers/net/ifc/base/ifcvf.h +++ b/drivers/net/ifc/base/ifcvf.h @@ -50,6 +50,7 @@ #define IFCVF_LM_ENABLE_VF 0x1 #define IFCVF_LM_ENABLE_PF 0x3 #define IFCVF_LOG_BASE 0x100000000000 +#define IFCVF_MEDIATE_VRING 0x200000000000 #define IFCVF_32_BIT_MASK 0xffffffff diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index f181c5a6e..61757d0b4 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -63,6 +63,9 @@ struct ifcvf_internal { rte_atomic32_t running; rte_spinlock_t lock; bool sw_lm; + bool sw_fallback_running; + /* mediated vring for sw fallback */ + struct vring m_vring[IFCVF_MAX_QUEUES * 2]; }; struct internal_list { @@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, hw->vring[i].last_used_idx); + if (internal->sw_lm) + return; + rte_vhost_get_negotiated_features(vid, &features); if (RTE_VHOST_NEED_LOG(features)) { ifcvf_disable_logging(hw); @@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal) return ret; } +static int +m_ifcvf_start(struct ifcvf_internal *internal) +{ + struct ifcvf_hw *hw = &internal->hw; + uint32_t i, nr_vring; + int vid, ret; + struct rte_vhost_vring vq; + void *vring_buf; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size; + uint64_t gpa; + + vid = internal->vid; + nr_vring = rte_vhost_get_vring_num(vid); + rte_vhost_get_negotiated_features(vid, &hw->req_features); + + for (i = 0; i < nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vq); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE); + vring_init(&internal->m_vring[i], vq.size, vring_buf, + PAGE_SIZE); + + ret = rte_vfio_container_dma_map(internal->vfio_container_fd, + (uint64_t)(uintptr_t)vring_buf, m_vring_iova, size); + if (ret < 0) { + DRV_LOG(ERR, "mediate vring DMA map failed."); + goto error; + } + + gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); + if (gpa == 0) { + DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); + return -1; + } + hw->vring[i].desc = gpa; + + hw->vring[i].avail = m_vring_iova + + (char *)internal->m_vring[i].avail - + (char *)internal->m_vring[i].desc; + + hw->vring[i].used = m_vring_iova + + (char *)internal->m_vring[i].used - + (char *)internal->m_vring[i].desc; + + hw->vring[i].size = vq.size; + + rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx, + &hw->vring[i].last_used_idx); + + m_vring_iova += size; + } + hw->nr_vring = nr_vring; + + return ifcvf_start_hw(&internal->hw); + +error: + for (i = 0; i < nr_vring; i++) + if (internal->m_vring[i].desc) + rte_free(internal->m_vring[i].desc); + + return -1; +} + +static int +m_ifcvf_stop(struct ifcvf_internal *internal) +{ + int vid; + uint32_t i; + struct rte_vhost_vring vq; + struct ifcvf_hw *hw = &internal->hw; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size, len; + + vid = internal->vid; + ifcvf_stop_hw(hw); + + for (i = 0; i < hw->nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vq); + len = IFCVF_USED_RING_LEN(vq.size); + rte_vhost_log_used_vring(vid, i, 0, len); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + rte_vfio_container_dma_unmap(internal->vfio_container_fd, + (uint64_t)(uintptr_t)internal->m_vring[i].desc, + m_vring_iova, size); + + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, + hw->vring[i].last_used_idx); + rte_free(internal->m_vring[i].desc); + m_vring_iova += size; + } + + return 0; +} + +static int +m_enable_vfio_intr(struct ifcvf_internal *internal) +{ + uint32_t nr_vring; + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + int ret; + + nr_vring = rte_vhost_get_vring_num(internal->vid); + + ret = rte_intr_efd_enable(intr_handle, nr_vring); + if (ret) + return -1; + + ret = rte_intr_enable(intr_handle); + if (ret) + return -1; + + return 0; +} + +static void +m_disable_vfio_intr(struct ifcvf_internal *internal) +{ + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + + rte_intr_efd_disable(intr_handle); + rte_intr_disable(intr_handle); +} + +static void +update_avail_ring(struct ifcvf_internal *internal, uint16_t qid) +{ + rte_vdpa_relay_vring_avail(internal->vid, qid, &internal->m_vring[qid]); + ifcvf_notify_queue(&internal->hw, qid); +} + +static void +update_used_ring(struct ifcvf_internal *internal, uint16_t qid) +{ + rte_vdpa_relay_vring_used(internal->vid, qid, &internal->m_vring[qid]); + rte_vhost_vring_call(internal->vid, qid); +} + +static void * +vring_relay(void *arg) +{ + int i, vid, epfd, fd, nfds; + struct ifcvf_internal *internal = (struct ifcvf_internal *)arg; + struct rte_vhost_vring vring; + struct rte_intr_handle *intr_handle; + uint16_t qid, q_num; + struct epoll_event events[IFCVF_MAX_QUEUES * 4]; + struct epoll_event ev; + int nbytes; + uint64_t buf; + + vid = internal->vid; + q_num = rte_vhost_get_vring_num(vid); + /* prepare the mediate vring */ + for (qid = 0; qid < q_num; qid++) { + rte_vhost_get_vring_base(vid, qid, + &internal->m_vring[qid].avail->idx, + &internal->m_vring[qid].used->idx); + rte_vdpa_relay_vring_avail(vid, qid, &internal->m_vring[qid]); + } + + /* add notify fd and interrupt fd to epoll */ + epfd = epoll_create(IFCVF_MAX_QUEUES * 2); + if (epfd < 0) { + DRV_LOG(ERR, "failed to create epoll instance."); + return NULL; + } + internal->epfd = epfd; + + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + rte_vhost_get_vhost_vring(vid, qid, &vring); + ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + intr_handle = &internal->pdev->intr_handle; + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + ev.data.u64 = 1 | qid << 1 | + (uint64_t)intr_handle->efds[qid] << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev) + < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + /* start relay with a first kick */ + for (qid = 0; qid < q_num; qid++) + ifcvf_notify_queue(&internal->hw, qid); + + /* listen to the events and react accordingly */ + for (;;) { + nfds = epoll_wait(epfd, events, q_num * 2, -1); + if (nfds < 0) { + if (errno == EINTR) + continue; + DRV_LOG(ERR, "epoll_wait return fail\n"); + return NULL; + } + + for (i = 0; i < nfds; i++) { + fd = (uint32_t)(events[i].data.u64 >> 32); + do { + nbytes = read(fd, &buf, 8); + if (nbytes < 0) { + if (errno == EINTR || + errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + DRV_LOG(INFO, "Error reading " + "kickfd: %s", + strerror(errno)); + } + break; + } while (1); + + qid = events[i].data.u32 >> 1; + + if (events[i].data.u32 & 1) + update_used_ring(internal, qid); + else + update_avail_ring(internal, qid); + } + } + + return NULL; +} + +static int +setup_vring_relay(struct ifcvf_internal *internal) +{ + int ret; + + ret = pthread_create(&internal->tid, NULL, vring_relay, + (void *)internal); + if (ret) { + DRV_LOG(ERR, "failed to create ring relay pthread."); + return -1; + } + return 0; +} + +static int +unset_vring_relay(struct ifcvf_internal *internal) +{ + void *status; + + if (internal->tid) { + pthread_cancel(internal->tid); + pthread_join(internal->tid, &status); + } + internal->tid = 0; + + if (internal->epfd >= 0) + close(internal->epfd); + internal->epfd = -1; + + return 0; +} + +static int +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal) +{ + int ret; + + /* stop the direct IO data path */ + unset_notify_relay(internal); + vdpa_ifcvf_stop(internal); + vdpa_disable_vfio_intr(internal); + + ret = rte_vhost_host_notifier_ctrl(internal->vid, false); + if (ret && ret != -ENOTSUP) + goto error; + + /* set up interrupt for interrupt relay */ + ret = m_enable_vfio_intr(internal); + if (ret) + goto unmap; + + /* config the VF */ + ret = m_ifcvf_start(internal); + if (ret) + goto unset_intr; + + /* set up vring relay thread */ + ret = setup_vring_relay(internal); + if (ret) + goto stop_vf; + + internal->sw_fallback_running = true; + + return 0; + +stop_vf: + m_ifcvf_stop(internal); +unset_intr: + m_disable_vfio_intr(internal); +unmap: + ifcvf_dma_map(internal, 0); +error: + return -1; +} + static int ifcvf_dev_config(int vid) { @@ -579,8 +897,25 @@ ifcvf_dev_close(int vid) } internal = list->internal; - rte_atomic32_set(&internal->dev_attached, 0); - update_datapath(internal); + + if (internal->sw_fallback_running) { + /* unset ring relay */ + unset_vring_relay(internal); + + /* reset VF */ + m_ifcvf_stop(internal); + + /* remove interrupt setting */ + m_disable_vfio_intr(internal); + + /* unset DMA map for guest memory */ + ifcvf_dma_map(internal, 0); + + internal->sw_fallback_running = false; + } else { + rte_atomic32_set(&internal->dev_attached, 0); + update_datapath(internal); + } return 0; } @@ -604,7 +939,12 @@ ifcvf_set_features(int vid) internal = list->internal; rte_vhost_get_negotiated_features(vid, &features); - if (RTE_VHOST_NEED_LOG(features)) { + if (!RTE_VHOST_NEED_LOG(features)) + return 0; + + if (internal->sw_lm) { + ifcvf_sw_fallback_switchover(internal); + } else { rte_vhost_get_log_base(vid, &log_base, &log_size); rte_vfio_container_dma_map(internal->vfio_container_fd, log_base, IFCVF_LOG_BASE, log_size); From patchwork Fri Dec 14 21:16:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48942 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 587011BA30; Fri, 14 Dec 2018 22:26:57 +0100 (CET) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by dpdk.org (Postfix) with ESMTP id 9E94D1B9E9 for ; Fri, 14 Dec 2018 22:26:50 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Dec 2018 13:26:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,354,1539673200"; d="scan'208";a="118922700" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga001.jf.intel.com with ESMTP; 14 Dec 2018 13:26:48 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: alejandro.lucero@netronome.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Sat, 15 Dec 2018 05:16:12 +0800 Message-Id: <20181214211612.167681-11-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181214211612.167681-1-xiao.w.wang@intel.com> References: <20181213100910.13087-2-xiao.w.wang@intel.com> <20181214211612.167681-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v4 10/10] doc: update ifc NIC document X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add the SW assisted VDPA live migration feature into NIC doc. Signed-off-by: Xiao Wang --- doc/guides/nics/ifc.rst | 8 ++++++++ doc/guides/rel_notes/release_19_02.rst | 6 ++++++ 2 files changed, 14 insertions(+) diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst index 48f9adf1d..eb55d329a 100644 --- a/doc/guides/nics/ifc.rst +++ b/doc/guides/nics/ifc.rst @@ -39,6 +39,13 @@ the driver probe a new container is created for this device, with this container vDPA driver can program DMA remapping table with the VM's memory region information. +The device argument "swlm=1" will configure the driver into SW assisted live +migration mode. In this mode, the driver will set up a SW relay thread when LM +happens, this thread will help device to log dirty pages. Thus this mode does +not require HW to implement a dirty page logging function block, but will +consume some percentage of CPU resource depending on the network throughput. +If no "swlm=1" specified, driver will rely on device's logging capability. + Key IFCVF vDPA driver ops ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -70,6 +77,7 @@ Features Features of the IFCVF driver are: - Compatibility with virtio 0.95 and 1.0. +- SW assisted vDPA live migration. Prerequisites diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst index e86ef9511..ced6af8f0 100644 --- a/doc/guides/rel_notes/release_19_02.rst +++ b/doc/guides/rel_notes/release_19_02.rst @@ -60,6 +60,12 @@ New Features * Added the handler to get firmware version string. * Added support for multicast filtering. +* **Added support for SW-assisted VDPA live migration.** + + This SW-assisted VDPA live migration facility helps VDPA devices without + logging capability to perform live migration, a mediate SW relay can help + devices to track dirty pages caused by DMA. IFC driver has enabled this + SW-assisted live migration mode. Removed Items -------------