From patchwork Wed Nov 28 09:45:59 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48371 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 45A5D1B43D; Wed, 28 Nov 2018 10:55:47 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 87B4C1B42F for ; Wed, 28 Nov 2018 10:55:45 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:55:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891116" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:43 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:45:59 +0800 Message-Id: <20181128094607.106173-2-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 1/9] vhost: provide helper for host notifier ctrl X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" VDPA driver can decide if it needs to enable/disable the EPT mapping, so exposing an API allows flexibility. A later patch will build on this. 
Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 3 +++ lib/librte_vhost/rte_vdpa.h | 18 ++++++++++++++++++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vhost.c | 3 +-- lib/librte_vhost/vhost_user.c | 7 +------ 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 97a57f182..e844109f3 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -556,6 +556,9 @@ ifcvf_dev_config(int vid) rte_atomic32_set(&internal->dev_attached, 1); update_datapath(internal); + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); + return 0; } diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index a418da47c..89c5bb6b3 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -11,6 +11,8 @@ * Device specific vhost lib */ +#include + #include #include "rte_vhost.h" @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did); */ int __rte_experimental rte_vdpa_get_device_num(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable/Disable EPT mapping for a vdpa port. 
+ * + * @param vid + * vhost device id + * @param enable + * true for EPT map, false for EPT unmap + * @return + * 0 on success, -1 on failure + */ +int __rte_experimental +rte_vhost_host_notifier_ctrl(int vid, bool enable); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index ae39b6e21..22302e972 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -83,4 +83,5 @@ EXPERIMENTAL { rte_vhost_crypto_finalize_requests; rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; + rte_vhost_host_notifier_ctrl; }; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 70ac6bc9c..e7a60e0b4 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid) if (dev == NULL) return; - vhost_user_host_notifier_ctrl(vid, false); - + vhost_destroy_device_notify(dev); dev->vdpa_dev_id = -1; } diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 3ea64eba6..5e0da0589 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd) if (vdpa_dev->ops->dev_conf) vdpa_dev->ops->dev_conf(dev->vid); dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED; - if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) { - RTE_LOG(INFO, VHOST_CONFIG, - "(%d) software relay is used for vDPA, performance may be low.\n", - dev->vid); - } } return 0; @@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, return process_slave_message_reply(dev, &msg); } -int vhost_user_host_notifier_ctrl(int vid, bool enable) +int rte_vhost_host_notifier_ctrl(int vid, bool enable) { struct virtio_net *dev; struct rte_vdpa_device *vdpa_dev; From patchwork Wed Nov 28 09:46:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang 
X-Patchwork-Id: 48372 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 595CD1B460; Wed, 28 Nov 2018 10:55:51 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 82F611B452 for ; Wed, 28 Nov 2018 10:55:49 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:55:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891133" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:47 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:00 +0800 Message-Id: <20181128094607.106173-3-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 2/9] vhost: provide helpers for virtio ring relay X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch provides two helpers for vdpa device driver to perform a relay between the guest virtio ring and a mediate virtio ring. The available ring relay will synchronize the available entries, and helps to do desc validity checking. The used ring relay will synchronize the used entries from mediate ring to guest ring, and helps to do dirty page logging for live migration. 
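As a rough illustration of the avail-ring half of this relay, here is a minimal, self-contained sketch. It uses a toy ring type (`toy_avail`, `toy_relay_avail` and `RING_SIZE` are hypothetical names, not part of the vhost library) and leaves out the descriptor validity check; it only shows how the free-running 16-bit indexes drive the copy.

```c
#include <stdint.h>

#define RING_SIZE 8 /* toy ring size, assumed to be a power of two */

/* Toy stand-in for the avail part of a split virtio ring (hypothetical). */
struct toy_avail {
	uint16_t idx;             /* free-running producer index */
	uint16_t ring[RING_SIZE]; /* descriptor head ids */
};

/*
 * Copy every entry the guest has published that the mediate ring has not
 * seen yet. Returns the number of entries synced.
 */
static int
toy_relay_avail(const struct toy_avail *guest, struct toy_avail *mediate)
{
	uint16_t idx = guest->idx;
	uint16_t idx_m = mediate->idx;
	/* the uint16_t cast keeps the count correct across index wraparound */
	int synced = (uint16_t)(idx - idx_m);

	while (idx_m != idx) {
		/* same slot in both rings, selected by index modulo ring size */
		mediate->ring[idx_m % RING_SIZE] = guest->ring[idx_m % RING_SIZE];
		idx_m++;
	}

	/* publish the new index only after all entries are copied */
	mediate->idx = idx;
	return synced;
}
```

With a guest index of 2 and a mediate index of 65534, this sketch copies four entries (slots 6, 7, 0 and 1), because the 16-bit difference wraps around.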
The next patch will leverage these two helpers. Signed-off-by: Xiao Wang --- lib/librte_vhost/rte_vdpa.h | 38 ++++++++ lib/librte_vhost/rte_vhost_version.map | 2 + lib/librte_vhost/vdpa.c | 173 +++++++++++++++++++++++++++++++++ lib/librte_vhost/vhost.h | 40 ++++++++ lib/librte_vhost/virtio_net.c | 39 -------- 5 files changed, 253 insertions(+), 39 deletions(-) diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index 89c5bb6b3..0c44b9080 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -173,4 +173,42 @@ rte_vdpa_get_device_num(void); */ int __rte_experimental rte_vhost_host_notifier_ctrl(int vid, bool enable); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the available ring from guest to mediate ring, help to + * check desc validity to protect against malicious guest driver. + * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param m_vring + * mediate virtio ring pointer + * @return + * number of synced available entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the used ring from mediate ring to guest, log dirty + * page for each Rx buffer used. 
+ * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param m_vring + * mediate virtio ring pointer + * @return + * number of synced used entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 22302e972..0ad0fbea2 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -84,4 +84,6 @@ EXPERIMENTAL { rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; rte_vhost_host_notifier_ctrl; + rte_vdpa_relay_avail_ring; + rte_vdpa_relay_used_ring; }; diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c index e7d849ee0..e41117776 100644 --- a/lib/librte_vhost/vdpa.c +++ b/lib/librte_vhost/vdpa.c @@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void) { return vdpa_device_num; } + +static int +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_iova, uint64_t desc_len, uint8_t perm) +{ + uint64_t desc_addr, desc_chunck_len; + + while (desc_len) { + desc_chunck_len = desc_len; + desc_addr = vhost_iova_to_vva(dev, vq, + desc_iova, + &desc_chunck_len, + perm); + + if (!desc_addr) + return -1; + + desc_len -= desc_chunck_len; + desc_iova += desc_chunck_len; + } + + return 0; +} + +int +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vring_desc desc; + struct vhost_virtqueue *vq; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + uint64_t dlen; + int ret; + + if (!dev) + return -1; + + vq = dev->virtqueue[qid]; + idx = vq->avail->idx; + idx_m = m_vring->avail->idx; + ret = idx - idx_m; + + while (idx_m != idx) { + /* avail entry copy */ + desc_id = vq->avail->ring[idx_m % vq->size]; + m_vring->avail->ring[idx_m % vq->size] = desc_id; + desc_ring = 
vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr, + &dlen, + VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[desc_id].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[desc_id].addr, vq->desc[desc_id].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* check if the buf addr is within the guest memory */ + do { + desc = desc_ring[desc_id]; + if (invalid_desc_check(dev, vq, desc.addr, desc.len, + VHOST_ACCESS_RW)) + return -1; + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(!!idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx_m++; + } + + m_vring->avail->idx = idx; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vhost_avail_event(vq) = vq->avail->idx; + + return ret; +} + +int +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vhost_virtqueue *vq; + struct vring_desc desc; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + uint64_t dlen; + int ret; + + if (!dev) + return -1; + + vq = dev->virtqueue[qid]; + idx = vq->used->idx; + idx_m = m_vring->used->idx; + ret = idx_m - idx; + + while (idx != idx_m) { + /* copy used entry, used ring logging is not covered here */ + vq->used->ring[idx % vq->size] = + m_vring->used->ring[idx % vq->size]; + + /* dirty page logging for used ring */ + vhost_log_used_vring(dev, vq, + offsetof(struct vring_used, ring[idx % vq->size]), + sizeof(struct vring_used_elem)); + + desc_id = vq->used->ring[idx % vq->size].id; + desc_ring = vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, 
vq->desc[desc_id].addr, + &dlen, + VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[desc_id].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[desc_id].addr, vq->desc[desc_id].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* dirty page logging for Rx buffer */ + do { + desc = desc_ring[desc_id]; + if (desc.flags & VRING_DESC_F_WRITE) + vhost_log_write(dev, desc.addr, desc.len); + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(!!idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx++; + } + + vq->used->idx = idx_m; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vring_used_event(m_vring) = m_vring->used->idx; + + return ret; +} diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5218f1b12..2164cd6d9 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -18,6 +18,7 @@ #include #include #include +#include #include "rte_vhost.h" #include "rte_vdpa.h" @@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq) eventfd_write(vq->callfd, (eventfd_t)1); } +static __rte_always_inline void * +alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_addr, uint64_t desc_len) +{ + void *idesc; + uint64_t src, dst; + uint64_t len, remain = desc_len; + + idesc = rte_malloc(__func__, desc_len, 0); + if (unlikely(!idesc)) + return 0; + + dst = (uint64_t)(uintptr_t)idesc; + + while (remain) { + len = remain; + src = vhost_iova_to_vva(dev, vq, desc_addr, &len, + VHOST_ACCESS_RO); + if (unlikely(!src || !len)) { + rte_free(idesc); + return 0; + } + + rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len); + + remain -= len; + dst += len; + desc_addr += len; + } + + return idesc; +} + +static __rte_always_inline void +free_ind_table(void *idesc) +{ + rte_free(idesc); +} + #endif /* _VHOST_NET_CDEV_H_ */ diff --git 
a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5e1a1a727..8c657a101 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring; } -static __rte_always_inline void * -alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, - uint64_t desc_addr, uint64_t desc_len) -{ - void *idesc; - uint64_t src, dst; - uint64_t len, remain = desc_len; - - idesc = rte_malloc(__func__, desc_len, 0); - if (unlikely(!idesc)) - return 0; - - dst = (uint64_t)(uintptr_t)idesc; - - while (remain) { - len = remain; - src = vhost_iova_to_vva(dev, vq, desc_addr, &len, - VHOST_ACCESS_RO); - if (unlikely(!src || !len)) { - rte_free(idesc); - return 0; - } - - rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len); - - remain -= len; - dst += len; - desc_addr += len; - } - - return idesc; -} - -static __rte_always_inline void -free_ind_table(void *idesc) -{ - rte_free(idesc); -} - static __rte_always_inline void do_flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq, From patchwork Wed Nov 28 09:46:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48373 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7BFA81B463; Wed, 28 Nov 2018 10:55:55 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 8998D1B463 for ; Wed, 28 Nov 2018 10:55:51 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 
Nov 2018 01:55:51 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891142" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:49 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:01 +0800 Message-Id: <20181128094607.106173-4-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 3/9] net/ifc: dump debug message for error X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Driver probe may fail for different causes; a debug message helps with debugging the issue. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index e844109f3..aacd5f9bf 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -22,7 +22,7 @@ #define DRV_LOG(level, fmt, args...) 
\ rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \ - "%s(): " fmt "\n", __func__, ##args) + "IFCVF %s(): " fmt "\n", __func__, ##args) #ifndef PAGE_SIZE #define PAGE_SIZE 4096 @@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->pdev = pci_dev; rte_spinlock_init(&internal->lock); - if (ifcvf_vfio_setup(internal) < 0) - return -1; - if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) - return -1; + if (ifcvf_vfio_setup(internal) < 0) { + DRV_LOG(ERR, "failed to setup device %s", pci_dev->name); + goto error; + } + + if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) { + DRV_LOG(ERR, "failed to init device %s", pci_dev->name); + goto error; + } internal->max_queues = IFCVF_MAX_QUEUES; features = ifcvf_get_features(&internal->hw); @@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); - if (internal->did < 0) + if (internal->did < 0) { + DRV_LOG(ERR, "failed to register device %s", pci_dev->name); goto error; + } rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Wed Nov 28 09:46:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48374 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 891F51B475; Wed, 28 Nov 2018 10:55:57 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 2AA1A1B46A; Wed, 28 Nov 2018 10:55:54 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:55:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: 
E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891149" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:52 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang , stable@dpdk.org Date: Wed, 28 Nov 2018 17:46:02 +0800 Message-Id: <20181128094607.106173-5-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 4/9] net/ifc: store only registered device instance X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If driver fails to register ifc VF device into vhost lib, then this device should not be stored. Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") cc: stable@dpdk.org Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index aacd5f9bf..6fcd50b73 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; - pthread_mutex_lock(&internal_list_lock); - TAILQ_INSERT_TAIL(&internal_list, list, next); - pthread_mutex_unlock(&internal_list_lock); - internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { @@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, goto error; } + pthread_mutex_lock(&internal_list_lock); + TAILQ_INSERT_TAIL(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + 
rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Wed Nov 28 09:46:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48375 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 28C661B485; Wed, 28 Nov 2018 10:56:00 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id EB8CB1B46F for ; Wed, 28 Nov 2018 10:55:56 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:55:56 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891164" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:54 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:03 +0800 Message-Id: <20181128094607.106173-6-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 5/9] net/ifc: detect if VDPA mode is specified X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If user wants the VF to be used in VDPA (vhost data path acceleration) mode, then the user can add a "vdpa=1" parameter for the device. 
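A minimal sketch of how the integer value of such a devarg (for example the "1" in "vdpa=1") could be parsed strictly is given here. `parse_u16_devarg` is a hypothetical helper written for illustration, not the callback used by the driver; it relies only on standard C `strtoul`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a devarg value string into a uint16_t.
 * Returns 0 on success, -EINVAL for missing or malformed input and
 * -ERANGE when the number does not fit in 16 bits.
 */
static int
parse_u16_devarg(const char *value, uint16_t *out)
{
	char *end = NULL;
	unsigned long n;

	if (value == NULL || out == NULL)
		return -EINVAL;

	errno = 0; /* strtoul only sets errno on failure, so clear it first */
	n = strtoul(value, &end, 0);
	if (end == value || *end != '\0')
		return -EINVAL; /* empty string or trailing garbage */
	if (errno == ERANGE || n > UINT16_MAX)
		return -ERANGE;

	*out = (uint16_t)n;
	return 0;
}
```

Clearing `errno` before the call and checking the end pointer are the two design choices worth noting: without them, a stale `ERANGE` or an input such as "abc" could be misread as a valid value.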
If the driver does not find this option, it should quit and let the bus continue the probe. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 6fcd50b73..c0e50354a 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "base/ifcvf.h" @@ -28,6 +30,13 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_VDPA_MODE "vdpa" + +static const char * const ifcvf_valid_arguments[] = { + IFCVF_VDPA_MODE, + NULL +}; + static int ifcvf_vdpa_logtype; struct ifcvf_internal { @@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = { .get_notify_area = ifcvf_get_notify_area, }; +static inline int +open_int(const char *key __rte_unused, const char *value, void *extra_args) +{ + uint16_t *n = extra_args; + + if (value == NULL || extra_args == NULL) + return -EINVAL; + + *n = (uint16_t)strtoul(value, NULL, 0); + if (*n == USHRT_MAX && errno == ERANGE) + return -1; + + return 0; +} + static int ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) @@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, uint64_t features; struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; + int vdpa_mode = 0; + struct rte_kvargs *kvlist = NULL; + int ret = 0; if (rte_eal_process_type() != RTE_PROC_PRIMARY) return 0; + kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, + ifcvf_valid_arguments); + if (kvlist == NULL) + return 1; + + /* probe only when vdpa mode is specified */ + if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) { + rte_kvargs_free(kvlist); + return 1; + } + + ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int, + &vdpa_mode); + if (ret < 0 || vdpa_mode == 0) { + rte_kvargs_free(kvlist); + return 1; + } + list = 
rte_zmalloc("ifcvf", sizeof(*list), 0); if (list == NULL) goto error; @@ -795,9 +840,11 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, rte_atomic32_set(&internal->started, 1); update_datapath(internal); + rte_kvargs_free(kvlist); return 0; error: + rte_kvargs_free(kvlist); rte_free(list); rte_free(internal); return -1; From patchwork Wed Nov 28 09:46:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48376 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0EAC91B48D; Wed, 28 Nov 2018 10:56:03 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id D02971B46F for ; Wed, 28 Nov 2018 10:55:58 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:55:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891177" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:57 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:04 +0800 Message-Id: <20181128094607.106173-7-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 6/9] net/ifc: add devarg for LM mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch series enables a new method for live migration, i.e. software assisted live migration. This patch provides a device argument for the user to choose the method. With "swlm=1", driver/device will do live migration with a relay thread dealing with dirty page logging. Without this parameter, device will do dirty page logging and there's no relay thread consuming CPU resource. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index c0e50354a..e9cc8d7bc 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -31,9 +32,11 @@ #endif #define IFCVF_VDPA_MODE "vdpa" +#define IFCVF_SW_FALLBACK_LM "swlm" static const char * const ifcvf_valid_arguments[] = { IFCVF_VDPA_MODE, + IFCVF_SW_FALLBACK_LM, NULL }; @@ -56,6 +59,7 @@ struct ifcvf_internal { rte_atomic32_t dev_attached; rte_atomic32_t running; rte_spinlock_t lock; + bool sw_lm; }; struct internal_list { @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; int vdpa_mode = 0; + int sw_fallback_lm = 0; struct rte_kvargs *kvlist = NULL; int ret = 0; @@ -826,6 +831,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; + if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) { + ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM, + &open_int, &sw_fallback_lm); + if (ret < 0) + goto error; + internal->sw_lm = sw_fallback_lm ? 
true : false; + } else { + internal->sw_lm = false; + } + internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { From patchwork Wed Nov 28 09:46:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48377 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 91FA61B48F; Wed, 28 Nov 2018 10:56:05 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 707781B488 for ; Wed, 28 Nov 2018 10:56:01 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:56:01 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891202" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:55:59 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:05 +0800 Message-Id: <20181128094607.106173-8-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 7/9] net/ifc: use lib API for used ring logging X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Vhost lib has already provided a helper for used ring logging, driver could use it to reduce 
code. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++------------------- 1 file changed, 8 insertions(+), 19 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index e9cc8d7bc..6c64ac4f7 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -31,6 +31,9 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) + #define IFCVF_VDPA_MODE "vdpa" #define IFCVF_SW_FALLBACK_LM "swlm" @@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal) return ifcvf_start_hw(&internal->hw); } -static void -ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf) -{ - uint32_t i, size; - uint64_t pfn; - - pfn = hw->vring[queue].used / PAGE_SIZE; - size = hw->vring[queue].size * sizeof(struct vring_used_elem) + - sizeof(uint16_t) * 3; - - for (i = 0; i <= size / PAGE_SIZE; i++) - __sync_fetch_and_or_8(&log_buf[(pfn + i) / 8], - 1 << ((pfn + i) % 8)); -} - static void vdpa_ifcvf_stop(struct ifcvf_internal *internal) { @@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) int vid; uint64_t features; uint64_t log_base, log_size; - uint8_t *log_buf; + uint64_t len; vid = internal->vid; ifcvf_stop_hw(hw); @@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) * IFCVF marks dirty memory pages for only packet buffer, * SW helps to mark the used ring as dirty after device stops. 
*/ - log_buf = (uint8_t *)(uintptr_t)log_base; - for (i = 0; i < hw->nr_vring; i++) - ifcvf_used_ring_log(hw, i, log_buf); + for (i = 0; i < hw->nr_vring; i++) { + len = IFCVF_USED_RING_LEN(hw->vring[i].size); + rte_vhost_log_used_vring(vid, i, 0, len); + } } } From patchwork Wed Nov 28 09:46:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48378 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0BF431B495; Wed, 28 Nov 2018 10:56:08 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id B52AF1B474 for ; Wed, 28 Nov 2018 10:56:03 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:56:03 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891231" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:56:01 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:06 +0800 Message-Id: <20181128094607.106173-9-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 8/9] net/ifc: support SW assisted VDPA live migration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 
Sender: "dev" In SW assisted live migration mode, the driver stops the device and sets up a mediated virtio ring to relay the communication between the virtio driver and the VDPA device. This data path intervention allows SW to help with guest dirty page logging for live migration. This SW fallback is an event-driven relay thread, so when the network throughput is low, it takes little CPU resource, but when the throughput goes up, the relay thread's CPU usage goes up accordingly. The user needs to take all the factors, including CPU usage and guest performance degradation, into consideration when selecting the live migration support mode. Signed-off-by: Xiao Wang --- drivers/net/ifc/base/ifcvf.h | 1 + drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 344 insertions(+), 3 deletions(-) diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h index f026c70ab..8eb70ae9d 100644 --- a/drivers/net/ifc/base/ifcvf.h +++ b/drivers/net/ifc/base/ifcvf.h @@ -50,6 +50,7 @@ #define IFCVF_LM_ENABLE_VF 0x1 #define IFCVF_LM_ENABLE_PF 0x3 #define IFCVF_LOG_BASE 0x100000000000 +#define IFCVF_MEDIATE_VRING 0x200000000000 #define IFCVF_32_BIT_MASK 0xffffffff diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 6c64ac4f7..875a0009d 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -63,6 +63,9 @@ struct ifcvf_internal { rte_atomic32_t running; rte_spinlock_t lock; bool sw_lm; + bool sw_fallback_running; + /* mediated vring for sw fallback */ + struct vring m_vring[IFCVF_MAX_QUEUES * 2]; }; struct internal_list { @@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, hw->vring[i].last_used_idx); + if (internal->sw_lm) + return; + rte_vhost_get_negotiated_features(vid, &features); if (RTE_VHOST_NEED_LOG(features)) { ifcvf_disable_logging(hw); @@ -539,6 +545,318 @@ update_datapath(struct 
ifcvf_internal *internal) return ret; } +static int +m_ifcvf_start(struct ifcvf_internal *internal) +{ + struct ifcvf_hw *hw = &internal->hw; + uint32_t i, nr_vring; + int vid, ret; + struct rte_vhost_vring vq; + void *vring_buf; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size; + uint64_t gpa; + + vid = internal->vid; + nr_vring = rte_vhost_get_vring_num(vid); + rte_vhost_get_negotiated_features(vid, &hw->req_features); + + for (i = 0; i < nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vq); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE); + vring_init(&internal->m_vring[i], vq.size, vring_buf, + PAGE_SIZE); + + ret = rte_vfio_container_dma_map(internal->vfio_container_fd, + (uint64_t)(uintptr_t)vring_buf, m_vring_iova, size); + if (ret < 0) { + DRV_LOG(ERR, "mediate vring DMA map failed."); + goto error; + } + + gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); + if (gpa == 0) { + DRV_LOG(ERR, "Failed to get GPA for descriptor ring."); + goto error; + } + hw->vring[i].desc = gpa; + + hw->vring[i].avail = m_vring_iova + + (char *)internal->m_vring[i].avail - + (char *)internal->m_vring[i].desc; + + hw->vring[i].used = m_vring_iova + + (char *)internal->m_vring[i].used - + (char *)internal->m_vring[i].desc; + + hw->vring[i].size = vq.size; + + rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx, + &hw->vring[i].last_used_idx); + + m_vring_iova += size; + } + hw->nr_vring = nr_vring; + + return ifcvf_start_hw(&internal->hw); + +error: + for (i = 0; i < nr_vring; i++) + if (internal->m_vring[i].desc) + rte_free(internal->m_vring[i].desc); + + return -1; +} + +static int +m_ifcvf_stop(struct ifcvf_internal *internal) +{ + int vid; + uint32_t i; + struct rte_vhost_vring vq; + struct ifcvf_hw *hw = &internal->hw; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size, len; + + vid = internal->vid; + ifcvf_stop_hw(hw); + + for (i = 0; i < hw->nr_vring; i++) { 
+ rte_vhost_get_vhost_vring(vid, i, &vq); + len = IFCVF_USED_RING_LEN(vq.size); + rte_vhost_log_used_vring(vid, i, 0, len); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + rte_vfio_container_dma_unmap(internal->vfio_container_fd, + (uint64_t)(uintptr_t)internal->m_vring[i].desc, + m_vring_iova, size); + + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, + hw->vring[i].last_used_idx); + rte_free(internal->m_vring[i].desc); + m_vring_iova += size; + } + + return 0; +} + +static int +m_enable_vfio_intr(struct ifcvf_internal *internal) +{ + uint32_t nr_vring; + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + int ret; + + nr_vring = rte_vhost_get_vring_num(internal->vid); + + ret = rte_intr_efd_enable(intr_handle, nr_vring); + if (ret) + return -1; + + ret = rte_intr_enable(intr_handle); + if (ret) + return -1; + + return 0; +} + +static void +m_disable_vfio_intr(struct ifcvf_internal *internal) +{ + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + + rte_intr_efd_disable(intr_handle); + rte_intr_disable(intr_handle); +} + +static void +update_avail_ring(struct ifcvf_internal *internal, int qid) +{ + rte_vdpa_relay_avail_ring(internal->vid, qid, &internal->m_vring[qid]); + ifcvf_notify_queue(&internal->hw, qid); +} + +static void +update_used_ring(struct ifcvf_internal *internal, int qid) +{ + rte_vdpa_relay_used_ring(internal->vid, qid, &internal->m_vring[qid]); + rte_vhost_vring_call(internal->vid, qid); +} + +static void * +vring_relay(void *arg) +{ + int i, vid, epfd, fd, nfds; + struct ifcvf_internal *internal = (struct ifcvf_internal *)arg; + struct rte_vhost_vring vring; + struct rte_intr_handle *intr_handle; + uint32_t qid, q_num; + struct epoll_event events[IFCVF_MAX_QUEUES * 4]; + struct epoll_event ev; + int nbytes; + uint64_t buf; + + vid = internal->vid; + q_num = rte_vhost_get_vring_num(vid); + /* prepare the mediate vring */ + for (qid = 0; qid < q_num; qid++) { + 
rte_vhost_get_vring_base(vid, qid, + &internal->m_vring[qid].avail->idx, + &internal->m_vring[qid].used->idx); + rte_vdpa_relay_avail_ring(vid, qid, &internal->m_vring[qid]); + } + + /* add notify fd and interrupt fd to epoll */ + epfd = epoll_create(IFCVF_MAX_QUEUES * 2); + if (epfd < 0) { + DRV_LOG(ERR, "failed to create epoll instance."); + return NULL; + } + internal->epfd = epfd; + + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + rte_vhost_get_vhost_vring(vid, qid, &vring); + ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + intr_handle = &internal->pdev->intr_handle; + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + ev.data.u64 = 1 | qid << 1 | + (uint64_t)intr_handle->efds[qid] << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev) + < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + /* start relay with a first kick */ + for (qid = 0; qid < q_num; qid++) + ifcvf_notify_queue(&internal->hw, qid); + + /* listen to the events and react accordingly */ + for (;;) { + nfds = epoll_wait(epfd, events, q_num * 2, -1); + if (nfds < 0) { + if (errno == EINTR) + continue; + DRV_LOG(ERR, "epoll_wait failed"); + return NULL; + } + + for (i = 0; i < nfds; i++) { + fd = (uint32_t)(events[i].data.u64 >> 32); + do { + nbytes = read(fd, &buf, 8); + if (nbytes < 0) { + if (errno == EINTR || + errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + DRV_LOG(INFO, "Error reading " + "kickfd: %s", + strerror(errno)); + } + break; + } while (1); + + qid = events[i].data.u32 >> 1; + + if (events[i].data.u32 & 1) + update_used_ring(internal, qid); + else + update_avail_ring(internal, qid); + } + } + + return NULL; +} + +static int +setup_vring_relay(struct ifcvf_internal *internal) +{ + int ret; + + ret = 
pthread_create(&internal->tid, NULL, vring_relay, + (void *)internal); + if (ret) { + DRV_LOG(ERR, "failed to create ring relay pthread."); + return -1; + } + return 0; +} + +static int +unset_vring_relay(struct ifcvf_internal *internal) +{ + void *status; + + if (internal->tid) { + pthread_cancel(internal->tid); + pthread_join(internal->tid, &status); + } + internal->tid = 0; + + if (internal->epfd >= 0) + close(internal->epfd); + internal->epfd = -1; + + return 0; +} + +static int +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal) +{ + int ret; + + /* stop the direct IO data path */ + unset_notify_relay(internal); + vdpa_ifcvf_stop(internal); + vdpa_disable_vfio_intr(internal); + + ret = rte_vhost_host_notifier_ctrl(internal->vid, false); + if (ret && ret != -ENOTSUP) + goto error; + + /* set up interrupt for interrupt relay */ + ret = m_enable_vfio_intr(internal); + if (ret) + goto unmap; + + /* config the VF */ + ret = m_ifcvf_start(internal); + if (ret) + goto unset_intr; + + /* set up vring relay thread */ + ret = setup_vring_relay(internal); + if (ret) + goto stop_vf; + + internal->sw_fallback_running = true; + + return 0; + +stop_vf: + m_ifcvf_stop(internal); +unset_intr: + m_disable_vfio_intr(internal); +unmap: + ifcvf_dma_map(internal, 0); +error: + return -1; +} + static int ifcvf_dev_config(int vid) { @@ -579,8 +897,25 @@ ifcvf_dev_close(int vid) } internal = list->internal; - rte_atomic32_set(&internal->dev_attached, 0); - update_datapath(internal); + + if (internal->sw_fallback_running) { + /* unset ring relay */ + unset_vring_relay(internal); + + /* reset VF */ + m_ifcvf_stop(internal); + + /* remove interrupt setting */ + m_disable_vfio_intr(internal); + + /* unset DMA map for guest memory */ + ifcvf_dma_map(internal, 0); + + internal->sw_fallback_running = false; + } else { + rte_atomic32_set(&internal->dev_attached, 0); + update_datapath(internal); + } return 0; } @@ -604,7 +939,12 @@ ifcvf_set_features(int vid) internal = 
list->internal; rte_vhost_get_negotiated_features(vid, &features); - if (RTE_VHOST_NEED_LOG(features)) { + if (!RTE_VHOST_NEED_LOG(features)) + return 0; + + if (internal->sw_lm) { + ifcvf_sw_fallback_switchover(internal); + } else { rte_vhost_get_log_base(vid, &log_base, &log_size); rte_vfio_container_dma_map(internal->vfio_container_fd, log_base, IFCVF_LOG_BASE, log_size); From patchwork Wed Nov 28 09:46:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48379 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D72AC1B49D; Wed, 28 Nov 2018 10:56:09 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 835781B495 for ; Wed, 28 Nov 2018 10:56:06 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Nov 2018 01:56:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,290,1539673200"; d="scan'208";a="112891256" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.106]) by orsmga001.jf.intel.com with ESMTP; 28 Nov 2018 01:56:04 -0800 From: Xiao Wang To: tiwei.bie@intel.com, maxime.coquelin@redhat.com Cc: dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Wed, 28 Nov 2018 17:46:07 +0800 Message-Id: <20181128094607.106173-10-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181128094607.106173-1-xiao.w.wang@intel.com> References: <20181128094607.106173-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH 9/9] doc: update ifc NIC document X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Signed-off-by: Xiao Wang --- doc/guides/nics/ifc.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst index 48f9adf1d..a16f2982f 100644 --- a/doc/guides/nics/ifc.rst +++ b/doc/guides/nics/ifc.rst @@ -39,6 +39,12 @@ the driver probe a new container is created for this device, with this container vDPA driver can program DMA remapping table with the VM's memory region information. +The device argument "swlm=1" configures the driver for SW assisted live +migration mode. In this mode, the driver sets up a SW relay thread when live +migration (LM) starts; this thread helps the device log dirty pages. This mode +does not require HW to implement a dirty page logging function block, but it +consumes some CPU resource, depending on the network throughput. + Key IFCVF vDPA driver ops ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -70,6 +76,7 @@ Features Features of the IFCVF driver are: - Compatibility with virtio 0.95 and 1.0. +- SW assisted vDPA for live migration. Prerequisites