From patchwork Thu Dec 13 01:10:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48718 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CD7421B44D; Thu, 13 Dec 2018 02:22:50 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id DFB751B44D for ; Thu, 13 Dec 2018 02:22:49 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:22:49 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112814" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:22:48 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:06 +0800 Message-Id: <20181213011014.110089-2-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 1/9] vhost: provide helper for host notifier ctrl X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" A vDPA driver can decide whether it needs to enable or disable the host notifier mapping, so exposing an API gives the driver that flexibility. A later patch in this series builds on this.
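A minimal sketch of how a driver's dev_conf callback might use the new control API, assuming, as the ifcvf hunk below suggests, that a non-zero return means guest notifications keep going through a software relay instead of a direct doorbell mapping. `host_notifier_ctrl_stub()`, `notify_area_mappable` and `example_dev_conf()` are hypothetical stand-ins so the example builds without DPDK; a real driver would call `rte_vhost_host_notifier_ctrl(vid, true)` instead of the stub.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical switch: can the device's notify area be mapped to the guest? */
static bool notify_area_mappable;

/* Hypothetical stand-in for rte_vhost_host_notifier_ctrl():
 * returns 0 on success, -1 on failure, like the documented API. */
static int host_notifier_ctrl_stub(int vid, bool enable)
{
	(void)vid;
	if (enable && !notify_area_mappable)
		return -1;
	return 0;
}

/* Sketch of a dev_conf callback: try the direct mapping first and fall
 * back to software relay if it cannot be set up. Returns true when the
 * guest can kick the device doorbell directly. */
static bool example_dev_conf(int vid)
{
	assert(vid >= 0);
	if (host_notifier_ctrl_stub(vid, true) != 0) {
		fprintf(stderr, "vDPA (%d): software relay is used.\n", vid);
		return false;
	}
	return true;
}
```

Because the failure is only logged, the device still works without the mapping; the choice just moves from the vhost lib into the driver.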
Signed-off-by: Xiao Wang --- v2: * Reword the vdpa host notifier control API comment. --- drivers/net/ifc/ifcvf_vdpa.c | 3 +++ lib/librte_vhost/rte_vdpa.h | 18 ++++++++++++++++++ lib/librte_vhost/rte_vhost_version.map | 1 + lib/librte_vhost/vhost.c | 3 +-- lib/librte_vhost/vhost_user.c | 7 +------ 5 files changed, 24 insertions(+), 8 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 97a57f182..e844109f3 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -556,6 +556,9 @@ ifcvf_dev_config(int vid) rte_atomic32_set(&internal->dev_attached, 1); update_datapath(internal); + if (rte_vhost_host_notifier_ctrl(vid, true) != 0) + DRV_LOG(NOTICE, "vDPA (%d): software relay is used.", did); + return 0; } diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index a418da47c..fff657391 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -11,6 +11,8 @@ * Device specific vhost lib */ +#include + #include #include "rte_vhost.h" @@ -155,4 +157,20 @@ rte_vdpa_get_device(int did); */ int __rte_experimental rte_vdpa_get_device_num(void); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enable/Disable host notifier mapping for a vdpa port. 
+ * + * @param vid + * vhost device id + * @param enable + * true for host notifier map, false for host notifier unmap + * @return + * 0 on success, -1 on failure + */ +int __rte_experimental +rte_vhost_host_notifier_ctrl(int vid, bool enable); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index ae39b6e21..22302e972 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -83,4 +83,5 @@ EXPERIMENTAL { rte_vhost_crypto_finalize_requests; rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; + rte_vhost_host_notifier_ctrl; }; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 70ac6bc9c..e7a60e0b4 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -408,8 +408,7 @@ vhost_detach_vdpa_device(int vid) if (dev == NULL) return; - vhost_user_host_notifier_ctrl(vid, false); - + vhost_destroy_device_notify(dev); dev->vdpa_dev_id = -1; } diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index 3ea64eba6..5e0da0589 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -2045,11 +2045,6 @@ vhost_user_msg_handler(int vid, int fd) if (vdpa_dev->ops->dev_conf) vdpa_dev->ops->dev_conf(dev->vid); dev->flags |= VIRTIO_DEV_VDPA_CONFIGURED; - if (vhost_user_host_notifier_ctrl(dev->vid, true) != 0) { - RTE_LOG(INFO, VHOST_CONFIG, - "(%d) software relay is used for vDPA, performance may be low.\n", - dev->vid); - } } return 0; @@ -2144,7 +2139,7 @@ static int vhost_user_slave_set_vring_host_notifier(struct virtio_net *dev, return process_slave_message_reply(dev, &msg); } -int vhost_user_host_notifier_ctrl(int vid, bool enable) +int rte_vhost_host_notifier_ctrl(int vid, bool enable) { struct virtio_net *dev; struct rte_vdpa_device *vdpa_dev; From patchwork Thu Dec 13 01:10:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48719 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 13F941B455; Thu, 13 Dec 2018 02:22:57 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id E47131B3B1 for ; Thu, 13 Dec 2018 02:22:55 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:22:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112830" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:22:53 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:07 +0800 Message-Id: <20181213011014.110089-3-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 2/9] vhost: provide helpers for virtio ring relay X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch provides two helpers that let a vDPA device driver relay between the guest virtio ring and a mediated virtio ring. The available ring relay synchronizes the available entries and checks descriptor validity along the way.
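The available ring relay can be sketched in plain C as a simplified model, not the librte_vhost code. The sketch assumes a single contiguous guest memory region instead of `vhost_iova_to_vva()`, ignores indirect descriptors, and uses hypothetical `demo_*` types in place of `struct vring`. Unlike the posted helper, it also bounds-checks descriptor ids and chain length before dereferencing them. It shows the two index tricks the helper relies on: the free-running 16-bit index difference, and masking with `size - 1`, which is only valid for power-of-two ring sizes (the patch's masking assumes the same).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_QSZ_MAX      8   /* demo rings hold at most 8 entries */
#define VRING_DESC_F_NEXT 1   /* same value as in the virtio spec */

/* Simplified split-ring pieces standing in for struct vring et al. */
struct demo_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct demo_avail { uint16_t flags; uint16_t idx; uint16_t ring[DEMO_QSZ_MAX]; };
struct demo_ring  { uint16_t size; struct demo_desc *desc; struct demo_avail *avail; };

/* Assumed guest memory: one region [start, start + mlen). */
static bool in_guest_mem(uint64_t addr, uint32_t len,
			 uint64_t start, uint64_t mlen)
{
	return addr >= start && len <= mlen && addr - start <= mlen - len;
}

/* Copy new avail entries from guest ring g to mediated ring m after
 * validating each descriptor chain. Returns entries synced, or -1. */
static int relay_avail(const struct demo_ring *g, struct demo_ring *m,
		       uint64_t start, uint64_t mlen)
{
	uint16_t idx = g->avail->idx;
	uint16_t idx_m = m->avail->idx;
	int ret = (uint16_t)(idx - idx_m);  /* correct across 65535 -> 0 */

	assert(g->size != 0 && g->size <= DEMO_QSZ_MAX &&
	       (g->size & (g->size - 1)) == 0);

	while (idx_m != idx) {
		uint16_t slot = idx_m & (g->size - 1);
		uint16_t id = g->avail->ring[slot];
		unsigned int hops = 0;
		struct demo_desc d;

		do {
			if (id >= g->size || ++hops > g->size)
				return -1;  /* bad id or looping chain */
			d = g->desc[id];
			if (!in_guest_mem(d.addr, d.len, start, mlen))
				return -1;
			id = d.next;
		} while (d.flags & VRING_DESC_F_NEXT);

		m->avail->ring[slot] = g->avail->ring[slot];
		idx_m++;
	}
	/* The real helper issues rte_smp_wmb() here so entries become
	 * visible before the index; not needed in this single-threaded demo. */
	m->avail->idx = idx;
	return ret;
}
```

A rejected chain leaves the mediated ring's `idx` untouched, so the device never sees a batch that was only partly validated.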
The used ring relay synchronizes the used entries from the mediated ring to the guest ring and performs dirty page logging for live migration. The next patch leverages these two helpers. Signed-off-by: Xiao Wang --- v2: * Make the vring relay API parameter "void *" to accommodate potential future ring layouts, e.g. packed ring. * Add parameter check for the new API. * Add memory barrier for ring idx update. * Remove the used ring logging in the relay. * Some comment fixes and code cleanup according to Tiwei's review. --- lib/librte_vhost/rte_vdpa.h | 38 +++++++ lib/librte_vhost/rte_vhost_version.map | 2 + lib/librte_vhost/vdpa.c | 187 +++++++++++++++++++++++++++++++++ lib/librte_vhost/vhost.h | 40 +++++++ lib/librte_vhost/virtio_net.c | 39 ------- 5 files changed, 267 insertions(+), 39 deletions(-) diff --git a/lib/librte_vhost/rte_vdpa.h b/lib/librte_vhost/rte_vdpa.h index fff657391..265250939 100644 --- a/lib/librte_vhost/rte_vdpa.h +++ b/lib/librte_vhost/rte_vdpa.h @@ -173,4 +173,42 @@ rte_vdpa_get_device_num(void); */ int __rte_experimental rte_vhost_host_notifier_ctrl(int vid, bool enable); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the available ring from guest to mediate ring, help to + * check desc validity to protect against malicious guest driver. + * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param m_vring + * mediate virtio ring pointer + * @return + * number of synced available entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_avail_ring(int vid, uint16_t qid, void *m_vring); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Synchronize the used ring from mediate ring to guest, log dirty + * page for each Rx buffer used.
+ * + * @param vid + * vhost device id + * @param qid + * vhost queue id + * @param m_vring + * mediate virtio ring pointer + * @return + * number of synced used entries on success, -1 on failure + */ +int __rte_experimental +rte_vdpa_relay_used_ring(int vid, uint16_t qid, void *m_vring); #endif /* _RTE_VDPA_H_ */ diff --git a/lib/librte_vhost/rte_vhost_version.map b/lib/librte_vhost/rte_vhost_version.map index 22302e972..0ad0fbea2 100644 --- a/lib/librte_vhost/rte_vhost_version.map +++ b/lib/librte_vhost/rte_vhost_version.map @@ -84,4 +84,6 @@ EXPERIMENTAL { rte_vhost_crypto_set_zero_copy; rte_vhost_va_from_guest_pa; rte_vhost_host_notifier_ctrl; + rte_vdpa_relay_avail_ring; + rte_vdpa_relay_used_ring; }; diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c index e7d849ee0..16193cfc0 100644 --- a/lib/librte_vhost/vdpa.c +++ b/lib/librte_vhost/vdpa.c @@ -122,3 +122,190 @@ rte_vdpa_get_device_num(void) { return vdpa_device_num; } + +static bool +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_iova, uint64_t desc_len, uint8_t perm) +{ + uint64_t desc_addr, desc_chunck_len; + + while (desc_len) { + desc_chunck_len = desc_len; + desc_addr = vhost_iova_to_vva(dev, vq, + desc_iova, + &desc_chunck_len, + perm); + + if (!desc_addr) + return true; + + desc_len -= desc_chunck_len; + desc_iova += desc_chunck_len; + } + + return false; +} + +int +rte_vdpa_relay_avail_ring(int vid, uint16_t qid, void *m_vring) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vring_desc desc; + struct vhost_virtqueue *vq; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + struct vring *s_vring; + uint64_t dlen; + int ret; + + if (!dev || !m_vring) + return -1; + + if (qid >= dev->nr_vring) + return -1; + + if (vq_is_packed(dev)) + return -1; + + s_vring = (struct vring *)m_vring; + vq = dev->virtqueue[qid]; + idx = vq->avail->idx; + idx_m = s_vring->avail->idx; + ret = (uint16_t)(idx - 
idx_m); + + while (idx_m != idx) { + /* avail entry copy */ + desc_id = vq->avail->ring[idx_m & (vq->size - 1)]; + s_vring->avail->ring[idx_m & (vq->size - 1)] = desc_id; + desc_ring = vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr, + &dlen, VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[desc_id].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[desc_id].addr, vq->desc[desc_id].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* check if the buf addr is within the guest memory */ + do { + desc = desc_ring[desc_id]; + if (invalid_desc_check(dev, vq, desc.addr, desc.len, + VHOST_ACCESS_RW)) { + if (unlikely(idesc)) + free_ind_table(idesc); + return -1; + } + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx_m++; + } + + rte_smp_wmb(); + s_vring->avail->idx = idx; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vhost_avail_event(vq) = idx; + + return ret; +} + +int +rte_vdpa_relay_used_ring(int vid, uint16_t qid, void *m_vring) +{ + struct virtio_net *dev = get_device(vid); + uint16_t idx, idx_m, desc_id; + struct vhost_virtqueue *vq; + struct vring_desc desc; + struct vring_desc *desc_ring; + struct vring_desc *idesc = NULL; + struct vring *s_vring; + uint64_t dlen; + int ret; + + if (!dev || !m_vring) + return -1; + + if (qid >= dev->nr_vring) + return -1; + + if (vq_is_packed(dev)) + return -1; + + s_vring = (struct vring *)m_vring; + vq = dev->virtqueue[qid]; + idx = vq->used->idx; + idx_m = s_vring->used->idx; + ret = (uint16_t)(idx_m - idx); + + while (idx != idx_m) { + /* copy used entry, used ring logging is not covered here */ + vq->used->ring[idx & (vq->size - 1)] = + s_vring->used->ring[idx & 
(vq->size - 1)]; +
desc_id = vq->used->ring[idx & (vq->size - 1)].id; + desc_ring = vq->desc; + + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) { + dlen = vq->desc[desc_id].len; + desc_ring = (struct vring_desc *)(uintptr_t) + vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr, + &dlen, VHOST_ACCESS_RO); + if (unlikely(!desc_ring)) + return -1; + + if (unlikely(dlen < vq->desc[desc_id].len)) { + idesc = alloc_copy_ind_table(dev, vq, + vq->desc[desc_id].addr, vq->desc[desc_id].len); + if (unlikely(!idesc)) + return -1; + + desc_ring = idesc; + } + + desc_id = 0; + } + + /* dirty page logging for DMA writeable buffer */ + do { + desc = desc_ring[desc_id]; + if (desc.flags & VRING_DESC_F_WRITE) + vhost_log_write(dev, desc.addr, desc.len); + desc_id = desc.next; + } while (desc.flags & VRING_DESC_F_NEXT); + + if (unlikely(idesc)) { + free_ind_table(idesc); + idesc = NULL; + } + + idx++; + } + + rte_smp_wmb(); + vq->used->idx = idx_m; + + if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) + vring_used_event(s_vring) = idx_m; + + return ret; +} diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 5218f1b12..2164cd6d9 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -18,6 +18,7 @@ #include #include #include +#include #include "rte_vhost.h" #include "rte_vdpa.h" @@ -753,4 +754,43 @@ vhost_vring_call_packed(struct virtio_net *dev, struct vhost_virtqueue *vq) eventfd_write(vq->callfd, (eventfd_t)1); } +static __rte_always_inline void * +alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, + uint64_t desc_addr, uint64_t desc_len) +{ + void *idesc; + uint64_t src, dst; + uint64_t len, remain = desc_len; + + idesc = rte_malloc(__func__, desc_len, 0); + if (unlikely(!idesc)) + return 0; + + dst = (uint64_t)(uintptr_t)idesc; + + while (remain) { + len = remain; + src = vhost_iova_to_vva(dev, vq, desc_addr, &len, + VHOST_ACCESS_RO); + if (unlikely(!src || !len)) { + rte_free(idesc); + return 0; + } + + 
rte_memcpy((void *)(uintptr_t)dst,
(void *)(uintptr_t)src, len); + + remain -= len; + dst += len; + desc_addr += len; + } + + return idesc; +} + +static __rte_always_inline void +free_ind_table(void *idesc) +{ + rte_free(idesc); +} + #endif /* _VHOST_NET_CDEV_H_ */ diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c index 5e1a1a727..8c657a101 100644 --- a/lib/librte_vhost/virtio_net.c +++ b/lib/librte_vhost/virtio_net.c @@ -37,45 +37,6 @@ is_valid_virt_queue_idx(uint32_t idx, int is_tx, uint32_t nr_vring) return (is_tx ^ (idx & 1)) == 0 && idx < nr_vring; } -static __rte_always_inline void * -alloc_copy_ind_table(struct virtio_net *dev, struct vhost_virtqueue *vq, - uint64_t desc_addr, uint64_t desc_len) -{ - void *idesc; - uint64_t src, dst; - uint64_t len, remain = desc_len; - - idesc = rte_malloc(__func__, desc_len, 0); - if (unlikely(!idesc)) - return 0; - - dst = (uint64_t)(uintptr_t)idesc; - - while (remain) { - len = remain; - src = vhost_iova_to_vva(dev, vq, desc_addr, &len, - VHOST_ACCESS_RO); - if (unlikely(!src || !len)) { - rte_free(idesc); - return 0; - } - - rte_memcpy((void *)(uintptr_t)dst, (void *)(uintptr_t)src, len); - - remain -= len; - dst += len; - desc_addr += len; - } - - return idesc; -} - -static __rte_always_inline void -free_ind_table(void *idesc) -{ - rte_free(idesc); -} - static __rte_always_inline void do_flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq, From patchwork Thu Dec 13 01:10:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48720 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2A5DD1B465; Thu, 13 Dec 2018 02:23:00 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 1AB361B463 for 
; Thu, 13 Dec 2018 02:22:58 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:22:58 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112843" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:22:57 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:08 +0800 Message-Id: <20181213011014.110089-4-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 3/9] net/ifc: dump debug message for error X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Driver probe may fail for several reasons; logging a message at each failure point helps with debugging. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index e844109f3..aacd5f9bf 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -22,7 +22,7 @@ #define DRV_LOG(level, fmt, args...)
\ rte_log(RTE_LOG_ ## level, ifcvf_vdpa_logtype, \ - "%s(): " fmt "\n", __func__, ##args) + "IFCVF %s(): " fmt "\n", __func__, ##args) #ifndef PAGE_SIZE #define PAGE_SIZE 4096 @@ -756,11 +756,16 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->pdev = pci_dev; rte_spinlock_init(&internal->lock); - if (ifcvf_vfio_setup(internal) < 0) - return -1; - if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) - return -1; + if (ifcvf_vfio_setup(internal) < 0) { + DRV_LOG(ERR, "failed to setup device %s", pci_dev->name); + goto error; + } + + if (ifcvf_init_hw(&internal->hw, internal->pdev) < 0) { + DRV_LOG(ERR, "failed to init device %s", pci_dev->name); + goto error; + } internal->max_queues = IFCVF_MAX_QUEUES; features = ifcvf_get_features(&internal->hw); @@ -782,8 +787,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); - if (internal->did < 0) + if (internal->did < 0) { + DRV_LOG(ERR, "failed to register device %s", pci_dev->name); goto error; + } rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Thu Dec 13 01:10:09 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48721 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A59701B46D; Thu, 13 Dec 2018 02:23:09 +0100 (CET) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 5BB3E1B45C; Thu, 13 Dec 2018 02:23:07 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:06 -0800 X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112871" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:04 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang , stable@dpdk.org Date: Thu, 13 Dec 2018 09:10:09 +0800 Message-Id: <20181213011014.110089-5-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 4/9] net/ifc: store only registered device instance X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If the driver fails to register the IFC VF device with the vhost lib, the device should not be stored in the driver's internal list; otherwise the error path frees an entry that is still linked into that list.
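The ordering fix can be illustrated with a small self-contained model: the entry is linked into the shared list only after registration succeeds, so the error path can free it without leaving a dangling pointer behind. `struct dev_entry`, `register_stub()`, `probe_sketch()` and `count_devices()` are hypothetical names; the list and lock only mirror the driver's `internal_list` and `internal_list_lock`, using the same `<sys/queue.h>` TAILQ macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical list entry standing in for struct internal_list. */
struct dev_entry {
	int did;
	TAILQ_ENTRY(dev_entry) next;
};

TAILQ_HEAD(dev_list_head, dev_entry);
static struct dev_list_head dev_list = TAILQ_HEAD_INITIALIZER(dev_list);
static pthread_mutex_t dev_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for rte_vdpa_register_device(): id >= 0 on success, < 0 on error. */
static int register_stub(int should_fail)
{
	static int next_did;

	return should_fail ? -1 : next_did++;
}

/* Publish the entry only after registration succeeded. */
static int probe_sketch(int should_fail)
{
	struct dev_entry *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return -1;

	e->did = register_stub(should_fail);
	if (e->did < 0) {
		free(e);  /* safe: e was never linked into dev_list */
		return -1;
	}

	pthread_mutex_lock(&dev_list_lock);
	TAILQ_INSERT_TAIL(&dev_list, e, next);
	pthread_mutex_unlock(&dev_list_lock);
	return 0;
}

static int count_devices(void)
{
	struct dev_entry *e;
	int n = 0;

	pthread_mutex_lock(&dev_list_lock);
	TAILQ_FOREACH(e, &dev_list, next)
		n++;
	pthread_mutex_unlock(&dev_list_lock);
	return n;
}
```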
Fixes: a3f8150eac6d ("net/ifcvf: add ifcvf vDPA driver") cc: stable@dpdk.org Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index aacd5f9bf..6fcd50b73 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -781,10 +781,6 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; - pthread_mutex_lock(&internal_list_lock); - TAILQ_INSERT_TAIL(&internal_list, list, next); - pthread_mutex_unlock(&internal_list_lock); - internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { @@ -792,6 +788,10 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, goto error; } + pthread_mutex_lock(&internal_list_lock); + TAILQ_INSERT_TAIL(&internal_list, list, next); + pthread_mutex_unlock(&internal_list_lock); + rte_atomic32_set(&internal->started, 1); update_datapath(internal); From patchwork Thu Dec 13 01:10:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48722 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id BF6201B463; Thu, 13 Dec 2018 02:23:12 +0100 (CET) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id E13D41B46A for ; Thu, 13 Dec 2018 02:23:10 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:10 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112888" Received: 
from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:09 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:10 +0800 Message-Id: <20181213011014.110089-6-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 5/9] net/ifc: detect if VDPA mode is specified X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" If the user wants the VF to be used in vDPA (vhost data path acceleration) mode, they add a "vdpa=1" parameter for the device. If the driver does not find this option, it should return without claiming the device and let the bus continue probing.
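A sketch of the detection logic without the rte_kvargs API, so it builds standalone. `want_vdpa_mode()` and `parse_uint_arg()` are hypothetical helpers; `want_vdpa_mode()` returns 1 when vDPA mode is not requested, mirroring what the probe hunk below returns in that case, and 0 when it is. Two details deliberately differ from the posted `open_int()`: `errno` is cleared before `strtoul()`, and the value is range-checked and written through an `int *`, matching the `int vdpa_mode` it is stored into. The sketch also treats missing devargs as "not requested", whereas the posted probe reads `pci_dev->device.devargs->args` without first checking `devargs` for NULL.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse an unsigned integer in [0, max]; reject empty input, trailing
 * characters and out-of-range values. */
static int parse_uint_arg(const char *value, unsigned long max, int *out)
{
	char *end;
	unsigned long v;

	if (value == NULL || *value == '\0')
		return -EINVAL;
	errno = 0;  /* strtoul() only sets errno on error */
	v = strtoul(value, &end, 0);
	if (errno != 0 || *end != '\0' || v > max)
		return -EINVAL;
	*out = (int)v;
	return 0;
}

/* Scan a "key=value,key=value" string for vdpa=1.
 * Returns 0 if vDPA mode was requested, 1 otherwise. */
static int want_vdpa_mode(const char *devargs)
{
	char buf[256];
	char *saveptr = NULL;
	char *tok;
	int mode = 0;

	if (devargs == NULL || strlen(devargs) >= sizeof(buf))
		return 1;
	strcpy(buf, devargs);

	for (tok = strtok_r(buf, ",", &saveptr); tok != NULL;
	     tok = strtok_r(NULL, ",", &saveptr)) {
		char *eq = strchr(tok, '=');

		if (eq == NULL)
			continue;
		*eq = '\0';  /* split into key (tok) and value (eq + 1) */
		if (strcmp(tok, "vdpa") == 0 &&
		    parse_uint_arg(eq + 1, 1, &mode) == 0 && mode == 1)
			return 0;
	}
	return 1;
}
```

Returning a positive value rather than an error lets the PCI bus move on to other drivers for a VF that is not meant for vDPA.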
Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 47 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 6fcd50b73..c0e50354a 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "base/ifcvf.h" @@ -28,6 +30,13 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_VDPA_MODE "vdpa" + +static const char * const ifcvf_valid_arguments[] = { + IFCVF_VDPA_MODE, + NULL +}; + static int ifcvf_vdpa_logtype; struct ifcvf_internal { @@ -735,6 +744,21 @@ static struct rte_vdpa_dev_ops ifcvf_ops = { .get_notify_area = ifcvf_get_notify_area, }; +static inline int +open_int(const char *key __rte_unused, const char *value, void *extra_args) +{ + uint16_t *n = extra_args; + + if (value == NULL || extra_args == NULL) + return -EINVAL; + + *n = (uint16_t)strtoul(value, NULL, 0); + if (*n == USHRT_MAX && errno == ERANGE) + return -1; + + return 0; +} + static int ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct rte_pci_device *pci_dev) @@ -742,10 +766,31 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, uint64_t features; struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; + int vdpa_mode = 0; + struct rte_kvargs *kvlist = NULL; + int ret = 0; if (rte_eal_process_type() != RTE_PROC_PRIMARY) return 0; + kvlist = rte_kvargs_parse(pci_dev->device.devargs->args, + ifcvf_valid_arguments); + if (kvlist == NULL) + return 1; + + /* probe only when vdpa mode is specified */ + if (rte_kvargs_count(kvlist, IFCVF_VDPA_MODE) == 0) { + rte_kvargs_free(kvlist); + return 1; + } + + ret = rte_kvargs_process(kvlist, IFCVF_VDPA_MODE, &open_int, + &vdpa_mode); + if (ret < 0 || vdpa_mode == 0) { + rte_kvargs_free(kvlist); + return 1; + } + list = rte_zmalloc("ifcvf", sizeof(*list), 0); if (list == NULL) goto error; @@ -795,9 +840,11 @@ 
ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, rte_atomic32_set(&internal->started, 1); update_datapath(internal); + rte_kvargs_free(kvlist); return 0; error: + rte_kvargs_free(kvlist); rte_free(list); rte_free(internal); return -1; From patchwork Thu Dec 13 01:10:11 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48723 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id BF3771B472; Thu, 13 Dec 2018 02:23:16 +0100 (CET) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 340381B460 for ; Thu, 13 Dec 2018 02:23:15 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112896" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:13 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:11 +0800 Message-Id: <20181213011014.110089-7-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 6/9] net/ifc: add devarg for LM mode X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch series enables a new live migration method: software-assisted live migration. This patch adds a device argument that lets the user choose the method. With "swlm=1", the driver and device perform live migration with a relay thread that handles dirty page logging. Without this parameter, the device does the dirty page logging itself and no relay thread consumes CPU resources. Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index c0e50354a..395c5112f 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include @@ -31,9 +32,11 @@ #endif #define IFCVF_VDPA_MODE "vdpa" +#define IFCVF_SW_FALLBACK_LM "swlm" static const char * const ifcvf_valid_arguments[] = { IFCVF_VDPA_MODE, + IFCVF_SW_FALLBACK_LM, NULL }; @@ -56,6 +59,7 @@ struct ifcvf_internal { rte_atomic32_t dev_attached; rte_atomic32_t running; rte_spinlock_t lock; + bool sw_lm; }; struct internal_list { @@ -767,6 +771,7 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, struct ifcvf_internal *internal = NULL; struct internal_list *list = NULL; int vdpa_mode = 0; + int sw_fallback_lm = 0; struct rte_kvargs *kvlist = NULL; int ret = 0; @@ -826,6 +831,14 @@ ifcvf_pci_probe(struct rte_pci_driver *pci_drv __rte_unused, internal->dev_addr.type = PCI_ADDR; list->internal = internal; + if (rte_kvargs_count(kvlist, IFCVF_SW_FALLBACK_LM)) { + ret = rte_kvargs_process(kvlist, IFCVF_SW_FALLBACK_LM, + &open_int, &sw_fallback_lm); + if (ret < 0) + goto error; + } + internal->sw_lm = sw_fallback_lm; + internal->did = rte_vdpa_register_device(&internal->dev_addr, &ifcvf_ops); if (internal->did < 0) { From patchwork Thu Dec 13 01:10:12 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48724 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 423BC1B45D; Thu, 13 Dec 2018 02:23:25 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 7BEFB1B451 for ; Thu, 13 Dec 2018 02:23:24 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:23 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112912" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:21 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:12 +0800 Message-Id: <20181213011014.110089-8-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 7/9] net/ifc: use lib API for used ring logging X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The vhost lib already provides a helper for used ring logging, rte_vhost_log_used_vring(); the driver can use it instead of its own implementation to reduce code.
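For reference, the arithmetic behind `IFCVF_USED_RING_LEN` and the per-page dirty marking that the removed `ifcvf_used_ring_log()` did by hand can be sketched as follows. This is an illustration only: `used_ring_len()`, `log_dirty_range()` and `DEMO_PAGE_SIZE` are hypothetical, and after this patch the real logging happens inside librte_vhost via `rte_vhost_log_used_vring()`. The removed driver code set bits with `__sync_fetch_and_or_8()`; this single-threaded sketch uses a plain OR.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL  /* assumed guest page size for the log */

/* Used ring bytes for a split ring of `size` entries: flags, idx and
 * avail_event (three 16-bit fields) plus one 8-byte element
 * (32-bit id + 32-bit len) per entry, as in IFCVF_USED_RING_LEN. */
static uint64_t used_ring_len(uint16_t size)
{
	return (uint64_t)size * 8 + sizeof(uint16_t) * 3;
}

/* Set one bit per guest page touched by [gpa, gpa + len). */
static void log_dirty_range(uint8_t *bitmap, uint64_t gpa, uint64_t len)
{
	uint64_t page, first, last;

	if (len == 0)
		return;
	first = gpa / DEMO_PAGE_SIZE;
	last = (gpa + len - 1) / DEMO_PAGE_SIZE;  /* page of the last byte */
	for (page = first; page <= last; page++)
		bitmap[page / 8] |= (uint8_t)(1u << (page % 8));
}
```

Deriving the last page from `gpa + len - 1` covers a ring that straddles a page boundary: a 256-entry used ring (2054 bytes) starting at guest address 0x1F00 touches pages 1 and 2.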
Signed-off-by: Xiao Wang --- drivers/net/ifc/ifcvf_vdpa.c | 27 ++++++++------------------- 1 file changed, 8 insertions(+), 19 deletions(-) diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index 395c5112f..f181c5a6e 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -31,6 +31,9 @@ #define PAGE_SIZE 4096 #endif +#define IFCVF_USED_RING_LEN(size) \ + ((size) * sizeof(struct vring_used_elem) + sizeof(uint16_t) * 3) + #define IFCVF_VDPA_MODE "vdpa" #define IFCVF_SW_FALLBACK_LM "swlm" @@ -288,21 +291,6 @@ vdpa_ifcvf_start(struct ifcvf_internal *internal) return ifcvf_start_hw(&internal->hw); } -static void -ifcvf_used_ring_log(struct ifcvf_hw *hw, uint32_t queue, uint8_t *log_buf) -{ - uint32_t i, size; - uint64_t pfn; - - pfn = hw->vring[queue].used / PAGE_SIZE; - size = hw->vring[queue].size * sizeof(struct vring_used_elem) + - sizeof(uint16_t) * 3; - - for (i = 0; i <= size / PAGE_SIZE; i++) - __sync_fetch_and_or_8(&log_buf[(pfn + i) / 8], - 1 << ((pfn + i) % 8)); -} - static void vdpa_ifcvf_stop(struct ifcvf_internal *internal) { @@ -311,7 +299,7 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) int vid; uint64_t features; uint64_t log_base, log_size; - uint8_t *log_buf; + uint64_t len; vid = internal->vid; ifcvf_stop_hw(hw); @@ -330,9 +318,10 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) * IFCVF marks dirty memory pages for only packet buffer, * SW helps to mark the used ring as dirty after device stops. 
*/ - log_buf = (uint8_t *)(uintptr_t)log_base; - for (i = 0; i < hw->nr_vring; i++) - ifcvf_used_ring_log(hw, i, log_buf); + for (i = 0; i < hw->nr_vring; i++) { + len = IFCVF_USED_RING_LEN(hw->vring[i].size); + rte_vhost_log_used_vring(vid, i, 0, len); + } } } From patchwork Thu Dec 13 01:10:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48725 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7A84E1B460; Thu, 13 Dec 2018 02:23:28 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 52EE71B47C for ; Thu, 13 Dec 2018 02:23:27 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:26 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112927" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:25 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:13 +0800 Message-Id: <20181213011014.110089-9-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: <20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 8/9] net/ifc: support SW assisted VDPA live migration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , 
List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" In SW assisted live migration mode, the driver stops the device and sets up a mediated virtio ring to relay the communication between the virtio driver and the VDPA device. This data path intervention allows SW to help with guest dirty page logging for live migration. This SW fallback is an event-driven relay thread, so when the network throughput is low, it takes little CPU resource, but as the throughput goes up, the relay thread's CPU usage goes up accordingly. The user needs to take all the factors, including CPU usage and guest performance degradation, into consideration when selecting the live migration support mode. Signed-off-by: Xiao Wang --- v2: * Make the parameter parsing code shorter. --- drivers/net/ifc/base/ifcvf.h | 1 + drivers/net/ifc/ifcvf_vdpa.c | 346 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 344 insertions(+), 3 deletions(-) diff --git a/drivers/net/ifc/base/ifcvf.h b/drivers/net/ifc/base/ifcvf.h index f026c70ab..8eb70ae9d 100644 --- a/drivers/net/ifc/base/ifcvf.h +++ b/drivers/net/ifc/base/ifcvf.h @@ -50,6 +50,7 @@ #define IFCVF_LM_ENABLE_VF 0x1 #define IFCVF_LM_ENABLE_PF 0x3 #define IFCVF_LOG_BASE 0x100000000000 +#define IFCVF_MEDIATE_VRING 0x200000000000 #define IFCVF_32_BIT_MASK 0xffffffff diff --git a/drivers/net/ifc/ifcvf_vdpa.c b/drivers/net/ifc/ifcvf_vdpa.c index f181c5a6e..31ea880b2 100644 --- a/drivers/net/ifc/ifcvf_vdpa.c +++ b/drivers/net/ifc/ifcvf_vdpa.c @@ -63,6 +63,9 @@ struct ifcvf_internal { rte_atomic32_t running; rte_spinlock_t lock; bool sw_lm; + bool sw_fallback_running; + /* mediated vring for sw fallback */ + struct vring m_vring[IFCVF_MAX_QUEUES * 2]; }; struct internal_list { @@ -308,6 +311,9 @@ vdpa_ifcvf_stop(struct ifcvf_internal *internal) rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, hw->vring[i].last_used_idx); + if (internal->sw_lm) + return; +
rte_vhost_get_negotiated_features(vid, &features); if (RTE_VHOST_NEED_LOG(features)) { ifcvf_disable_logging(hw); @@ -539,6 +545,318 @@ update_datapath(struct ifcvf_internal *internal) return ret; } +static int +m_ifcvf_start(struct ifcvf_internal *internal) +{ + struct ifcvf_hw *hw = &internal->hw; + uint32_t i, nr_vring; + int vid, ret; + struct rte_vhost_vring vq; + void *vring_buf; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size; + uint64_t gpa; + + vid = internal->vid; + nr_vring = rte_vhost_get_vring_num(vid); + rte_vhost_get_negotiated_features(vid, &hw->req_features); + + for (i = 0; i < nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vq); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + vring_buf = rte_zmalloc("ifcvf", size, PAGE_SIZE); + vring_init(&internal->m_vring[i], vq.size, vring_buf, + PAGE_SIZE); + + ret = rte_vfio_container_dma_map(internal->vfio_container_fd, + (uint64_t)(uintptr_t)vring_buf, m_vring_iova, size); + if (ret < 0) { + DRV_LOG(ERR, "mediate vring DMA map failed."); + goto error; + } + + gpa = hva_to_gpa(vid, (uint64_t)(uintptr_t)vq.desc); + if (gpa == 0) { + DRV_LOG(ERR, "Fail to get GPA for descriptor ring."); + return -1; + } + hw->vring[i].desc = gpa; + + hw->vring[i].avail = m_vring_iova + + (char *)internal->m_vring[i].avail - + (char *)internal->m_vring[i].desc; + + hw->vring[i].used = m_vring_iova + + (char *)internal->m_vring[i].used - + (char *)internal->m_vring[i].desc; + + hw->vring[i].size = vq.size; + + rte_vhost_get_vring_base(vid, i, &hw->vring[i].last_avail_idx, + &hw->vring[i].last_used_idx); + + m_vring_iova += size; + } + hw->nr_vring = nr_vring; + + return ifcvf_start_hw(&internal->hw); + +error: + for (i = 0; i < nr_vring; i++) + if (internal->m_vring[i].desc) + rte_free(internal->m_vring[i].desc); + + return -1; +} + +static int +m_ifcvf_stop(struct ifcvf_internal *internal) +{ + int vid; + uint32_t i; + struct rte_vhost_vring vq; + struct ifcvf_hw *hw = 
&internal->hw; + uint64_t m_vring_iova = IFCVF_MEDIATE_VRING; + uint64_t size, len; + + vid = internal->vid; + ifcvf_stop_hw(hw); + + for (i = 0; i < hw->nr_vring; i++) { + rte_vhost_get_vhost_vring(vid, i, &vq); + len = IFCVF_USED_RING_LEN(vq.size); + rte_vhost_log_used_vring(vid, i, 0, len); + + size = RTE_ALIGN_CEIL(vring_size(vq.size, PAGE_SIZE), + PAGE_SIZE); + rte_vfio_container_dma_unmap(internal->vfio_container_fd, + (uint64_t)(uintptr_t)internal->m_vring[i].desc, + m_vring_iova, size); + + rte_vhost_set_vring_base(vid, i, hw->vring[i].last_avail_idx, + hw->vring[i].last_used_idx); + rte_free(internal->m_vring[i].desc); + m_vring_iova += size; + } + + return 0; +} + +static int +m_enable_vfio_intr(struct ifcvf_internal *internal) +{ + uint32_t nr_vring; + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + int ret; + + nr_vring = rte_vhost_get_vring_num(internal->vid); + + ret = rte_intr_efd_enable(intr_handle, nr_vring); + if (ret) + return -1; + + ret = rte_intr_enable(intr_handle); + if (ret) + return -1; + + return 0; +} + +static void +m_disable_vfio_intr(struct ifcvf_internal *internal) +{ + struct rte_intr_handle *intr_handle = &internal->pdev->intr_handle; + + rte_intr_efd_disable(intr_handle); + rte_intr_disable(intr_handle); +} + +static void +update_avail_ring(struct ifcvf_internal *internal, uint16_t qid) +{ + rte_vdpa_relay_avail_ring(internal->vid, qid, &internal->m_vring[qid]); + ifcvf_notify_queue(&internal->hw, qid); +} + +static void +update_used_ring(struct ifcvf_internal *internal, uint16_t qid) +{ + rte_vdpa_relay_used_ring(internal->vid, qid, &internal->m_vring[qid]); + rte_vhost_vring_call(internal->vid, qid); +} + +static void * +vring_relay(void *arg) +{ + int i, vid, epfd, fd, nfds; + struct ifcvf_internal *internal = (struct ifcvf_internal *)arg; + struct rte_vhost_vring vring; + struct rte_intr_handle *intr_handle; + uint16_t qid, q_num; + struct epoll_event events[IFCVF_MAX_QUEUES * 4]; + struct epoll_event ev; 
+ int nbytes; + uint64_t buf; + + vid = internal->vid; + q_num = rte_vhost_get_vring_num(vid); + /* prepare the mediate vring */ + for (qid = 0; qid < q_num; qid++) { + rte_vhost_get_vring_base(vid, qid, + &internal->m_vring[qid].avail->idx, + &internal->m_vring[qid].used->idx); + rte_vdpa_relay_avail_ring(vid, qid, &internal->m_vring[qid]); + } + + /* add notify fd and interrupt fd to epoll */ + epfd = epoll_create(IFCVF_MAX_QUEUES * 2); + if (epfd < 0) { + DRV_LOG(ERR, "failed to create epoll instance."); + return NULL; + } + internal->epfd = epfd; + + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + rte_vhost_get_vhost_vring(vid, qid, &vring); + ev.data.u64 = qid << 1 | (uint64_t)vring.kickfd << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, vring.kickfd, &ev) < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + intr_handle = &internal->pdev->intr_handle; + for (qid = 0; qid < q_num; qid++) { + ev.events = EPOLLIN | EPOLLPRI; + ev.data.u64 = 1 | qid << 1 | + (uint64_t)intr_handle->efds[qid] << 32; + if (epoll_ctl(epfd, EPOLL_CTL_ADD, intr_handle->efds[qid], &ev) + < 0) { + DRV_LOG(ERR, "epoll add error: %s", strerror(errno)); + return NULL; + } + } + + /* start relay with a first kick */ + for (qid = 0; qid < q_num; qid++) + ifcvf_notify_queue(&internal->hw, qid); + + /* listen to the events and react accordingly */ + for (;;) { + nfds = epoll_wait(epfd, events, q_num * 2, -1); + if (nfds < 0) { + if (errno == EINTR) + continue; + DRV_LOG(ERR, "epoll_wait return fail\n"); + return NULL; + } + + for (i = 0; i < nfds; i++) { + fd = (uint32_t)(events[i].data.u64 >> 32); + do { + nbytes = read(fd, &buf, 8); + if (nbytes < 0) { + if (errno == EINTR || + errno == EWOULDBLOCK || + errno == EAGAIN) + continue; + DRV_LOG(INFO, "Error reading " + "kickfd: %s", + strerror(errno)); + } + break; + } while (1); + + qid = events[i].data.u32 >> 1; + + if (events[i].data.u32 & 1) + update_used_ring(internal, qid); + else + 
update_avail_ring(internal, qid); + } + } + + return NULL; +} + +static int +setup_vring_relay(struct ifcvf_internal *internal) +{ + int ret; + + ret = pthread_create(&internal->tid, NULL, vring_relay, + (void *)internal); + if (ret) { + DRV_LOG(ERR, "failed to create ring relay pthread."); + return -1; + } + return 0; +} + +static int +unset_vring_relay(struct ifcvf_internal *internal) +{ + void *status; + + if (internal->tid) { + pthread_cancel(internal->tid); + pthread_join(internal->tid, &status); + } + internal->tid = 0; + + if (internal->epfd >= 0) + close(internal->epfd); + internal->epfd = -1; + + return 0; +} + +static int +ifcvf_sw_fallback_switchover(struct ifcvf_internal *internal) +{ + int ret; + + /* stop the direct IO data path */ + unset_notify_relay(internal); + vdpa_ifcvf_stop(internal); + vdpa_disable_vfio_intr(internal); + + ret = rte_vhost_host_notifier_ctrl(internal->vid, false); + if (ret && ret != -ENOTSUP) + goto error; + + /* set up interrupt for interrupt relay */ + ret = m_enable_vfio_intr(internal); + if (ret) + goto unmap; + + /* config the VF */ + ret = m_ifcvf_start(internal); + if (ret) + goto unset_intr; + + /* set up vring relay thread */ + ret = setup_vring_relay(internal); + if (ret) + goto stop_vf; + + internal->sw_fallback_running = true; + + return 0; + +stop_vf: + m_ifcvf_stop(internal); +unset_intr: + m_disable_vfio_intr(internal); +unmap: + ifcvf_dma_map(internal, 0); +error: + return -1; +} + static int ifcvf_dev_config(int vid) { @@ -579,8 +897,25 @@ ifcvf_dev_close(int vid) } internal = list->internal; - rte_atomic32_set(&internal->dev_attached, 0); - update_datapath(internal); + + if (internal->sw_fallback_running) { + /* unset ring relay */ + unset_vring_relay(internal); + + /* reset VF */ + m_ifcvf_stop(internal); + + /* remove interrupt setting */ + m_disable_vfio_intr(internal); + + /* unset DMA map for guest memory */ + ifcvf_dma_map(internal, 0); + + internal->sw_fallback_running = false; + } else { + 
rte_atomic32_set(&internal->dev_attached, 0); + update_datapath(internal); + } return 0; } @@ -604,7 +939,12 @@ ifcvf_set_features(int vid) internal = list->internal; rte_vhost_get_negotiated_features(vid, &features); - if (RTE_VHOST_NEED_LOG(features)) { + if (!RTE_VHOST_NEED_LOG(features)) + return 0; + + if (internal->sw_lm) { + ifcvf_sw_fallback_switchover(internal); + } else { rte_vhost_get_log_base(vid, &log_base, &log_size); rte_vfio_container_dma_map(internal->vfio_container_fd, log_base, IFCVF_LOG_BASE, log_size); From patchwork Thu Dec 13 01:10:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xiao Wang X-Patchwork-Id: 48726 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 20B0B1B461; Thu, 13 Dec 2018 02:23:39 +0100 (CET) Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id 470541B454 for ; Thu, 13 Dec 2018 02:23:37 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 12 Dec 2018 17:23:36 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,346,1539673200"; d="scan'208";a="101112948" Received: from dpdk-xiao-1.sh.intel.com ([10.67.111.145]) by orsmga008.jf.intel.com with ESMTP; 12 Dec 2018 17:23:34 -0800 From: Xiao Wang To: alejandro.lucero@netronome.com, tiwei.bie@intel.com Cc: maxime.coquelin@redhat.com, dev@dpdk.org, zhihong.wang@intel.com, xiaolong.ye@intel.com, Xiao Wang Date: Thu, 13 Dec 2018 09:10:14 +0800 Message-Id: <20181213011014.110089-10-xiao.w.wang@intel.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20181213011014.110089-1-xiao.w.wang@intel.com> References: 
<20181128094607.106173-3-xiao.w.wang@intel.com> <20181213011014.110089-1-xiao.w.wang@intel.com> Subject: [dpdk-dev] [PATCH v2 9/9] doc: update ifc NIC document X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Signed-off-by: Xiao Wang --- v2: * Add release note. --- doc/guides/nics/ifc.rst | 7 +++++++ doc/guides/rel_notes/release_19_02.rst | 5 +++++ 2 files changed, 12 insertions(+) diff --git a/doc/guides/nics/ifc.rst b/doc/guides/nics/ifc.rst index 48f9adf1d..858f35f74 100644 --- a/doc/guides/nics/ifc.rst +++ b/doc/guides/nics/ifc.rst @@ -39,6 +39,12 @@ the driver probe a new container is created for this device, with this container vDPA driver can program DMA remapping table with the VM's memory region information. +The device argument "swlm=1" configures the driver into SW assisted live +migration mode. In this mode, the driver sets up a SW relay thread when LM +happens; this thread helps the device log dirty pages. Thus this mode does +not require HW to implement a dirty page logging function block, but it +consumes some CPU resources, depending on the network throughput. + Key IFCVF vDPA driver ops ~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -70,6 +76,7 @@ Features Features of the IFCVF driver are: - Compatibility with virtio 0.95 and 1.0. +- SW assisted vDPA live migration. Prerequisites diff --git a/doc/guides/rel_notes/release_19_02.rst b/doc/guides/rel_notes/release_19_02.rst index a94fa86a7..ea3909631 100644 --- a/doc/guides/rel_notes/release_19_02.rst +++ b/doc/guides/rel_notes/release_19_02.rst @@ -54,6 +54,11 @@ New Features Also, make sure to start the actual text at the margin.
========================================================= +* **Added support for SW-assisted VDPA live migration.** + This SW-assisted VDPA live migration facility helps VDPA devices without + logging capability perform live migration: a mediated SW relay helps the + devices track dirty pages caused by DMA. The IFC driver has enabled this + SW-assisted live migration mode. Removed Items -------------
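The relay thread added in patch 8/9 (`vring_relay`) waits on the guest kick fds and the device interrupt fds through a single epoll instance, and packs everything it needs into `epoll_event.data.u64`: bit 0 flags an interrupt fd, the bits above it hold the queue id, and the upper 32 bits hold the fd to drain. A minimal sketch of that encoding, with hypothetical helper names (`relay_pack`, `relay_fd`, `relay_qid`, `relay_is_intr`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout assumed from the patch:
 *   bit 0      : 1 = device interrupt fd, 0 = guest kick fd
 *   bits 1..31 : queue id
 *   bits 32..63: file descriptor to read() before handling the event */
static uint64_t relay_pack(int fd, uint16_t qid, bool is_intr)
{
	assert(fd >= 0);
	return (uint64_t)(is_intr ? 1 : 0) | (uint64_t)qid << 1 |
	       (uint64_t)(uint32_t)fd << 32;
}

static int relay_fd(uint64_t data)
{
	return (int)(uint32_t)(data >> 32);
}

static uint16_t relay_qid(uint64_t data)
{
	/* Truncate to the low half, then drop the interrupt flag bit. */
	return (uint16_t)((uint32_t)data >> 1);
}

static bool relay_is_intr(uint64_t data)
{
	return (data & 1) != 0;
}
```

This sketch decodes every field from the 64-bit value. The patch instead reads the union's `u32` member, which aliases the low half of `u64` only on little-endian hosts.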