From patchwork Tue Sep 29 09:29:52 2020
X-Patchwork-Submitter: Patrick Fu
X-Patchwork-Id: 79156
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Patrick Fu
To: dev@dpdk.org, maxime.coquelin@redhat.com, chenbo.xia@intel.com
Cc: zhihong.wang@intel.com, cheng1.jiang@intel.com, patrick.fu@intel.com
Date: Tue, 29 Sep 2020 17:29:52 +0800
Message-Id: <20200929092955.2848419-2-patrick.fu@intel.com>
In-Reply-To: <20200929092955.2848419-1-patrick.fu@intel.com>
References: <20200911015316.1903181-1-patrick.fu@intel.com> <20200929092955.2848419-1-patrick.fu@intel.com>
Subject: [dpdk-dev] [PATCH v3 1/4] vhost: simplify async copy completion

The current async ops allow the check_completed_copies() callback to
return an arbitrary number of finished async iov segments from the
backend async device. This design makes it complex for vhost to handle
a transfer that breaks inside a single packet (i.e. a transfer that
completes in the middle of an async descriptor), and it prevents
application callbacks from leveraging hardware capabilities to offload
that tracking. This patch therefore requires the
check_completed_copies() callback to return the number of completed
async memory descriptors, which is aligned with the async transfer
data ops callbacks. The vhost async data path is revised to work with
the new ops definition, which gives a cleaner and simpler processing
flow.
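For illustration, a minimal application-side callback under the new
contract could look as follows. This is a sketch only: the completion
ring helper cpl_ring_poll() is a hypothetical stand-in for whatever
mechanism the backend DMA device exposes; only the callback prototype
comes from rte_vhost_async.h.

    #include <stdint.h>
    #include <rte_vhost_async.h>

    /* hypothetical per-queue completion poller owned by the application;
     * returns how many whole descriptors finished, up to max_descs */
    extern uint16_t cpl_ring_poll(int vid, uint16_t queue_id,
                                  uint16_t max_descs);

    static uint32_t
    my_check_completed_copies(int vid, uint16_t queue_id,
                              struct rte_vhost_async_status *opaque_data,
                              uint16_t max_packets)
    {
        /*
         * With this patch the return value counts completed async
         * descriptors (one per submitted rte_vhost_async_desc), not
         * iov segments, so the callback never has to report a count
         * that stops in the middle of a packet.
         */
        (void)opaque_data;
        return cpl_ring_poll(vid, queue_id, max_packets);
    }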
Signed-off-by: Patrick Fu
---
 lib/librte_vhost/rte_vhost_async.h |  15 ++-
 lib/librte_vhost/vhost.c           |  20 ++--
 lib/librte_vhost/vhost.h           |   8 +-
 lib/librte_vhost/vhost_user.c      |   6 +-
 lib/librte_vhost/virtio_net.c      | 151 ++++++++++++-----------------
 5 files changed, 90 insertions(+), 110 deletions(-)

diff --git a/lib/librte_vhost/rte_vhost_async.h b/lib/librte_vhost/rte_vhost_async.h
index cb6284539..c73bd7c99 100644
--- a/lib/librte_vhost/rte_vhost_async.h
+++ b/lib/librte_vhost/rte_vhost_async.h
@@ -76,13 +76,26 @@ struct rte_vhost_async_channel_ops {
 	 * @param max_packets
 	 *  max number of packets could be completed
 	 * @return
-	 *  number of iov segments completed
+	 *  number of async descs completed
 	 */
 	uint32_t (*check_completed_copies)(int vid, uint16_t queue_id,
 		struct rte_vhost_async_status *opaque_data,
 		uint16_t max_packets);
 };
 
+/**
+ * inflight async packet information
+ */
+struct async_inflight_info {
+	union {
+		uint32_t info;
+		struct {
+			uint16_t descs; /* num of descs inflight */
+			uint16_t segs; /* iov segs inflight */
+		};
+	};
+};
+
 /**
  * dma channel feature bit definition
  */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 8f20a0818..eca507836 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -333,8 +333,8 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		rte_free(vq->shadow_used_split);
 		if (vq->async_pkts_pending)
 			rte_free(vq->async_pkts_pending);
-		if (vq->async_pending_info)
-			rte_free(vq->async_pending_info);
+		if (vq->async_pkts_info)
+			rte_free(vq->async_pkts_info);
 	}
 	rte_free(vq->batch_copy_elems);
 	rte_mempool_free(vq->iotlb_pool);
@@ -1573,15 +1573,15 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
 	vq->async_pkts_pending = rte_malloc(NULL,
 			vq->size * sizeof(uintptr_t),
 			RTE_CACHE_LINE_SIZE);
-	vq->async_pending_info = rte_malloc(NULL,
-			vq->size * sizeof(uint64_t),
+	vq->async_pkts_info = rte_malloc(NULL,
+			vq->size * sizeof(struct async_inflight_info),
 			RTE_CACHE_LINE_SIZE);
-	if (!vq->async_pkts_pending || !vq->async_pending_info) {
+	if (!vq->async_pkts_pending || !vq->async_pkts_info) {
 		if (vq->async_pkts_pending)
 			rte_free(vq->async_pkts_pending);
 
-		if (vq->async_pending_info)
-			rte_free(vq->async_pending_info);
+		if (vq->async_pkts_info)
+			rte_free(vq->async_pkts_info);
 
 		VHOST_LOG_CONFIG(ERR,
 				"async register failed: cannot allocate memory for vq data "
@@ -1635,9 +1635,9 @@ int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
 		vq->async_pkts_pending = NULL;
 	}
 
-	if (vq->async_pending_info) {
-		rte_free(vq->async_pending_info);
-		vq->async_pending_info = NULL;
+	if (vq->async_pkts_info) {
+		rte_free(vq->async_pkts_info);
+		vq->async_pkts_info = NULL;
 	}
 
 	vq->async_ops.transfer_data = NULL;
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 73a1ed889..155a832c1 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -46,8 +46,6 @@
 
 #define MAX_PKT_BURST 32
 
-#define ASYNC_MAX_POLL_SEG 255
-
 #define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
 #define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2)
 
@@ -225,12 +223,10 @@ struct vhost_virtqueue {
 
 	/* async data transfer status */
 	uintptr_t	**async_pkts_pending;
-	#define		ASYNC_PENDING_INFO_N_MSK 0xFFFF
-	#define		ASYNC_PENDING_INFO_N_SFT 16
-	uint64_t	*async_pending_info;
+	struct async_inflight_info *async_pkts_info;
 	uint16_t	async_pkts_idx;
 	uint16_t	async_pkts_inflight_n;
-	uint16_t	async_last_seg_n;
+	uint16_t	async_last_pkts_n;
 
 	/* vq async features */
 	bool		async_inorder;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 501218e19..67d2a2d43 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2006,10 +2006,10 @@ vhost_user_get_vring_base(struct virtio_net **pdev,
 		vq->shadow_used_split = NULL;
 		if (vq->async_pkts_pending)
 			rte_free(vq->async_pkts_pending);
-		if (vq->async_pending_info)
-			rte_free(vq->async_pending_info);
+		if (vq->async_pkts_info)
+			rte_free(vq->async_pkts_info);
 		vq->async_pkts_pending = NULL;
-		vq->async_pending_info = NULL;
+		vq->async_pkts_info = NULL;
 	}
 
 	rte_free(vq->batch_copy_elems);
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index bd9303c8a..68ead9a71 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1473,37 +1473,6 @@ virtio_dev_rx_async_get_info_idx(uint16_t pkts_idx,
 		(vq_size - n_inflight + pkts_idx) & (vq_size - 1);
 }
 
-static __rte_always_inline void
-virtio_dev_rx_async_submit_split_err(struct virtio_net *dev,
-	struct vhost_virtqueue *vq, uint16_t queue_id,
-	uint16_t last_idx, uint16_t shadow_idx)
-{
-	uint16_t start_idx, pkts_idx, vq_size;
-	uint64_t *async_pending_info;
-
-	pkts_idx = vq->async_pkts_idx;
-	async_pending_info = vq->async_pending_info;
-	vq_size = vq->size;
-	start_idx = virtio_dev_rx_async_get_info_idx(pkts_idx,
-			vq_size, vq->async_pkts_inflight_n);
-
-	while (likely((start_idx & (vq_size - 1)) != pkts_idx)) {
-		uint64_t n_seg =
-			async_pending_info[(start_idx) & (vq_size - 1)] >>
-			ASYNC_PENDING_INFO_N_SFT;
-
-		while (n_seg)
-			n_seg -= vq->async_ops.check_completed_copies(dev->vid,
-				queue_id, 0, 1);
-	}
-
-	vq->async_pkts_inflight_n = 0;
-	vq->batch_copy_nb_elems = 0;
-
-	vq->shadow_used_idx = shadow_idx;
-	vq->last_avail_idx = last_idx;
-}
-
 static __rte_noinline uint32_t
 virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	struct vhost_virtqueue *vq, uint16_t queue_id,
@@ -1512,7 +1481,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	uint32_t pkt_idx = 0, pkt_burst_idx = 0;
 	uint16_t num_buffers;
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
-	uint16_t avail_head, last_idx, shadow_idx;
+	uint16_t avail_head;
 
 	struct rte_vhost_iov_iter *it_pool = vq->it_pool;
 	struct iovec *vec_pool = vq->vec_pool;
@@ -1522,11 +1491,11 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	struct rte_vhost_iov_iter *src_it = it_pool;
 	struct rte_vhost_iov_iter *dst_it = it_pool + 1;
 	uint16_t n_free_slot, slot_idx;
+	uint16_t pkt_err = 0;
+	struct async_inflight_info *pkts_info = vq->async_pkts_info;
 	int n_pkts = 0;
 
 	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
-	last_idx = vq->last_avail_idx;
-	shadow_idx = vq->shadow_used_idx;
 
 	/*
 	 * The ordering between avail index and
@@ -1565,14 +1534,14 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 		if (src_it->count) {
 			async_fill_desc(&tdes[pkt_burst_idx], src_it, dst_it);
 			pkt_burst_idx++;
-			vq->async_pending_info[slot_idx] =
-				num_buffers | (src_it->nr_segs << 16);
+			pkts_info[slot_idx].descs = num_buffers;
+			pkts_info[slot_idx].segs = src_it->nr_segs;
 			src_iovec += src_it->nr_segs;
 			dst_iovec += dst_it->nr_segs;
 			src_it += 2;
 			dst_it += 2;
 		} else {
-			vq->async_pending_info[slot_idx] = num_buffers;
+			pkts_info[slot_idx].info = num_buffers;
 			vq->async_pkts_inflight_n++;
 		}
 
@@ -1586,35 +1555,46 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 			dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
 			src_it = it_pool;
 			dst_it = it_pool + 1;
+			vq->async_pkts_inflight_n += n_pkts;
 
 			if (unlikely(n_pkts < (int)pkt_burst_idx)) {
-				vq->async_pkts_inflight_n +=
-					n_pkts > 0 ? n_pkts : 0;
-				virtio_dev_rx_async_submit_split_err(dev,
-					vq, queue_id, last_idx, shadow_idx);
-				return 0;
+				/*
+				 * log error packets number here and do actual
+				 * error processing when applications poll
+				 * completion
+				 */
+				pkt_err = pkt_burst_idx - n_pkts;
+				pkt_burst_idx = 0;
+				break;
 			}
 
 			pkt_burst_idx = 0;
-			vq->async_pkts_inflight_n += n_pkts;
 		}
 	}
 
 	if (pkt_burst_idx) {
 		n_pkts = vq->async_ops.transfer_data(dev->vid,
 				queue_id, tdes, 0, pkt_burst_idx);
-		if (unlikely(n_pkts < (int)pkt_burst_idx)) {
-			vq->async_pkts_inflight_n += n_pkts > 0 ? n_pkts : 0;
-			virtio_dev_rx_async_submit_split_err(dev, vq, queue_id,
-				last_idx, shadow_idx);
-			return 0;
-		}
-
-		vq->async_pkts_inflight_n += n_pkts;
+
+		if (unlikely(n_pkts < (int)pkt_burst_idx))
+			pkt_err = pkt_burst_idx - n_pkts;
 	}
 
 	do_data_copy_enqueue(dev, vq);
 
+	if (unlikely(pkt_err)) {
+		do {
+			if (pkts_info[slot_idx].segs)
+				pkt_err--;
+			vq->last_avail_idx -= pkts_info[slot_idx].descs;
+			vq->shadow_used_idx -= pkts_info[slot_idx].descs;
+			vq->async_pkts_inflight_n--;
+			slot_idx--;
+			pkt_idx--;
+		} while (pkt_err);
+	}
+
 	n_free_slot = vq->size - vq->async_pkts_idx;
 	if (n_free_slot > pkt_idx) {
 		rte_memcpy(&vq->async_pkts_pending[vq->async_pkts_idx],
@@ -1640,10 +1620,10 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
 {
 	struct virtio_net *dev = get_device(vid);
 	struct vhost_virtqueue *vq;
-	uint16_t n_segs_cpl, n_pkts_put = 0, n_descs = 0;
+	uint16_t n_pkts_cpl = 0, n_pkts_put = 0, n_descs = 0;
 	uint16_t start_idx, pkts_idx, vq_size;
 	uint16_t n_inflight;
-	uint64_t *async_pending_info;
+	struct async_inflight_info *pkts_info;
 
 	if (!dev)
 		return 0;
@@ -1657,51 +1637,40 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
 
 	vq = dev->virtqueue[queue_id];
 
+	if (unlikely(!vq->async_registered)) {
+		VHOST_LOG_DATA(ERR, "(%d) %s: async not registered for queue id %d.\n",
+			dev->vid, __func__, queue_id);
+		return 0;
+	}
+
 	rte_spinlock_lock(&vq->access_lock);
 
 	n_inflight = vq->async_pkts_inflight_n;
 	pkts_idx = vq->async_pkts_idx;
-	async_pending_info = vq->async_pending_info;
+	pkts_info = vq->async_pkts_info;
 	vq_size = vq->size;
 	start_idx = virtio_dev_rx_async_get_info_idx(pkts_idx,
 			vq_size, vq->async_pkts_inflight_n);
 
-	n_segs_cpl = vq->async_ops.check_completed_copies(vid, queue_id,
-		0, ASYNC_MAX_POLL_SEG - vq->async_last_seg_n) +
-		vq->async_last_seg_n;
+	if (count > vq->async_last_pkts_n)
+		n_pkts_cpl = vq->async_ops.check_completed_copies(vid,
+			queue_id, 0, count - vq->async_last_pkts_n);
+	n_pkts_cpl += vq->async_last_pkts_n;
 
 	rte_smp_wmb();
 
 	while (likely((n_pkts_put < count) && n_inflight)) {
-		uint64_t info = async_pending_info[
-			(start_idx + n_pkts_put) & (vq_size - 1)];
-		uint64_t n_segs;
+		uint16_t info_idx = (start_idx + n_pkts_put) & (vq_size - 1);
+		if (n_pkts_cpl && pkts_info[info_idx].segs)
+			n_pkts_cpl--;
+		else if (!n_pkts_cpl && pkts_info[info_idx].segs)
+			break;
 		n_pkts_put++;
 		n_inflight--;
-		n_descs += info & ASYNC_PENDING_INFO_N_MSK;
-		n_segs = info >> ASYNC_PENDING_INFO_N_SFT;
-
-		if (n_segs) {
-			if (unlikely(n_segs_cpl < n_segs)) {
-				n_pkts_put--;
-				n_inflight++;
-				n_descs -= info & ASYNC_PENDING_INFO_N_MSK;
-				if (n_segs_cpl) {
-					async_pending_info[
-						(start_idx + n_pkts_put) &
-						(vq_size - 1)] =
-						((n_segs - n_segs_cpl) <<
-						ASYNC_PENDING_INFO_N_SFT) |
-						(info & ASYNC_PENDING_INFO_N_MSK);
-					n_segs_cpl = 0;
-				}
-				break;
-			}
-
-			n_segs_cpl -= n_segs;
-		}
+		n_descs += pkts_info[info_idx].descs;
 	}
 
-	vq->async_last_seg_n = n_segs_cpl;
+	vq->async_last_pkts_n = n_pkts_cpl;
 
 	if (n_pkts_put) {
 		vq->async_pkts_inflight_n = n_inflight;
@@ -1710,16 +1679,18 @@ uint16_t rte_vhost_poll_enqueue_completed(int vid, uint16_t queue_id,
 				n_descs, __ATOMIC_RELEASE);
 			vhost_vring_call_split(dev, vq);
 		}
-	}
 
-	if (start_idx + n_pkts_put <= vq_size) {
-		rte_memcpy(pkts, &vq->async_pkts_pending[start_idx],
-			n_pkts_put * sizeof(uintptr_t));
-	} else {
-		rte_memcpy(pkts, &vq->async_pkts_pending[start_idx],
-			(vq_size - start_idx) * sizeof(uintptr_t));
-		rte_memcpy(&pkts[vq_size - start_idx], vq->async_pkts_pending,
-			(n_pkts_put - vq_size + start_idx) * sizeof(uintptr_t));
+		if (start_idx + n_pkts_put <= vq_size) {
+			rte_memcpy(pkts, &vq->async_pkts_pending[start_idx],
+				n_pkts_put * sizeof(uintptr_t));
+		} else {
+			rte_memcpy(pkts, &vq->async_pkts_pending[start_idx],
+				(vq_size - start_idx) * sizeof(uintptr_t));
+			rte_memcpy(&pkts[vq_size - start_idx],
+				vq->async_pkts_pending,
+				(n_pkts_put + start_idx - vq_size) *
+				sizeof(uintptr_t));
+		}
 	}
 
 	rte_spinlock_unlock(&vq->access_lock);

From patchwork Tue Sep 29 09:29:53 2020
X-Patchwork-Submitter: Patrick Fu
X-Patchwork-Id: 79157
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Patrick Fu
To: dev@dpdk.org, maxime.coquelin@redhat.com, chenbo.xia@intel.com
Cc: zhihong.wang@intel.com, cheng1.jiang@intel.com, patrick.fu@intel.com
Date: Tue, 29 Sep 2020 17:29:53 +0800
Message-Id: <20200929092955.2848419-3-patrick.fu@intel.com>
In-Reply-To: <20200929092955.2848419-1-patrick.fu@intel.com>
References: <20200911015316.1903181-1-patrick.fu@intel.com> <20200929092955.2848419-1-patrick.fu@intel.com>
Subject: [dpdk-dev] [PATCH v3 2/4] vhost: dynamically allocate async memory

Allocate the async internal memory buffers with rte_malloc(), replacing
the static array declarations inside the vq structure. Dynamic
allocation helps reduce the memory footprint when the async path is not
registered.
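For context on when this allocation now happens: the buffers are created
inside rte_vhost_async_channel_register(), so applications that never
register an async channel no longer pay the footprint. A minimal sketch
of a registration call, assuming the callback prototypes declared in
rte_vhost_async.h at the time of this series (my_transfer_data and
my_check_completed_copies are hypothetical application callbacks):

    #include <rte_vhost_async.h>

    /* hypothetical application callbacks matching the ops prototypes */
    extern uint32_t my_transfer_data(int vid, uint16_t queue_id,
            struct rte_vhost_async_desc *descs,
            struct rte_vhost_async_status *opaque_data, uint16_t count);
    extern uint32_t my_check_completed_copies(int vid, uint16_t queue_id,
            struct rte_vhost_async_status *opaque_data,
            uint16_t max_packets);

    static struct rte_vhost_async_channel_ops my_ops = {
        .transfer_data = my_transfer_data,
        .check_completed_copies = my_check_completed_copies,
    };

    static int
    register_async(int vid, uint16_t queue_id)
    {
        struct rte_vhost_async_features f = { .intval = 0 };

        f.async_inorder = 1;    /* channel completes copies in order */

        /* async_pkts_pending/async_pkts_info (and, after this patch,
         * it_pool/vec_pool) are rte_malloc()'ed inside this call */
        return rte_vhost_async_channel_register(vid, queue_id,
                f.intval, &my_ops);
    }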
Signed-off-by: Patrick Fu
Reviewed-by: Maxime Coquelin
---
 lib/librte_vhost/vhost.c | 69 ++++++++++++++++++++++++++--------------
 lib/librte_vhost/vhost.h |  4 +--
 2 files changed, 47 insertions(+), 26 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index eca507836..05b578c2f 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -324,6 +324,24 @@ cleanup_device(struct virtio_net *dev, int destroy)
 	}
 }
 
+static void
+vhost_free_async_mem(struct vhost_virtqueue *vq)
+{
+	if (vq->async_pkts_pending)
+		rte_free(vq->async_pkts_pending);
+	if (vq->async_pkts_info)
+		rte_free(vq->async_pkts_info);
+	if (vq->it_pool)
+		rte_free(vq->it_pool);
+	if (vq->vec_pool)
+		rte_free(vq->vec_pool);
+
+	vq->async_pkts_pending = NULL;
+	vq->async_pkts_info = NULL;
+	vq->it_pool = NULL;
+	vq->vec_pool = NULL;
+}
+
 void
 free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
@@ -331,10 +349,7 @@ free_vq(struct virtio_net *dev, struct vhost_virtqueue *vq)
 		rte_free(vq->shadow_used_packed);
 	else {
 		rte_free(vq->shadow_used_split);
-		if (vq->async_pkts_pending)
-			rte_free(vq->async_pkts_pending);
-		if (vq->async_pkts_info)
-			rte_free(vq->async_pkts_info);
+		vhost_free_async_mem(vq);
 	}
 	rte_free(vq->batch_copy_elems);
 	rte_mempool_free(vq->iotlb_pool);
@@ -1538,6 +1553,7 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
 	struct vhost_virtqueue *vq;
 	struct virtio_net *dev = get_device(vid);
 	struct rte_vhost_async_features f;
+	int node;
 
 	if (dev == NULL || ops == NULL)
 		return -1;
@@ -1570,19 +1586,32 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
 		goto reg_out;
 	}
 
-	vq->async_pkts_pending = rte_malloc(NULL,
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	if (get_mempolicy(&node, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR)) {
+		VHOST_LOG_CONFIG(ERR,
+			"unable to get numa information in async register. "
" + "allocating async buffer memory on the caller thread node\n"); + node = SOCKET_ID_ANY; + } +#else + node = SOCKET_ID_ANY; +#endif + + vq->async_pkts_pending = rte_malloc_socket(NULL, vq->size * sizeof(uintptr_t), - RTE_CACHE_LINE_SIZE); - vq->async_pkts_info = rte_malloc(NULL, + RTE_CACHE_LINE_SIZE, node); + vq->async_pkts_info = rte_malloc_socket(NULL, vq->size * sizeof(struct async_inflight_info), - RTE_CACHE_LINE_SIZE); - if (!vq->async_pkts_pending || !vq->async_pkts_info) { - if (vq->async_pkts_pending) - rte_free(vq->async_pkts_pending); - - if (vq->async_pkts_info) - rte_free(vq->async_pkts_info); - + RTE_CACHE_LINE_SIZE, node); + vq->it_pool = rte_malloc_socket(NULL, + VHOST_MAX_ASYNC_IT * sizeof(struct rte_vhost_iov_iter), + RTE_CACHE_LINE_SIZE, node); + vq->vec_pool = rte_malloc_socket(NULL, + VHOST_MAX_ASYNC_VEC * sizeof(struct iovec), + RTE_CACHE_LINE_SIZE, node); + if (!vq->async_pkts_pending || !vq->async_pkts_info || + !vq->it_pool || !vq->vec_pool) { + vhost_free_async_mem(vq); VHOST_LOG_CONFIG(ERR, "async register failed: cannot allocate memory for vq data " "(vid %d, qid: %d)\n", vid, queue_id); @@ -1630,15 +1659,7 @@ int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id) goto out; } - if (vq->async_pkts_pending) { - rte_free(vq->async_pkts_pending); - vq->async_pkts_pending = NULL; - } - - if (vq->async_pkts_info) { - rte_free(vq->async_pkts_info); - vq->async_pkts_info = NULL; - } + vhost_free_async_mem(vq); vq->async_ops.transfer_data = NULL; vq->async_ops.check_completed_copies = NULL; diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 155a832c1..f0ee00c73 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -218,8 +218,8 @@ struct vhost_virtqueue { /* operation callbacks for async dma */ struct rte_vhost_async_channel_ops async_ops; - struct rte_vhost_iov_iter it_pool[VHOST_MAX_ASYNC_IT]; - struct iovec vec_pool[VHOST_MAX_ASYNC_VEC]; + struct rte_vhost_iov_iter *it_pool; + struct iovec *vec_pool; /* async data transfer status */ uintptr_t **async_pkts_pending; From patchwork Tue Sep 29 09:29:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Patrick Fu X-Patchwork-Id: 79158 X-Patchwork-Delegate: maxime.coquelin@redhat.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 40E1DA04C0; Tue, 29 Sep 2020 11:39:18 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DADEC1D9A5; Tue, 29 Sep 2020 11:38:41 +0200 (CEST) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by dpdk.org (Postfix) with ESMTP id 02EBD1D91A for ; Tue, 29 Sep 2020 11:38:34 +0200 (CEST) IronPort-SDR: GfC6jkCEqHqGMQMZLw4JEJ+vosofqK6q69M/FjRgEqpYCgSjh9PccPJCH3hSyzevg5w+h4DUNB mKlRculMzP0Q== X-IronPort-AV: E=McAfee;i="6000,8403,9758"; a="159485209" X-IronPort-AV: E=Sophos;i="5.77,318,1596524400"; d="scan'208";a="159485209" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Sep 2020 02:38:34 -0700 IronPort-SDR: vvVpLk1ifM2MQu8GGGmHlCKecHB5tcYhpOj0I3FNvJAhGA8ZMm3xU95IbfzeGFQHN1mD0ipmRL QyXi+WdcomvQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,318,1596524400"; d="scan'208";a="457216018" Received: from npg-dpdk-patrickfu-casc2.sh.intel.com 
From patchwork Tue Sep 29 09:29:54 2020
X-Patchwork-Submitter: Patrick Fu
X-Patchwork-Id: 79158
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Patrick Fu
To: dev@dpdk.org, maxime.coquelin@redhat.com, chenbo.xia@intel.com
Cc: zhihong.wang@intel.com, cheng1.jiang@intel.com, patrick.fu@intel.com
Date: Tue, 29 Sep 2020 17:29:54 +0800
Message-Id: <20200929092955.2848419-4-patrick.fu@intel.com>
In-Reply-To: <20200929092955.2848419-1-patrick.fu@intel.com>
References: <20200911015316.1903181-1-patrick.fu@intel.com> <20200929092955.2848419-1-patrick.fu@intel.com>
Subject: [dpdk-dev] [PATCH v3 3/4] vhost: fix async vector buffer overrun

Add a check on async vector buffer usage to prevent buffer overrun. If
the unused vector buffer is not sufficient for the next packet's iov
creation, an async transfer is triggered immediately to free up the
vector buffer.

Fixes: 78639d54563a ("vhost: introduce async enqueue registration API")

Signed-off-by: Patrick Fu
Reviewed-by: Maxime Coquelin
---
 lib/librte_vhost/vhost.h      |  2 +-
 lib/librte_vhost/virtio_net.c | 13 ++++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index f0ee00c73..34197ce99 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -47,7 +47,7 @@
 #define MAX_PKT_BURST 32
 
 #define VHOST_MAX_ASYNC_IT (MAX_PKT_BURST * 2)
-#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 2)
+#define VHOST_MAX_ASYNC_VEC (BUF_VECTOR_MAX * 4)
 
 #define PACKED_DESC_ENQUEUE_USED_FLAG(w)	\
 	((w) ? (VRING_DESC_F_AVAIL | VRING_DESC_F_USED | VRING_DESC_F_WRITE) : \
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 68ead9a71..60cd5e2b1 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1492,6 +1492,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 	struct rte_vhost_iov_iter *dst_it = it_pool + 1;
 	uint16_t n_free_slot, slot_idx;
 	uint16_t pkt_err = 0;
+	uint16_t segs_await = 0;
 	struct async_inflight_info *pkts_info = vq->async_pkts_info;
 	int n_pkts = 0;
 
@@ -1540,6 +1541,7 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 			dst_iovec += dst_it->nr_segs;
 			src_it += 2;
 			dst_it += 2;
+			segs_await += src_it->nr_segs;
 		} else {
 			pkts_info[slot_idx].info = num_buffers;
 			vq->async_pkts_inflight_n++;
@@ -1547,14 +1549,23 @@ virtio_dev_rx_async_submit_split(struct virtio_net *dev,
 
 		vq->last_avail_idx += num_buffers;
 
+		/*
+		 * conditions to trigger async device transfer:
+		 * - buffered packet number reaches transfer threshold
+		 * - this is the last packet in the burst enqueue
+		 * - unused async iov number is less than max vhost vector
+		 */
 		if (pkt_burst_idx >= VHOST_ASYNC_BATCH_THRESHOLD ||
-				(pkt_idx == count - 1 && pkt_burst_idx)) {
+				(pkt_idx == count - 1 && pkt_burst_idx) ||
+				(VHOST_MAX_ASYNC_VEC / 2 - segs_await <
+				BUF_VECTOR_MAX)) {
 			n_pkts = vq->async_ops.transfer_data(dev->vid,
 					queue_id, tdes, 0, pkt_burst_idx);
 			src_iovec = vec_pool;
 			dst_iovec = vec_pool + (VHOST_MAX_ASYNC_VEC >> 1);
 			src_it = it_pool;
 			dst_it = it_pool + 1;
+			segs_await = 0;
 			vq->async_pkts_inflight_n += n_pkts;
 
 			if (unlikely(n_pkts < (int)pkt_burst_idx)) {
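To make the new flush condition concrete, assuming BUF_VECTOR_MAX is 256
(its value in vhost.h): VHOST_MAX_ASYNC_VEC grows from 512 to 1024
entries, and the source half of the pool holds VHOST_MAX_ASYNC_VEC / 2 =
512 iovecs. The submit loop now flushes once 512 - segs_await < 256,
i.e. as soon as more than 256 source iovecs are buffered, so there is
always room for one more worst-case packet (up to BUF_VECTOR_MAX
segments) before the next iteration can overrun the pool.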
From patchwork Tue Sep 29 09:29:55 2020
X-Patchwork-Submitter: Patrick Fu
X-Patchwork-Id: 79159
X-Patchwork-Delegate: maxime.coquelin@redhat.com
From: Patrick Fu
To: dev@dpdk.org, maxime.coquelin@redhat.com, chenbo.xia@intel.com
Cc: zhihong.wang@intel.com, cheng1.jiang@intel.com, patrick.fu@intel.com
Date: Tue, 29 Sep 2020 17:29:55 +0800
Message-Id: <20200929092955.2848419-5-patrick.fu@intel.com>
In-Reply-To: <20200929092955.2848419-1-patrick.fu@intel.com>
References: <20200911015316.1903181-1-patrick.fu@intel.com> <20200929092955.2848419-1-patrick.fu@intel.com>
Subject: [dpdk-dev] [PATCH v3 4/4] vhost: fix async register/unregister deadlock

When the async register/unregister functions are invoked from certain
vhost event callbacks (e.g. vring state change), a deadlock may occur
due to recursive spinlock acquisition. This patch removes the
unnecessary spinlock from the register API and uses the trylock()
primitive in the unregister API to avoid the deadlock.
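To illustrate the deadlock being fixed: the vhost-user message handler
can hold vq->access_lock while delivering the vring-state-change
notification, so an unregister call made from inside the callback would
previously spin on a lock its own thread already holds. A sketch of such
a callback (vring_state_changed is the struct vhost_device_ops hook; the
teardown policy is a hypothetical application choice):

    /* registered via rte_vhost_driver_callback_register() */
    static int
    vring_state_changed(int vid, uint16_t queue_id, int enable)
    {
        if (!enable) {
            /*
             * Before this patch: recursive spinlock acquire -> deadlock.
             * After this patch: trylock fails fast and the call returns
             * -1, letting the application retry from another context.
             */
            if (rte_vhost_async_channel_unregister(vid, queue_id) < 0)
                ; /* defer teardown, e.g. to the application's own thread */
        }
        return 0;
    }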
Fixes: 78639d54563a ("vhost: introduce async enqueue registration API")

Signed-off-by: Patrick Fu
---
 lib/librte_vhost/vhost.c      | 13 +++++++------
 lib/librte_vhost/vhost_user.c |  4 ++--
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 05b578c2f..ba504a00a 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -1577,8 +1577,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
 		ops->transfer_data == NULL))
 		return -1;
 
-	rte_spinlock_lock(&vq->access_lock);
-
 	if (unlikely(vq->async_registered)) {
 		VHOST_LOG_CONFIG(ERR,
 			"async register failed: channel already registered "
@@ -1627,8 +1625,6 @@ int rte_vhost_async_channel_register(int vid, uint16_t queue_id,
 	vq->async_registered = true;
 
 reg_out:
-	rte_spinlock_unlock(&vq->access_lock);
-
 	return 0;
 }
 
@@ -1647,10 +1643,15 @@ int rte_vhost_async_channel_unregister(int vid, uint16_t queue_id)
 		return ret;
 
 	ret = 0;
-	rte_spinlock_lock(&vq->access_lock);
 
 	if (!vq->async_registered)
-		goto out;
+		return ret;
+
+	if (!rte_spinlock_trylock(&vq->access_lock)) {
+		VHOST_LOG_CONFIG(ERR, "Failed to unregister async channel. "
+			"virt queue busy.\n");
+		return -1;
+	}
 
 	if (vq->async_pkts_inflight_n) {
 		VHOST_LOG_CONFIG(ERR, "Failed to unregister async channel. "
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 67d2a2d43..c8d74bde6 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -2043,9 +2043,9 @@ vhost_user_set_vring_enable(struct virtio_net **pdev,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, index);
 
-	if (!enable && dev->virtqueue[index]->async_registered) {
+	if (enable && dev->virtqueue[index]->async_registered) {
 		if (dev->virtqueue[index]->async_pkts_inflight_n) {
-			VHOST_LOG_CONFIG(ERR, "failed to disable vring. "
+			VHOST_LOG_CONFIG(ERR, "failed to enable vring. "
 				"async inflight packets must be completed first\n");
 			return RTE_VHOST_MSG_RESULT_ERR;
 		}
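Because unregister now fails fast instead of blocking, the application
is expected to quiesce the channel first. A hypothetical teardown helper
(drain_and_unregister is not a library function; the burst size and
retry policy are application choices):

    #include <rte_mbuf.h>
    #include <rte_vhost_async.h>

    #define DRAIN_BURST 32

    static int
    drain_and_unregister(int vid, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[DRAIN_BURST];
        uint16_t n;

        /* reap completed copies; completed mbufs are owned by the
         * application again and must be freed */
        while ((n = rte_vhost_poll_enqueue_completed(vid, queue_id,
                        pkts, DRAIN_BURST)) > 0)
            rte_pktmbuf_free_bulk(pkts, n);

        /* may still return -1 if the vq lock is contended or packets
         * remain in flight; the caller can retry later */
        return rte_vhost_async_channel_unregister(vid, queue_id);
    }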