From patchwork Wed Jun 30 06:40:35 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 95029
X-Patchwork-Delegate: qi.z.zhang@intel.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id 99A25A0A0F;
 Wed, 30 Jun 2021 08:40:50 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id 868574122A;
 Wed, 30 Jun 2021 08:40:47 +0200 (CEST)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id B884041229
 for ; Wed, 30 Jun 2021 08:40:46 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2DC946D;
 Tue, 29 Jun 2021 23:40:46 -0700 (PDT)
Received: from net-x86-dell-8268.shanghai.arm.com
 (net-x86-dell-8268.shanghai.arm.com [10.169.210.141])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 4700D3F5A1;
 Tue, 29 Jun 2021 23:40:44 -0700 (PDT)
From: Feifei Wang
To: Beilei Xing
Cc: dev@dpdk.org, nd@arm.com, Feifei Wang, Ruifeng Wang
Date: Wed, 30 Jun 2021 14:40:35 +0800
Message-Id: <20210630064036.105151-2-feifei.wang2@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210630064036.105151-1-feifei.wang2@arm.com>
References: <20210527081714.1367611-1-feifei.wang2@arm.com>
 <20210630064036.105151-1-feifei.wang2@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH v3 1/2] net/i40e: improve performance for scalar Tx
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

For the i40e scalar Tx path, enabling FAST_FREE_MBUF mode means that, per
queue, all mbufs come from the same
mempool and have refcnt = 1. The buffers can therefore be freed in bulk
when mbuf fast free mode is enabled.

Following are the test results with this patch:

MRR L3FWD Test:
two ports & bi-directional flows & one core
RX API: i40e_recv_pkts_bulk_alloc
TX API: i40e_xmit_pkts_simple
ring_descs_size = 1024;
RTE_I40E_TX_MAX_FREE_BUF_SZ = 64;
tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH = 32;
tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH = 32;

For the scalar path on Arm platforms with the default 'tx_rs_thresh':
In n1sdp, performance is improved by 7.9%;
In thunderx2, performance is improved by 7.6%.

For the scalar path on the x86 platform with the default 'tx_rs_thresh':
performance is improved by 4.7%.

Suggested-by: Ruifeng Wang
Signed-off-by: Feifei Wang
Reviewed-by: Ruifeng Wang
Acked-by: Beilei Xing
---
 drivers/net/i40e/i40e_rxtx.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..0d3482a9d2 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1294,22 +1294,40 @@ static __rte_always_inline int
 i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 {
 	struct i40e_tx_entry *txep;
-	uint16_t i;
+	uint16_t tx_rs_thresh = txq->tx_rs_thresh;
+	uint16_t i = 0, j = 0;
+	struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
+	const uint16_t k = RTE_ALIGN_FLOOR(tx_rs_thresh, RTE_I40E_TX_MAX_FREE_BUF_SZ);
+	const uint16_t m = tx_rs_thresh % RTE_I40E_TX_MAX_FREE_BUF_SZ;
 
 	if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
 			rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))
 		return 0;
 
-	txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
+	txep = &txq->sw_ring[txq->tx_next_dd - (tx_rs_thresh - 1)];
 
-	for (i = 0; i < txq->tx_rs_thresh; i++)
+	for (i = 0; i < tx_rs_thresh; i++)
 		rte_prefetch0((txep + i)->mbuf);
 
 	if (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
-			txep->mbuf = NULL;
+		if (k) {
+			for (j = 0; j != k; j += RTE_I40E_TX_MAX_FREE_BUF_SZ) {
+				for (i = 0; i < RTE_I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mempool_put_bulk(free[0]->pool, (void **)free,
+						RTE_I40E_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mempool_put_bulk(free[0]->pool, (void **)free, m);
 		}
 	} else {
 		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {

From patchwork Wed Jun 30 06:40:36 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Feifei Wang
X-Patchwork-Id: 95030
X-Patchwork-Delegate: qi.z.zhang@intel.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id 54AD3A0A0F;
 Wed, 30 Jun 2021 08:40:56 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id A0CE541230;
 Wed, 30 Jun 2021 08:40:50 +0200 (CEST)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id 3A09A41230
 for ; Wed, 30 Jun 2021 08:40:49 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A8E5FD6E;
 Tue, 29 Jun 2021 23:40:48 -0700 (PDT)
Received: from net-x86-dell-8268.shanghai.arm.com
 (net-x86-dell-8268.shanghai.arm.com [10.169.210.141])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id C34203F5A1;
 Tue, 29 Jun 2021 23:40:46 -0700 (PDT)
From: Feifei Wang
To: Beilei Xing
Cc: dev@dpdk.org, nd@arm.com, Feifei Wang, Ruifeng Wang
Date: Wed, 30 Jun 2021 14:40:36 +0800
Message-Id: <20210630064036.105151-3-feifei.wang2@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To:
 <20210630064036.105151-1-feifei.wang2@arm.com>
References: <20210527081714.1367611-1-feifei.wang2@arm.com>
 <20210630064036.105151-1-feifei.wang2@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH v3 2/2] net/i40e: improve performance for vector Tx
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

For the i40e vector Tx path, no mbuf fast free operations are executed even
when the Tx offload is set to FAST_FREE_MBUF mode. To fix this, add mbuf
fast free mode to the vector Tx path.

Furthermore, for the i40e vector Tx path, enabling FAST_FREE_MBUF mode
means that, per queue, all mbufs come from the same mempool and have
refcnt = 1. The buffers can therefore be freed in bulk when mbuf fast free
mode is enabled.

For the vector path on Arm platforms:
In n1sdp, performance is improved by 18.4%;
In thunderx2, performance is improved by 23%.

For the vector path on the x86 platform:
No performance changes.

Suggested-by: Ruifeng Wang
Signed-off-by: Feifei Wang
Reviewed-by: Ruifeng Wang
---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 16fcf0aec6..f52ed98d62 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -99,6 +99,16 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 	  * tx_next_dd - (tx_rs_thresh-1)
 	  */
 	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
+
+	if (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
+		for (i = 0; i < n; i++) {
+			free[i] = txep[i].mbuf;
+			txep[i].mbuf = NULL;
+		}
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
+		goto done;
+	}
+
 	m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
 	if (likely(m != NULL)) {
 		free[0] = m;
@@ -126,6 +136,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 		}
 	}
 
+done:
 	/* buffers were freed, update counters */
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
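
Note for readers of the series: the chunk-plus-remainder freeing used in the
scalar path of patch 1/2 can be tried outside the driver with the rough
sketch below. This is not the driver code. `mock_pool`, `mock_put_bulk()`,
`free_in_chunks()` and `MAX_FREE_BUF_SZ` are hypothetical stand-ins for the
mempool, rte_mempool_put_bulk() and RTE_I40E_TX_MAX_FREE_BUF_SZ, and the mock
only counts what it is handed, so it shows the call pattern rather than real
mempool behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RTE_I40E_TX_MAX_FREE_BUF_SZ (64 in the test setup). */
#define MAX_FREE_BUF_SZ 64U

/* Hypothetical mock of a mempool: it only counts bulk calls and objects. */
struct mock_pool {
	unsigned int bulk_calls;
	unsigned int objs_returned;
};

/* Hypothetical mock of a bulk put: records one call returning n objects. */
static void
mock_put_bulk(struct mock_pool *mp, void * const *objs, unsigned int n)
{
	(void)objs;
	assert(n <= MAX_FREE_BUF_SZ); /* a batch never exceeds the local array */
	mp->bulk_calls++;
	mp->objs_returned += n;
}

/*
 * Hand `thresh` entries of `ring` back to `mp`: first in full chunks of
 * MAX_FREE_BUF_SZ, then one final call for the remainder, clearing each
 * ring slot as it is collected.
 */
static void
free_in_chunks(struct mock_pool *mp, void **ring, uint16_t thresh)
{
	void *batch[MAX_FREE_BUF_SZ];
	const uint16_t m = (uint16_t)(thresh % MAX_FREE_BUF_SZ); /* remainder */
	const uint16_t k = (uint16_t)(thresh - m);               /* full chunks */
	uint16_t i, j;

	for (j = 0; j != k; j = (uint16_t)(j + MAX_FREE_BUF_SZ)) {
		for (i = 0; i < MAX_FREE_BUF_SZ; i++, ring++) {
			batch[i] = *ring;
			*ring = NULL;
		}
		mock_put_bulk(mp, batch, MAX_FREE_BUF_SZ);
	}

	if (m) {
		for (i = 0; i < m; i++, ring++) {
			batch[i] = *ring;
			*ring = NULL;
		}
		mock_put_bulk(mp, batch, m);
	}
}
```

With thresh = 32 and a chunk size of 64, k is 0 and m is 32, so the whole
batch goes back in one bulk call instead of 32 single puts. With thresh = 160
the sketch makes two full-chunk calls and one call for the remaining 32.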