Patch Detail
get:
Show a patch.
patch:
Update a patch.
put:
Update a patch.
GET /api/patches/86523/?format=api
{ "id": 86523, "url": "
https://patches.dpdk.org/api/patches/86523/?format=api", "web_url": "https://patches.dpdk.org/project/dpdk/patch/20210114063951.2580-4-leyi.rong@intel.com/", "project": { "id": 1, "url": "https://patches.dpdk.org/api/projects/1/?format=api", "name": "DPDK", "link_name": "dpdk", "list_id": "dev.dpdk.org", "list_email": "dev@dpdk.org", "web_url": "http://core.dpdk.org", "scm_url": "git://dpdk.org/dpdk", "webscm_url": "http://git.dpdk.org/dpdk", "list_archive_url": "https://inbox.dpdk.org/dev", "list_archive_url_format": "https://inbox.dpdk.org/dev/{}", "commit_url_format": "" }, "msgid": "<20210114063951.2580-4-leyi.rong@intel.com>", "list_archive_url": "https://inbox.dpdk.org/dev/20210114063951.2580-4-leyi.rong@intel.com", "date": "2021-01-14T06:39:51", "name": "[v3,3/3] net/i40e: optimize Tx by using AVX512", "commit_ref": null, "pull_url": null, "state": "accepted", "archived": true, "hash": "07ebfc3fa78b78bf7d4bc83fcf017fe7ab8a960d", "submitter": { "id": 1204, "url": "https://patches.dpdk.org/api/people/1204/?format=api", "name": "Leyi Rong", "email": "leyi.rong@intel.com" }, "delegate": { "id": 1540, "url": "https://patches.dpdk.org/api/users/1540/?format=api", "username": "qzhan15", "first_name": "Qi", "last_name": "Zhang", "email": "qi.z.zhang@intel.com" }, "mbox": "https://patches.dpdk.org/project/dpdk/patch/20210114063951.2580-4-leyi.rong@intel.com/mbox/", "series": [ { "id": 14716, "url": "https://patches.dpdk.org/api/series/14716/?format=api", "web_url": "https://patches.dpdk.org/project/dpdk/list/?series=14716", "date": "2021-01-14T06:39:49", "name": "AVX512 vPMD on i40e", "version": 3, "mbox": "https://patches.dpdk.org/series/14716/mbox/" } ], "comments": "https://patches.dpdk.org/api/patches/86523/comments/", "check": "fail", "checks": "https://patches.dpdk.org/api/patches/86523/checks/", "tags": {}, "related": [], "headers": { "Return-Path": "<dev-bounces@dpdk.org>", "X-Original-To": "patchwork@inbox.dpdk.org", "Delivered-To": 
"patchwork@inbox.dpdk.org", "Received": [ "from mails.dpdk.org (mails.dpdk.org [217.70.189.124])\n\tby inbox.dpdk.org (Postfix) with ESMTP id C4DA4A0A02;\n\tThu, 14 Jan 2021 08:00:30 +0100 (CET)", "from [217.70.189.124] (localhost [127.0.0.1])\n\tby mails.dpdk.org (Postfix) with ESMTP id 469C8140EB6;\n\tThu, 14 Jan 2021 08:00:01 +0100 (CET)", "from mga12.intel.com (mga12.intel.com [192.55.52.136])\n by mails.dpdk.org (Postfix) with ESMTP id AB828140E99\n for <dev@dpdk.org>; Thu, 14 Jan 2021 07:59:56 +0100 (CET)", "from orsmga003.jf.intel.com ([10.7.209.27])\n by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;\n 13 Jan 2021 22:59:54 -0800", "from dpdk-lrong-srv-04.sh.intel.com ([10.67.119.221])\n by orsmga003.jf.intel.com with ESMTP; 13 Jan 2021 22:59:53 -0800" ], "IronPort-SDR": [ "\n 3eB++ZjPhMsbMAhpT+SJX2edSwbHn/nmK0xx4AbuaMP07Vk9pdHfIZTsc7/iEz+CMYYfWmOJpp\n Agcu8Vm3G5uw==", "\n mBRQReot4lH6mit2pPNGNnJ7u+aGr9zfxgHBoJSDhcJaAKYf+9br6LcVOUDDBtXz88DYvIYXOt\n rjdgJBddZazw==" ], "X-IronPort-AV": [ "E=McAfee;i=\"6000,8403,9863\"; a=\"157499691\"", "E=Sophos;i=\"5.79,346,1602572400\"; d=\"scan'208\";a=\"157499691\"", "E=Sophos;i=\"5.79,346,1602572400\"; d=\"scan'208\";a=\"349076948\"" ], "X-ExtLoop1": "1", "From": "Leyi Rong <leyi.rong@intel.com>", "To": "qi.z.zhang@intel.com, wenzhuo.lu@intel.com, ferruh.yigit@intel.com,\n bruce.richardson@intel.com, beilei.xing@intel.com", "Cc": "dev@dpdk.org,\n\tLeyi Rong <leyi.rong@intel.com>", "Date": "Thu, 14 Jan 2021 14:39:51 +0800", "Message-Id": "<20210114063951.2580-4-leyi.rong@intel.com>", "X-Mailer": "git-send-email 2.17.1", "In-Reply-To": "<20210114063951.2580-1-leyi.rong@intel.com>", "References": "<20201215021945.103396-1-leyi.rong@intel.com>\n <20210114063951.2580-1-leyi.rong@intel.com>", "Subject": "[dpdk-dev] [PATCH v3 3/3] net/i40e: optimize Tx by using AVX512", "X-BeenThere": "dev@dpdk.org", "X-Mailman-Version": "2.1.29", "Precedence": "list", "List-Id": "DPDK patches and discussions 
<dev.dpdk.org>", "List-Unsubscribe": "<https://mails.dpdk.org/options/dev>,\n <mailto:dev-request@dpdk.org?subject=unsubscribe>", "List-Archive": "<http://mails.dpdk.org/archives/dev/>", "List-Post": "<mailto:dev@dpdk.org>", "List-Help": "<mailto:dev-request@dpdk.org?subject=help>", "List-Subscribe": "<https://mails.dpdk.org/listinfo/dev>,\n <mailto:dev-request@dpdk.org?subject=subscribe>", "Errors-To": "dev-bounces@dpdk.org", "Sender": "\"dev\" <dev-bounces@dpdk.org>" }, "content": "Optimize Tx path by using AVX512 instructions and vectorize the\ntx free bufs process.\n\nSigned-off-by: Leyi Rong <leyi.rong@intel.com>\nSigned-off-by: Bruce Richardson <bruce.richardson@intel.com>\n---\n drivers/net/i40e/i40e_rxtx.c | 19 +++\n drivers/net/i40e/i40e_rxtx.h | 4 +\n drivers/net/i40e/i40e_rxtx_vec_avx512.c | 152 ++++++++++++++++++++----\n 3 files changed, 155 insertions(+), 20 deletions(-)", "diff": "diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c\nindex c99c051306..194bc3571f 100644\n--- a/drivers/net/i40e/i40e_rxtx.c\n+++ b/drivers/net/i40e/i40e_rxtx.c\n@@ -2508,6 +2508,25 @@ i40e_tx_queue_release_mbufs(struct i40e_tx_queue *txq)\n \t * vPMD tx will not set sw_ring's mbuf to NULL after free,\n \t * so need to free remains more carefully.\n \t */\n+#ifdef CC_AVX512_SUPPORT\n+\tif (dev->tx_pkt_burst == i40e_xmit_pkts_vec_avx512) {\n+\t\tstruct i40e_vec_tx_entry *swr = (void *)txq->sw_ring;\n+\n+\t\ti = txq->tx_next_dd - txq->tx_rs_thresh + 1;\n+\t\tif (txq->tx_tail < i) {\n+\t\t\tfor (; i < txq->nb_tx_desc; i++) {\n+\t\t\t\trte_pktmbuf_free_seg(swr[i].mbuf);\n+\t\t\t\tswr[i].mbuf = NULL;\n+\t\t\t}\n+\t\t\ti = 0;\n+\t\t}\n+\t\tfor (; i < txq->tx_tail; i++) {\n+\t\t\trte_pktmbuf_free_seg(swr[i].mbuf);\n+\t\t\tswr[i].mbuf = NULL;\n+\t\t}\n+\t\treturn;\n+\t}\n+#endif\n \tif (dev->tx_pkt_burst == i40e_xmit_pkts_vec_avx2 ||\n \t\t\tdev->tx_pkt_burst == i40e_xmit_pkts_vec) {\n \t\ti = txq->tx_next_dd - txq->tx_rs_thresh + 1;\ndiff --git 
a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h\nindex 2e3e50eb79..2f55073c97 100644\n--- a/drivers/net/i40e/i40e_rxtx.h\n+++ b/drivers/net/i40e/i40e_rxtx.h\n@@ -129,6 +129,10 @@ struct i40e_tx_entry {\n \tuint16_t last_id;\n };\n \n+struct i40e_vec_tx_entry {\n+\tstruct rte_mbuf *mbuf;\n+};\n+\n /*\n * Structure associated with each TX queue.\n */\ndiff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c\nindex ccddc3e2d4..43e939c605 100644\n--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c\n+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c\n@@ -873,6 +873,115 @@ i40e_recv_scattered_pkts_vec_avx512(void *rx_queue,\n \t\t\t\trx_pkts + retval, nb_pkts);\n }\n \n+static __rte_always_inline int\n+i40e_tx_free_bufs_avx512(struct i40e_tx_queue *txq)\n+{\n+\tstruct i40e_vec_tx_entry *txep;\n+\tuint32_t n;\n+\tuint32_t i;\n+\tint nb_free = 0;\n+\tstruct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];\n+\n+\t/* check DD bits on threshold descriptor */\n+\tif ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &\n+\t\t\trte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=\n+\t\t\trte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE))\n+\t\treturn 0;\n+\n+\tn = txq->tx_rs_thresh;\n+\n+\t /* first buffer to free from S/W ring is at index\n+\t * tx_next_dd - (tx_rs_thresh-1)\n+\t */\n+\ttxep = (void *)txq->sw_ring;\n+\ttxep += txq->tx_next_dd - (n - 1);\n+\n+\tif (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE && (n & 31) == 0) {\n+\t\tstruct rte_mempool *mp = txep[0].mbuf->pool;\n+\t\tvoid **cache_objs;\n+\t\tstruct rte_mempool_cache *cache = rte_mempool_default_cache(mp,\n+\t\t\t\trte_lcore_id());\n+\n+\t\tif (!cache || cache->len == 0)\n+\t\t\tgoto normal;\n+\n+\t\tcache_objs = &cache->objs[cache->len];\n+\n+\t\tif (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {\n+\t\t\trte_mempool_ops_enqueue_bulk(mp, (void *)txep, n);\n+\t\t\tgoto done;\n+\t\t}\n+\n+\t\t/* The cache follows the following algorithm\n+\t\t * 1. Add the objects to the cache\n+\t\t * 2. 
Anything greater than the cache min value (if it\n+\t\t * crosses the cache flush threshold) is flushed to the ring.\n+\t\t */\n+\t\t/* Add elements back into the cache */\n+\t\tuint32_t copied = 0;\n+\t\t/* n is multiple of 32 */\n+\t\twhile (copied < n) {\n+\t\t\tconst __m512i a = _mm512_load_si512(&txep[copied]);\n+\t\t\tconst __m512i b = _mm512_load_si512(&txep[copied + 8]);\n+\t\t\tconst __m512i c = _mm512_load_si512(&txep[copied + 16]);\n+\t\t\tconst __m512i d = _mm512_load_si512(&txep[copied + 24]);\n+\n+\t\t\t_mm512_storeu_si512(&cache_objs[copied], a);\n+\t\t\t_mm512_storeu_si512(&cache_objs[copied + 8], b);\n+\t\t\t_mm512_storeu_si512(&cache_objs[copied + 16], c);\n+\t\t\t_mm512_storeu_si512(&cache_objs[copied + 24], d);\n+\t\t\tcopied += 32;\n+\t\t}\n+\t\tcache->len += n;\n+\n+\t\tif (cache->len >= cache->flushthresh) {\n+\t\t\trte_mempool_ops_enqueue_bulk\n+\t\t\t\t(mp, &cache->objs[cache->size],\n+\t\t\t\tcache->len - cache->size);\n+\t\t\tcache->len = cache->size;\n+\t\t}\n+\t\tgoto done;\n+\t}\n+\n+normal:\n+\tm = rte_pktmbuf_prefree_seg(txep[0].mbuf);\n+\tif (likely(m)) {\n+\t\tfree[0] = m;\n+\t\tnb_free = 1;\n+\t\tfor (i = 1; i < n; i++) {\n+\t\t\trte_prefetch0(&txep[i + 3].mbuf->cacheline1);\n+\t\t\tm = rte_pktmbuf_prefree_seg(txep[i].mbuf);\n+\t\t\tif (likely(m)) {\n+\t\t\t\tif (likely(m->pool == free[0]->pool)) {\n+\t\t\t\t\tfree[nb_free++] = m;\n+\t\t\t\t} else {\n+\t\t\t\t\trte_mempool_put_bulk(free[0]->pool,\n+\t\t\t\t\t\t\t (void *)free,\n+\t\t\t\t\t\t\t nb_free);\n+\t\t\t\t\tfree[0] = m;\n+\t\t\t\t\tnb_free = 1;\n+\t\t\t\t}\n+\t\t\t}\n+\t\t}\n+\t\trte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);\n+\t} else {\n+\t\tfor (i = 1; i < n; i++) {\n+\t\t\tm = rte_pktmbuf_prefree_seg(txep[i].mbuf);\n+\t\t\tif (m)\n+\t\t\t\trte_mempool_put(m->pool, m);\n+\t\t}\n+\t}\n+\n+done:\n+\t/* buffers were freed, update counters */\n+\ttxq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);\n+\ttxq->tx_next_dd = 
(uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);\n+\tif (txq->tx_next_dd >= txq->nb_tx_desc)\n+\t\ttxq->tx_next_dd = (uint16_t)(txq->tx_rs_thresh - 1);\n+\n+\treturn txq->tx_rs_thresh;\n+}\n+\n static inline void\n vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)\n {\n@@ -892,13 +1001,6 @@ vtx(volatile struct i40e_tx_desc *txdp,\n \tconst uint64_t hi_qw_tmpl = (I40E_TX_DESC_DTYPE_DATA |\n \t\t\t((uint64_t)flags << I40E_TXD_QW1_CMD_SHIFT));\n \n-\t/* if unaligned on 32-bit boundary, do one to align */\n-\tif (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {\n-\t\tvtx1(txdp, *pkt, flags);\n-\t\tnb_pkts--, txdp++, pkt++;\n-\t}\n-\n-\t/* do two at a time while possible, in bursts */\n \tfor (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {\n \t\tuint64_t hi_qw3 =\n \t\t\thi_qw_tmpl |\n@@ -917,14 +1019,13 @@ vtx(volatile struct i40e_tx_desc *txdp,\n \t\t\t((uint64_t)pkt[0]->data_len <<\n \t\t\t I40E_TXD_QW1_TX_BUF_SZ_SHIFT);\n \n-\t\t__m256i desc2_3 = _mm256_set_epi64x\n+\t\t__m512i desc0_3 =\n+\t\t\t_mm512_set_epi64\n \t\t\t(hi_qw3, pkt[3]->buf_iova + pkt[3]->data_off,\n-\t\t\thi_qw2, pkt[2]->buf_iova + pkt[2]->data_off);\n-\t\t__m256i desc0_1 = _mm256_set_epi64x\n-\t\t\t(hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,\n+\t\t\thi_qw2, pkt[2]->buf_iova + pkt[2]->data_off,\n+\t\t\thi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,\n \t\t\thi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);\n-\t\t_mm256_store_si256((void *)(txdp + 2), desc2_3);\n-\t\t_mm256_store_si256((void *)txdp, desc0_1);\n+\t\t_mm512_storeu_si512((void *)txdp, desc0_3);\n \t}\n \n \t/* do any last ones */\n@@ -934,13 +1035,23 @@ vtx(volatile struct i40e_tx_desc *txdp,\n \t}\n }\n \n+static __rte_always_inline void\n+tx_backlog_entry_avx512(struct i40e_vec_tx_entry *txep,\n+\t\t\tstruct rte_mbuf **tx_pkts, uint16_t nb_pkts)\n+{\n+\tint i;\n+\n+\tfor (i = 0; i < (int)nb_pkts; ++i)\n+\t\ttxep[i].mbuf = tx_pkts[i];\n+}\n+\n static inline uint16_t\n i40e_xmit_fixed_burst_vec_avx512(void 
*tx_queue, struct rte_mbuf **tx_pkts,\n \t\t\t\t uint16_t nb_pkts)\n {\n \tstruct i40e_tx_queue *txq = (struct i40e_tx_queue *)tx_queue;\n \tvolatile struct i40e_tx_desc *txdp;\n-\tstruct i40e_tx_entry *txep;\n+\tstruct i40e_vec_tx_entry *txep;\n \tuint16_t n, nb_commit, tx_id;\n \tuint64_t flags = I40E_TD_CMD;\n \tuint64_t rs = I40E_TX_DESC_CMD_RS | I40E_TD_CMD;\n@@ -949,7 +1060,7 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,\n \tnb_pkts = RTE_MIN(nb_pkts, txq->tx_rs_thresh);\n \n \tif (txq->nb_tx_free < txq->tx_free_thresh)\n-\t\ti40e_tx_free_bufs(txq);\n+\t\ti40e_tx_free_bufs_avx512(txq);\n \n \tnb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);\n \tif (unlikely(nb_pkts == 0))\n@@ -957,13 +1068,14 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,\n \n \ttx_id = txq->tx_tail;\n \ttxdp = &txq->tx_ring[tx_id];\n-\ttxep = &txq->sw_ring[tx_id];\n+\ttxep = (void *)txq->sw_ring;\n+\ttxep += tx_id;\n \n \ttxq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);\n \n \tn = (uint16_t)(txq->nb_tx_desc - tx_id);\n \tif (nb_commit >= n) {\n-\t\ttx_backlog_entry(txep, tx_pkts, n);\n+\t\ttx_backlog_entry_avx512(txep, tx_pkts, n);\n \n \t\tvtx(txdp, tx_pkts, n - 1, flags);\n \t\ttx_pkts += (n - 1);\n@@ -977,11 +1089,11 @@ i40e_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts,\n \t\ttxq->tx_next_rs = (uint16_t)(txq->tx_rs_thresh - 1);\n \n \t\t/* avoid reach the end of ring */\n-\t\ttxdp = &txq->tx_ring[tx_id];\n-\t\ttxep = &txq->sw_ring[tx_id];\n+\t\ttxdp = txq->tx_ring;\n+\t\ttxep = (void *)txq->sw_ring;\n \t}\n \n-\ttx_backlog_entry(txep, tx_pkts, nb_commit);\n+\ttx_backlog_entry_avx512(txep, tx_pkts, nb_commit);\n \n \tvtx(txdp, tx_pkts, nb_commit, flags);\n \n", "prefixes": [ "v3", "3/3" ] }
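The hunk in i40e_tx_queue_release_mbufs() walks only the sw_ring slots that still hold mbufs, from `tx_next_dd - tx_rs_thresh + 1` up to `tx_tail`, wrapping at `nb_tx_desc`. A minimal sketch of that index arithmetic, with field names mirroring the patch (the helper itself is hypothetical, written here only to make the wrap-around case explicit):

```c
#include <stdint.h>

/* Count the sw_ring slots visited by the AVX512 release path: the
 * in-flight region starts at (tx_next_dd - tx_rs_thresh + 1) and ends
 * at tx_tail, wrapping past the end of the ring when tail < start. */
static inline uint16_t
count_inflight_slots(uint16_t tx_next_dd, uint16_t tx_rs_thresh,
		     uint16_t tx_tail, uint16_t nb_desc)
{
	uint16_t i = tx_next_dd - tx_rs_thresh + 1;
	uint16_t n = 0;

	if (tx_tail < i) {		/* region wraps past ring end */
		n += nb_desc - i;	/* slots [i, nb_desc) */
		i = 0;
	}
	n += tx_tail - i;		/* slots [i, tx_tail) */
	return n;
}
```

With a 64-entry ring, `tx_next_dd = 63` and `tx_rs_thresh = 32`, a tail of 5 gives a start index of 32, so the walk covers 32 slots to the ring end plus 5 from slot 0.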
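The fast-free branch of i40e_tx_free_bufs_avx512() (taken when DEV_TX_OFFLOAD_MBUF_FAST_FREE is set and n is a multiple of 32) copies the freed mbuf pointers straight into the per-lcore mempool cache with AVX-512 stores, then flushes anything above the cache's flush threshold back to the ring. The bookkeeping can be sketched with two pure helpers; the names are illustrative, not DPDK API:

```c
#include <stdint.h>

/* Objects pushed back to the mempool ring by one bulk put of n
 * pointers into a cache currently holding len objects: when the new
 * length reaches flushthresh, everything above the nominal cache
 * size is enqueued to the ring. */
static inline uint32_t
cache_flushed(uint32_t len, uint32_t size, uint32_t flushthresh, uint32_t n)
{
	len += n;
	return len >= flushthresh ? len - size : 0;
}

/* Cache length after the same bulk put: trimmed back to size on a
 * flush, otherwise simply grown by n. */
static inline uint32_t
cache_len_after(uint32_t len, uint32_t size, uint32_t flushthresh, uint32_t n)
{
	len += n;
	return len >= flushthresh ? size : len;
}
```

This matches the comment in the patch: objects are first added to the cache, and only a crossing of the flush threshold triggers an rte_mempool_ops_enqueue_bulk() of `len - size` objects.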
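In vtx(), the reworked loop builds four high quadwords (hi_qw0..hi_qw3) and writes all four descriptors with one 64-byte _mm512_storeu_si512() instead of two 32-byte stores, which is why the old 32-byte alignment fixup could be dropped. A sketch of one quadword's construction; the shift values below follow the i40e descriptor layout as commonly documented (cmd at bit 4, buffer size at bit 34) but should be treated as illustrative rather than authoritative:

```c
#include <stdint.h>

/* Illustrative stand-ins for the I40E_TXD_QW1_* constants. */
#define QW1_DTYPE_DATA       0x0ULL	/* data descriptor type */
#define QW1_CMD_SHIFT        4		/* command flags field */
#define QW1_TX_BUF_SZ_SHIFT  34	/* buffer length field */

/* Build the high quadword of one Tx data descriptor from the command
 * flags and the mbuf's data_len, as vtx() does per packet before the
 * packed 512-bit store. */
static inline uint64_t
build_hi_qw(uint64_t flags, uint16_t data_len)
{
	return QW1_DTYPE_DATA |
	       (flags << QW1_CMD_SHIFT) |
	       ((uint64_t)data_len << QW1_TX_BUF_SZ_SHIFT);
}
```

The low quadword of each descriptor is the packet's DMA address, `buf_iova + data_off`, interleaved with the high quadwords in the _mm512_set_epi64() call.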