From patchwork Sun May 28 15:40:35 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yongseok Koh
X-Patchwork-Id: 24804
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Yongseok Koh
Date: Sun, 28 May 2017 08:40:35 -0700
Message-ID: <20170528154035.32198-1-yskoh@mellanox.com>
X-Mailer: git-send-email 2.11.0
Subject: [dpdk-dev] [PATCH] net/mlx5: add vectorized Rx
List-Id: DPDK patches and discussions

Vectorized Rx API for x86 is added.
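The implementation follows the usual vPMD pattern: a 64-bit template covering
the mbuf rearm_data fields (data_off, refcnt, nb_segs, port) is pre-computed
per Rx queue so that re-initializing a received mbuf is a single 8-byte store,
and CQEs are then parsed four at a time with SSE4.1 shuffles and blends. For
illustration only, a minimal sketch of the template trick as used by
rxq_alloc_elts() below; the helper names are hypothetical and not part of this
patch:

#include <rte_atomic.h>
#include <rte_mbuf.h>

/* Build the 64-bit rearm template once per Rx queue (same steps as in
 * rxq_alloc_elts() below). Helper name is illustrative only. */
static uint64_t
rxq_mbuf_initializer(uint16_t port_id)
{
	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
	uintptr_t p;

	mb_def.nb_segs = 1;
	rte_pktmbuf_reset_headroom(&mb_def);
	mb_def.port = port_id;
	rte_mbuf_refcnt_set(&mb_def, 1);
	/* rearm_data covers data_off, refcnt, nb_segs and port. */
	rte_compiler_barrier();
	p = (uintptr_t)&mb_def.rearm_data;
	return *(uint64_t *)p;
}

/* Per packet, one 8-byte store then replaces four field assignments. */
static inline void
mbuf_rearm(struct rte_mbuf *m, uint64_t mbuf_initializer)
{
	*(uint64_t *)&m->rearm_data = mbuf_initializer;
}

In the actual Rx path the template is applied with 128-bit stores to four
mbufs per loop iteration (see rxq_cq_to_ptype_oflags_v()).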
Signed-off-by: Yongseok Koh
---
 drivers/net/mlx5/Makefile      |   1 +
 drivers/net/mlx5/mlx5_defs.h   |   6 +
 drivers/net/mlx5/mlx5_ethdev.c |   8 +
 drivers/net/mlx5/mlx5_rxq.c    |  39 +++-
 drivers/net/mlx5/mlx5_rxtx.c   | 485 +++++++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h   |  10 +
 6 files changed, 548 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/Makefile b/drivers/net/mlx5/Makefile
index c0799591b..02774e773 100644
--- a/drivers/net/mlx5/Makefile
+++ b/drivers/net/mlx5/Makefile
@@ -60,6 +60,7 @@ CFLAGS += -D_DEFAULT_SOURCE
 CFLAGS += -D_XOPEN_SOURCE=600
 CFLAGS += $(WERROR_FLAGS)
 CFLAGS += -Wno-strict-prototypes
+CFLAGS += -msse4.1
 LDLIBS += -libverbs
 
 # A few warnings cannot be avoided in external headers.
diff --git a/drivers/net/mlx5/mlx5_defs.h b/drivers/net/mlx5/mlx5_defs.h
index 201bb3362..d2752b6a4 100644
--- a/drivers/net/mlx5/mlx5_defs.h
+++ b/drivers/net/mlx5/mlx5_defs.h
@@ -89,4 +89,10 @@
 /* Maximum Packet headers size (L2+L3+L4) for TSO. */
 #define MLX5_MAX_TSO_HEADER 128
 
+/* Parameters for Vectorized PMD */
+#undef MLX5_VECTORIZED_RX
+#define MLX5_VPMD_RXQ_RPLNSH_THRESH 32U
+#define MLX5_VPMD_MAX_RX_BURST MLX5_VPMD_RXQ_RPLNSH_THRESH
+#define MLX5_VPMD_DESCS_PER_LOOP 4
+
 #endif /* RTE_PMD_MLX5_DEFS_H_ */
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 3fd22cb85..8cd73c394 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -723,7 +723,11 @@ mlx5_dev_supported_ptypes_get(struct rte_eth_dev *dev)
 
 	};
 
+#ifdef MLX5_VECTORIZED_RX
+	if (dev->rx_pkt_burst == mlx5_rx_burst_vec)
+#else
 	if (dev->rx_pkt_burst == mlx5_rx_burst)
+#endif
 		return ptypes;
 	return NULL;
 }
@@ -1605,5 +1609,9 @@ priv_select_tx_function(struct priv *priv)
 void
 priv_select_rx_function(struct priv *priv)
 {
+#ifdef MLX5_VECTORIZED_RX
+	priv->dev->rx_pkt_burst = mlx5_rx_burst_vec;
+#else
 	priv->dev->rx_pkt_burst = mlx5_rx_burst;
+#endif
 }
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 8b7823360..4b3295607 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -652,7 +652,20 @@ rxq_alloc_elts(struct rxq_ctrl *rxq_ctrl, unsigned int elts_n,
 	const unsigned int sges_n = 1 << rxq_ctrl->rxq.sges_n;
 	unsigned int i;
 	int ret = 0;
-
+#ifdef MLX5_VECTORIZED_RX
+	uintptr_t p;
+	struct rte_mbuf mb_def = { .buf_addr = 0 }; /* zeroed mbuf */
+
+	/* Initialize default mbuf fields for vPMD. */
+	mb_def.nb_segs = 1;
+	rte_pktmbuf_reset_headroom(&mb_def);
+	mb_def.port = rxq_ctrl->rxq.port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+	/* Prevent compiler reordering: rearm_data covers the fields above. */
+	rte_compiler_barrier();
+	p = (uintptr_t)&mb_def.rearm_data;
+	rxq_ctrl->rxq.mbuf_initializer = *(uint64_t *)p;
+#endif
 	/* Iterate on segments. */
 	for (i = 0; (i != elts_n); ++i) {
 		struct rte_mbuf *buf;
@@ -854,6 +867,9 @@ rxq_setup(struct rxq_ctrl *tmpl)
 	tmpl->rxq.cqe_n = log2above(ibcq->cqe);
 	tmpl->rxq.cq_ci = 0;
 	tmpl->rxq.rq_ci = 0;
+#ifdef MLX5_VECTORIZED_RX
+	tmpl->rxq.rq_pi = 0;
+#endif
 	tmpl->rxq.cq_db = cq->dbrec;
 	tmpl->rxq.wqes =
 		(volatile struct mlx5_wqe_data_seg (*)[])
@@ -908,6 +924,9 @@ rxq_ctrl_setup(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl,
 	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
 	unsigned int cqe_n = desc - 1;
 	struct rte_mbuf *(*elts)[desc] = NULL;
+#ifdef MLX5_VECTORIZED_RX
+	int i;
+#endif
 	int ret = 0;
 
 	(void)conf; /* Thresholds configuration (ignored). */
@@ -987,7 +1006,11 @@ rxq_ctrl_setup(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl,
 	if (priv->cqe_comp) {
 		attr.cq.comp_mask |= IBV_EXP_CQ_INIT_ATTR_FLAGS;
 		attr.cq.flags |= IBV_EXP_CQ_COMPRESSED_CQE;
+#ifdef MLX5_VECTORIZED_RX
+		cqe_n = desc - 1;
+#else
 		cqe_n = (desc * 2) - 1; /* Double the number of CQEs. */
+#endif
 	}
 	tmpl.cq = ibv_exp_create_cq(priv->ctx, cqe_n, NULL, tmpl.channel, 0,
 				    &attr.cq);
@@ -1117,6 +1140,10 @@ rxq_ctrl_setup(struct rte_eth_dev *dev, struct rxq_ctrl *rxq_ctrl,
 	rte_free(tmpl.rxq.elts);
 	tmpl.rxq.elts = elts;
 	*rxq_ctrl = tmpl;
+#ifdef MLX5_VECTORIZED_RX
+	for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i)
+		(*elts)[desc + i] = &rxq_ctrl->rxq.fake_mbuf;
+#endif
 	/* Update doorbell counter. */
 	rxq_ctrl->rxq.rq_ci = desc >> rxq_ctrl->rxq.sges_n;
 	rte_wmb();
@@ -1192,7 +1219,12 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 
 	if (rxq_ctrl->rxq.elts_n != log2above(desc)) {
 		rxq_ctrl = rte_realloc(rxq_ctrl, sizeof(*rxq_ctrl) +
+#ifdef MLX5_VECTORIZED_RX
+				       (desc + MLX5_VPMD_DESCS_PER_LOOP) *
+				       sizeof(struct rte_mbuf *),
+#else
 				       desc * sizeof(struct rte_mbuf *),
+#endif
 				       RTE_CACHE_LINE_SIZE);
 		if (!rxq_ctrl) {
 			ERROR("%p: unable to reallocate queue index %u",
@@ -1203,7 +1235,12 @@ mlx5_rx_queue_setup(struct rte_eth_dev *dev, uint16_t idx, uint16_t desc,
 		}
 	} else {
 		rxq_ctrl = rte_calloc_socket("RXQ", 1, sizeof(*rxq_ctrl) +
+#ifdef MLX5_VECTORIZED_RX
+					     (desc + MLX5_VPMD_DESCS_PER_LOOP) *
+					     sizeof(struct rte_mbuf *),
+#else
 					     desc * sizeof(struct rte_mbuf *),
+#endif
 					     0, socket);
 		if (rxq_ctrl == NULL) {
 			ERROR("%p: unable to allocate queue index %u",
diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 6254228a9..1d74ce5e1 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1948,6 +1948,491 @@ rxq_cq_to_ol_flags(struct rxq *rxq, volatile struct mlx5_cqe *cqe)
 	return ol_flags;
 }
 
+#ifdef MLX5_VECTORIZED_RX
+#if !defined (RTE_ARCH_X86_64)
+#error Currently supports only 64bit architectures
+#endif
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wcast-qual"
+#endif
+
+static inline void
+rxq_copy_mbuf_v(struct rxq *rxq, struct rte_mbuf **pkts, uint16_t n)
+{
+	const uint16_t q_mask = (1 << rxq->elts_n) - 1;
+	struct rte_mbuf **elts = &(*rxq->elts)[rxq->rq_pi & q_mask];
+	unsigned int pos;
+	uint16_t p = n & -2;
+
+	for (pos = 0; pos < p; pos += 2) {
+		__m128i mbp;
+		mbp = _mm_loadu_si128((__m128i *)&elts[pos]);
+		_mm_storeu_si128((__m128i *)&pkts[pos], mbp);
+	}
+	if (n & 1)
+		pkts[pos] = elts[pos];
+}
+
+static inline void
+rxq_replenish_bulk_mbuf(struct rxq *rxq)
+{
+	const unsigned int q_mask = (1 << rxq->elts_n) - 1;
+	unsigned int elts_idx = rxq->rq_ci & q_mask;
+	struct rte_mbuf **elts = &(*rxq->elts)[elts_idx];
+	volatile struct mlx5_wqe_data_seg *wq = &(*rxq->wqes)[elts_idx];
+	unsigned int i;
+
+	if (rte_mempool_get_bulk(rxq->mp, (void *)elts,
+				 MLX5_VPMD_RXQ_RPLNSH_THRESH) < 0) {
+		/* TODO: exception handling by fake_mbuf? */
+		rxq->stats.rx_nombuf += MLX5_VPMD_RXQ_RPLNSH_THRESH;
+		return;
+	}
+	for (i = 0; i < MLX5_VPMD_RXQ_RPLNSH_THRESH; i++)
+		wq[i].addr = htonll(rte_pktmbuf_mtod(elts[i], uintptr_t));
+	rxq->rq_ci += MLX5_VPMD_RXQ_RPLNSH_THRESH;
+	rte_wmb();
+	*rxq->rq_db = htonl(rxq->rq_ci);
+}
+
+static inline void
+rxq_cq_uncompress_v(struct rxq *rxq,
+		    volatile struct mlx5_cqe *cq,
+		    struct rte_mbuf **elts)
+{
+	volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + 1);
+	struct rte_mbuf *t_pkt = elts[0]; /* Title packet */
+	unsigned int pos;
+	uint16_t mcqe_n;
+	__m128i shuf_mask1, shuf_mask2;
+	__m128i rearm, rxdf;
+	const __m128i crc_adjust = _mm_set_epi16(
+		0, 0, 0,
+		rxq->crc_present * (-ETHER_CRC_LEN),
+		0,
+		rxq->crc_present * (-ETHER_CRC_LEN),
+		0, 0
+	);
+
+	rearm = _mm_loadu_si128((__m128i *)&t_pkt->rearm_data);
+	rxdf = _mm_loadu_si128((__m128i *)&t_pkt->rx_descriptor_fields1);
+	/* Mask to shuffle from extracted mini CQE to mbuf. */
+	shuf_mask1 = _mm_set_epi8(
+		0, 1, 2, 3,             /* rss, bswap32 */
+		0xFF, 0xFF,             /* skip vlan_tci */
+		6, 7,                   /* data_len, bswap16 */
+		0xFF, 0xFF, 6, 7,       /* pkt_len, bswap16 */
+		0xFF, 0xFF, 0xFF, 0xFF  /* skip packet_type */
+	);
+	shuf_mask2 = _mm_set_epi8(
+		8, 9, 10, 11,           /* rss, bswap32 */
+		0xFF, 0xFF,             /* skip vlan_tci */
+		14, 15,                 /* data_len, bswap16 */
+		0xFF, 0xFF, 14, 15,     /* pkt_len, bswap16 */
+		0xFF, 0xFF, 0xFF, 0xFF  /* skip packet_type */
+	);
+	/* Restore the compressed count. Must be 16 bits. */
+	mcqe_n = t_pkt->data_len + (rxq->crc_present * ETHER_CRC_LEN);
+	/*
+	 * A. load mCQEs into a 128bit register.
+	 * B. store rearm data to mbuf.
+	 * C. combine data from mCQEs with rx_descriptor_fields1.
+	 * D. store rx_descriptor_fields1.
+	 */
+	for (pos = 0; pos < mcqe_n; ) {
+		__m128i mcqe1, mcqe2;
+		__m128i rxdf1, rxdf2;
+
+		/* A.1 load mCQEs into a 128bit register. */
+		mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]);
+		mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]);
+		/* B.1 store rearm data to mbuf. */
+		_mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm);
+		_mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm);
+		_mm_storeu_si128((__m128i *)&elts[pos + 2]->rearm_data, rearm);
+		_mm_storeu_si128((__m128i *)&elts[pos + 3]->rearm_data, rearm);
+		/* C.1 combine data from mCQEs with rx_descriptor_fields1. */
+		rxdf1 = _mm_shuffle_epi8(mcqe1, shuf_mask1);
+		rxdf2 = _mm_shuffle_epi8(mcqe1, shuf_mask2);
+		rxdf1 = _mm_add_epi16(rxdf1, crc_adjust);
+		rxdf2 = _mm_add_epi16(rxdf2, crc_adjust);
+		rxdf1 = _mm_blend_epi16(rxdf1, rxdf, 0x23);
+		rxdf2 = _mm_blend_epi16(rxdf2, rxdf, 0x23);
+		/* D.1 store rx_descriptor_fields1. */
+		_mm_storeu_si128((__m128i *)&elts[pos]->rx_descriptor_fields1,
+				 rxdf1);
+		_mm_storeu_si128((__m128i *)&elts[pos + 1]->rx_descriptor_fields1,
+				 rxdf2);
+		/* C.1 combine data from mCQEs with rx_descriptor_fields1. */
+		rxdf1 = _mm_shuffle_epi8(mcqe2, shuf_mask1);
+		rxdf2 = _mm_shuffle_epi8(mcqe2, shuf_mask2);
+		rxdf1 = _mm_add_epi16(rxdf1, crc_adjust);
+		rxdf2 = _mm_add_epi16(rxdf2, crc_adjust);
+		rxdf1 = _mm_blend_epi16(rxdf1, rxdf, 0x23);
+		rxdf2 = _mm_blend_epi16(rxdf2, rxdf, 0x23);
+		/* D.1 store rx_descriptor_fields1. */
+		_mm_storeu_si128((__m128i *)&elts[pos + 2]->rx_descriptor_fields1,
+				 rxdf1);
+		_mm_storeu_si128((__m128i *)&elts[pos + 3]->rx_descriptor_fields1,
+				 rxdf2);
+		/* TODO: need to copy hash.fdir.hi ?? */
+		pos += MLX5_VPMD_DESCS_PER_LOOP;
+		/* Move to next CQE. */
+		if (!(pos & 0x7))
+			mcq = (void *)(cq + pos);
+	}
+	/* Invalidate consumed CQEs. */
+	for (pos = 1; pos < mcqe_n; ++pos)
+		cq[pos].op_own = MLX5_CQE_INVALIDATE;
+	rxq->cq_ci += mcqe_n;
+}
+
+static uint32_t mlx5_ptype_table[16] = {
+	RTE_PTYPE_UNKNOWN,
+	RTE_PTYPE_L3_IPV6_EXT_UNKNOWN, /* b0001 */
+	RTE_PTYPE_L3_IPV4_EXT_UNKNOWN, /* b0010 */
+	RTE_PTYPE_UNKNOWN, RTE_PTYPE_UNKNOWN,
+	RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN, /* b0101 */
+	RTE_PTYPE_L3_IPV4_EXT_UNKNOWN |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN, /* b0110 */
+	RTE_PTYPE_UNKNOWN, RTE_PTYPE_UNKNOWN,
+	RTE_PTYPE_L3_IPV6_EXT_UNKNOWN, /* b1001 */
+	RTE_PTYPE_L3_IPV4_EXT_UNKNOWN, /* b1010 */
+	RTE_PTYPE_UNKNOWN, RTE_PTYPE_UNKNOWN,
+	RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+		RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN, /* b1101 */
+	RTE_PTYPE_L3_IPV6_EXT_UNKNOWN |
+		RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN, /* b1110 */
+	RTE_PTYPE_UNKNOWN
+};
+
+
+static inline void
+rxq_cq_to_ptype_oflags_v(struct rxq *rxq, __m128i cqes[4], struct rte_mbuf **pkts)
+{
+	__m128i pinfo0;
+	__m128i pinfo1;
+	__m128i pinfo, ptype;
+	const __m128i ptype_mask = _mm_set_epi32(
+		0xd07, 0xd07, 0xd07, 0xd07);
+	const __m128i pinfo_mask = _mm_set_epi32(
+		0x3, 0x3, 0x3, 0x3);
+	const __m128i mbuf_init = _mm_loadl_epi64((__m128i *)&rxq->mbuf_initializer);
+	__m128i rearm0, rearm1, rearm2, rearm3;
+
+	/* Extract pkt_info field. */
+	pinfo0 = _mm_unpacklo_epi32(cqes[0], cqes[1]);
+	pinfo1 = _mm_unpacklo_epi32(cqes[2], cqes[3]);
+	pinfo = _mm_unpacklo_epi64(pinfo0, pinfo1);
+	/* Extract hdr_type_etc field. */
+	pinfo0 = _mm_unpackhi_epi32(cqes[0], cqes[1]);
+	pinfo1 = _mm_unpackhi_epi32(cqes[2], cqes[3]);
+	ptype = _mm_unpacklo_epi64(pinfo0, pinfo1);
+	/*
+	 * Merge the two fields to generate the following:
+	 * bit[0] = l2_ok, bit[1] = l3_ok
+	 * bit[2] = l4_ok
+	 * bit[8] = cv, bit[11:10] = l3_hdr_type
+	 * bit[12] = tunneled, bit[13] = outer_l3_type
+	 */
+	ptype = _mm_and_si128(ptype, ptype_mask);
+	pinfo = _mm_and_si128(pinfo, pinfo_mask);
+	pinfo = _mm_slli_epi32(pinfo, 12);
+	ptype = _mm_or_si128(ptype, pinfo);
+	pinfo = _mm_srli_epi32(ptype, 10);
+	pkts[0]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 0)];
+	pkts[1]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 4)];
+	pkts[2]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 8)];
+	pkts[3]->packet_type = mlx5_ptype_table[_mm_extract_epi8(ptype, 12)];
+	/* TODO: Fill ol_flags. */
+	rearm0 = _mm_blend_epi16(mbuf_init, mbuf_init, 0x01);
+	rearm1 = _mm_blend_epi16(mbuf_init, mbuf_init, 0x02);
+	rearm2 = _mm_blend_epi16(mbuf_init, mbuf_init, 0x04);
+	rearm3 = _mm_blend_epi16(mbuf_init, mbuf_init, 0x08);
+	_mm_store_si128((__m128i *)&pkts[0]->rearm_data, rearm0);
+	_mm_store_si128((__m128i *)&pkts[1]->rearm_data, rearm1);
+	_mm_store_si128((__m128i *)&pkts[2]->rearm_data, rearm2);
+	_mm_store_si128((__m128i *)&pkts[3]->rearm_data, rearm3);
+}
+
+/**
+ * DPDK callback for vectorized RX.
+ *
+ * @param dpdk_rxq
+ *   Generic pointer to RX queue structure.
+ * @param[out] pkts
+ *   Array to store received packets.
+ * @param pkts_n
+ *   Maximum number of packets in array.
+ *
+ * @return
+ *   Number of packets successfully received (<= pkts_n).
+ */
+uint16_t
+mlx5_rx_burst_vec(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
+{
+	struct rxq *rxq = dpdk_rxq;
+	const uint16_t q_n = 1 << rxq->cqe_n;
+	const uint16_t q_mask = q_n - 1;
+	volatile struct mlx5_cqe *cq;
+	struct rte_mbuf **elts;
+	unsigned int pos;
+	uint64_t comp_idx = MLX5_VPMD_DESCS_PER_LOOP;
+	uint16_t nocmp_n = 0;
+	uint16_t rcvd_pkt = 0;
+	unsigned int cq_idx = rxq->cq_ci & q_mask;
+	unsigned int elts_idx;
+	unsigned int ownership = !!(rxq->cq_ci & (q_mask + 1));
+	__m128i owner_check, opcode_check, format_check;
+	__m128i shuf_mask, blend_mask;
+	const __m128i zero = _mm_setzero_si128();
+	const __m128i ones = _mm_cmpeq_epi32(zero, zero);
+	const __m128i crc_adjust = _mm_set_epi16(
+		0, 0, 0, 0, 0,
+		rxq->crc_present * (-ETHER_CRC_LEN),
+		0,
+		rxq->crc_present * (-ETHER_CRC_LEN)
+	);
+
+	/* TODO: Consider striding for multi-seg packet? */
+	assert(rxq->sges_n == 0);
+	assert(rxq->cqe_n == rxq->elts_n);
+	cq = &(*rxq->cqes)[cq_idx];
+	rte_prefetch0(cq);
+	rte_prefetch0(cq + 1);
+	rte_prefetch0(cq + 2);
+	rte_prefetch0(cq + 3);
+	pkts_n = RTE_MIN(pkts_n, MLX5_VPMD_MAX_RX_BURST);
+	/*
+	 * Order of indexes:
+	 *   rq_ci >= cq_ci >= rq_pi
+	 * Definition of indexes:
+	 *   rq_ci - cq_ci := # of buffers owned by HW (posted).
+	 *   cq_ci - rq_pi := # of buffers not returned to app (uncompressed).
+	 *   N - (rq_ci - rq_pi) := # of buffers consumed (to be replenished).
+	 */
+	if ((uint16_t)(q_n - (rxq->rq_ci - rxq->rq_pi)) >=
+	    MLX5_VPMD_RXQ_RPLNSH_THRESH)
+		rxq_replenish_bulk_mbuf(rxq);
+	/* See if there are unreturned mbufs from compressed CQE. */
+	rcvd_pkt = rxq->cq_ci - rxq->rq_pi;
+	if (rcvd_pkt > 0) {
+		rcvd_pkt = RTE_MIN(rcvd_pkt, pkts_n);
+		rxq_copy_mbuf_v(rxq, pkts, rcvd_pkt);
+		rxq->rq_pi += rcvd_pkt;
+		pkts += rcvd_pkt;
+	}
+	elts_idx = rxq->rq_pi & q_mask;
+	elts = &(*rxq->elts)[elts_idx];
+	pkts_n = RTE_MIN((uint16_t)(pkts_n - rcvd_pkt), q_n - elts_idx);
+	/*
+	 * TODO: Need to make pkts not overflow. May need to reserve a few NULL
+	 * CQEs at the end of CQ.
+	 *
+	 * pkts_n = RTE_ALIGN_FLOOR(pkts_n, MLX5_VPMD_DESCS_PER_LOOP);
+	 */
+	if (!pkts_n) {
+#ifdef MLX5_PMD_SOFT_COUNTERS
+		/* Increment packets counter. */
+		rxq->stats.ipackets += rcvd_pkt;
+#endif
+		return rcvd_pkt;
+	}
+	/* At this point, there shouldn't be any remaining packets. */
+	assert(rxq->rq_pi == rxq->cq_ci);
+	owner_check =
+		_mm_set_epi64x(0x0100000001000000LL, 0x0100000001000000LL);
+	opcode_check =
+		_mm_set_epi64x(0xf0000000f0000000LL, 0xf0000000f0000000LL);
+	format_check =
+		_mm_set_epi64x(0x0c0000000c000000LL, 0x0c0000000c000000LL);
+	/* Mask to blend from the last Qword to the first DQword. */
+	blend_mask = _mm_set_epi8(
+		0x80, 0x80, 0x80, 0x80,
+		0x80, 0x80, 0x80, 0x80,
+		0x00, 0x00, 0x00, 0x00,
+		0x00, 0x00, 0x00, 0x80
+	);
+	/* Mask to shuffle from extracted CQE to mbuf. */
+	shuf_mask = _mm_set_epi8(
+		0xFF, 1, 2, 3,  /* fdir.hi, bswap32 */
+		12, 13, 14, 15, /* rss, bswap32 */
+		10, 11,         /* vlan_tci, bswap16 */
+		4, 5,           /* data_len, bswap16 */
+		0xFF, 0xFF,     /* zero out 2nd half of pkt_len */
+		4, 5            /* pkt_len, bswap16 */
+	);
+	/*
+	 * A. load first Qword (8bytes) in one loop.
+	 * B. copy 4 mbuf pointers from elts ring to returning pkts.
+	 * C. load remaining CQE data and extract necessary fields.
+	 * D. fill in mbuf.
+	 * E. get valid CQEs.
+	 * F. find compressed CQE.
+	 */
+	for (pos = 0; pos < pkts_n;
+	     pos += MLX5_VPMD_DESCS_PER_LOOP) {
+		__m128i cqes[MLX5_VPMD_DESCS_PER_LOOP];
+		__m128i cqe_tmp1, cqe_tmp2;
+		__m128i pkt_mb0, pkt_mb1, pkt_mb2, pkt_mb3;
+		__m128i op_own, op_own_tmp1, op_own_tmp2;
+		__m128i owner_mask, invalid_mask, comp_mask;
+		__m128i mbp1, mbp2;
+		uint64_t n;
+
+		/* Prefetch next 4 CQEs. */
+		rte_prefetch0(&cq[pos + MLX5_VPMD_DESCS_PER_LOOP]);
+		rte_prefetch0(&cq[pos + MLX5_VPMD_DESCS_PER_LOOP + 1]);
+		rte_prefetch0(&cq[pos + MLX5_VPMD_DESCS_PER_LOOP + 2]);
+		rte_prefetch0(&cq[pos + MLX5_VPMD_DESCS_PER_LOOP + 3]);
+		/* A.1 load cqes. */
+		cqes[3] = _mm_loadl_epi64((__m128i *)&cq[pos + 3].sop_drop_qpn);
+		rte_compiler_barrier();
+		cqes[2] = _mm_loadl_epi64((__m128i *)&cq[pos + 2].sop_drop_qpn);
+		rte_compiler_barrier();
+		/* B.1 load mbuf pointers. */
+		mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]);
+		mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]);
+		/* A.1 load cqes. */
+		cqes[1] = _mm_loadl_epi64((__m128i *)&cq[pos + 1].sop_drop_qpn);
+		rte_compiler_barrier();
+		cqes[0] = _mm_loadl_epi64((__m128i *)&cq[pos].sop_drop_qpn);
+		/* B.2 copy mbuf pointers. */
+		_mm_storeu_si128((__m128i *)&pkts[pos], mbp1);
+		_mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2);
+		rte_compiler_barrier();
+		/* C.1 load remaining CQE data and extract necessary fields. */
+		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + 3]);
+		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + 2]);
+		cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask);
+		cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask);
+		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + 3].rsvd1[3]);
+		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + 2].rsvd1[3]);
+		cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30);
+		cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30);
+		cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + 3].rsvd2[10]);
+		cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + 2].rsvd2[10]);
+		cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04);
+		cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04);
+		/* C.2 generate final structure for mbuf with swapping bytes. */
+		pkt_mb3 = _mm_shuffle_epi8(cqes[3], shuf_mask);
+		pkt_mb2 = _mm_shuffle_epi8(cqes[2], shuf_mask);
+		/* C.3 adjust CRC length. */
+		pkt_mb3 = _mm_add_epi16(pkt_mb3, crc_adjust);
+		pkt_mb2 = _mm_add_epi16(pkt_mb2, crc_adjust);
+		/* D.1 fill in mbuf - rx_descriptor_fields1. */
+		_mm_storeu_si128((void *)&pkts[pos + 3]->pkt_len, pkt_mb3);
+		_mm_storeu_si128((void *)&pkts[pos + 2]->pkt_len, pkt_mb2);
+		/* E.1 extract op_own field. */
+		op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]);
+		/* C.1 load remaining CQE data and extract necessary fields. */
+		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + 1]);
+		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos]);
+		cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask);
+		cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask);
+		cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + 1].rsvd1[3]);
+		cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].rsvd1[3]);
+		cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30);
+		cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30);
+		cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + 1].rsvd2[10]);
+		cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd2[10]);
+		cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04);
+		cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04);
+		/* C.2 generate final structure for mbuf with swapping bytes. */
+		pkt_mb1 = _mm_shuffle_epi8(cqes[1], shuf_mask);
+		pkt_mb0 = _mm_shuffle_epi8(cqes[0], shuf_mask);
+		/* C.3 adjust CRC length. */
+		pkt_mb1 = _mm_add_epi16(pkt_mb1, crc_adjust);
+		pkt_mb0 = _mm_add_epi16(pkt_mb0, crc_adjust);
+		/* E.1 extract op_own byte. */
+		op_own_tmp1 = _mm_unpacklo_epi32(cqes[0], cqes[1]);
+		op_own = _mm_unpackhi_epi64(op_own_tmp1, op_own_tmp2);
+		/* D.1 fill in mbuf - rx_descriptor_fields1. */
+		_mm_storeu_si128((void *)&pkts[pos + 1]->pkt_len, pkt_mb1);
+		_mm_storeu_si128((void *)&pkts[pos]->pkt_len, pkt_mb0);
+		/* D.1 fill in mbuf - rearm_data and packet_type. */
+		rxq_cq_to_ptype_oflags_v(rxq, cqes, &pkts[pos]);
+		/* E.2 flip owner bit to mark invalid CQEs. */
+		owner_mask = _mm_and_si128(op_own, owner_check);
+		if (ownership)
+			owner_mask = _mm_xor_si128(owner_mask, owner_check);
+		owner_mask = _mm_cmpeq_epi32(owner_mask, owner_check);
+		owner_mask = _mm_packs_epi32(owner_mask, zero);
+		/* E.3 get mask for invalidated CQEs. */
+		invalid_mask = _mm_cmpeq_epi32(opcode_check,
+					       _mm_and_si128(op_own, opcode_check));
+		invalid_mask = _mm_packs_epi32(invalid_mask, zero);
+		/* E.4 mask out beyond boundary. */
+		invalid_mask = _mm_or_si128(
+			invalid_mask,
+			ones <<
+				((pkts_n - pos) * sizeof(uint16_t) * 8));
+		/* E.5 merge invalid_mask with invalid owner. */
+		invalid_mask = _mm_or_si128(invalid_mask, owner_mask);
+		/* F.1 find compressed CQE format. */
+		comp_mask = _mm_and_si128(op_own, format_check);
+		comp_mask = _mm_cmpeq_epi32(comp_mask, format_check);
+		comp_mask = _mm_packs_epi32(comp_mask, zero);
+		/* F.2 mask out invalid entries. */
+		comp_mask = _mm_andnot_si128(invalid_mask, comp_mask);
+		comp_idx = _mm_cvtsi128_si64(comp_mask);
+		/* F.3 get the first compressed CQE. */
+		comp_idx = comp_idx ?
+			__builtin_ctzll(comp_idx) /
+				(sizeof(uint16_t) * 8) :
+			MLX5_VPMD_DESCS_PER_LOOP;
+		/* E.6 mask out entries after the compressed CQE. */
+		invalid_mask = _mm_or_si128(
+			invalid_mask,
+			ones <<
+				(comp_idx * sizeof(uint16_t) * 8));
+		/* E.7 count non-compressed valid CQEs. */
+		n = _mm_cvtsi128_si64(invalid_mask);
+		n = n ? __builtin_ctzll(n) / (sizeof(uint16_t) * 8) :
+			MLX5_VPMD_DESCS_PER_LOOP;
+		nocmp_n += n;
+		if (likely(n != MLX5_VPMD_DESCS_PER_LOOP))
+			break;
+	}
+	/* If no new CQE seen, return without updating cq_db. */
+	if (unlikely(!nocmp_n && comp_idx == MLX5_VPMD_DESCS_PER_LOOP))
+		return rcvd_pkt;
+	/* Update the consumer indexes for non-compressed CQEs. */
+	assert(nocmp_n <= pkts_n);
+	rxq->cq_ci += nocmp_n;
+	rxq->rq_pi += nocmp_n;
+	rcvd_pkt += nocmp_n;
+	/* Uncompress the last CQE if compressed. */
+	if (comp_idx < MLX5_VPMD_DESCS_PER_LOOP) {
+		assert(comp_idx == (nocmp_n % MLX5_VPMD_DESCS_PER_LOOP));
+		rxq_cq_uncompress_v(rxq, &cq[nocmp_n], &elts[nocmp_n]);
+		/* Return more packets if needed. */
+		if (nocmp_n < pkts_n) {
+			uint16_t n = rxq->cq_ci - rxq->rq_pi;
+
+			n = RTE_MIN(n, pkts_n - nocmp_n);
+			rxq_copy_mbuf_v(rxq, &pkts[nocmp_n], n);
+			rxq->rq_pi += n;
+			rcvd_pkt += n;
+		}
+	}
+	rte_wmb();
+	*rxq->cq_db = htonl(rxq->cq_ci);
+#ifdef MLX5_PMD_SOFT_COUNTERS
+	/* Increment packets counter. */
+	rxq->stats.ipackets += rcvd_pkt;
+#endif
+	return rcvd_pkt;
+}
+
+#ifndef __INTEL_COMPILER
+#pragma GCC diagnostic pop
+#endif
+#endif /* MLX5_VECTORIZED_RX */
+
 /**
  * DPDK callback for RX.
  *
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 8db8eb144..424213503 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -119,6 +119,9 @@ struct rxq {
 	volatile uint32_t *rq_db;
 	volatile uint32_t *cq_db;
 	uint16_t rq_ci;
+#ifdef MLX5_VECTORIZED_RX
+	uint16_t rq_pi;
+#endif
 	uint16_t cq_ci;
 	volatile struct mlx5_wqe_data_seg(*wqes)[];
 	volatile struct mlx5_cqe(*cqes)[];
@@ -126,6 +129,10 @@ struct rxq {
 	struct rte_mbuf *(*elts)[];
 	struct rte_mempool *mp;
 	struct mlx5_rxq_stats stats;
+#ifdef MLX5_VECTORIZED_RX
+	uint64_t mbuf_initializer;
+	struct rte_mbuf fake_mbuf;
+#endif
 } __rte_cache_aligned;
 
 /* RX queue control descriptor. */
@@ -328,6 +335,9 @@ uint16_t mlx5_tx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_tx_burst_mpw(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_tx_burst_mpw_inline(void *, struct rte_mbuf **, uint16_t);
 uint16_t mlx5_tx_burst_empw(void *, struct rte_mbuf **, uint16_t);
+#ifdef MLX5_VECTORIZED_RX
+uint16_t mlx5_rx_burst_vec(void *, struct rte_mbuf **, uint16_t);
+#endif
 uint16_t mlx5_rx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t removed_tx_burst(void *, struct rte_mbuf **, uint16_t);
 uint16_t removed_rx_burst(void *, struct rte_mbuf **, uint16_t);
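
Note (for context, not part of the patch): applications do not call
mlx5_rx_burst_vec() directly; priv_select_rx_function() installs it as
dev->rx_pkt_burst, so it is reached through the normal rte_eth_rx_burst()
path. A minimal polling loop might look as follows; port/queue ids are
placeholders, and the vectorized path hands back at most
MLX5_VPMD_MAX_RX_BURST (32) packets per call:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical application receive loop; assumes the port is already
 * configured and started with the mlx5 PMD. */
static void
rx_loop(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32]; /* MLX5_VPMD_MAX_RX_BURST is 32 */

	for (;;) {
		uint16_t n = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
		uint16_t i;

		for (i = 0; i < n; ++i)
			rte_pktmbuf_free(pkts[i]); /* application processing goes here */
	}
}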