From patchwork Wed Dec 6 17:24:17 2023
X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula
X-Patchwork-Id: 134887
X-Patchwork-Delegate: jerinj@marvell.com
Subject: [PATCH v3 1/3] net/octeon_ep: optimize Rx and Tx routines
To: Vamsi Attunuru
Cc: Pavan Nikhilesh
Date: Wed, 6 Dec 2023 22:54:17 +0530
Message-ID: <20231206172419.878-1-pbhagavatula@marvell.com>
In-Reply-To: <20231125160349.2021-1-pbhagavatula@marvell.com>
References: <20231125160349.2021-1-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

From: Pavan Nikhilesh

Preset rearm data to avoid writing multiple mbuf fields in the fastpath.
Increase the maximum number of outstanding Tx instructions from 128 to 256.

Signed-off-by: Pavan Nikhilesh
---
v3 Changes:
- Add more comments to the code.
- Re-enable 32b build to prevent ABI break.
v2 Changes:
- Skip compiling for 32b x86 targets.
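The gist of the rearm-data trick: the four 16-bit mbuf fields that Rx must
initialize (data_off, refcnt, nb_segs, port) sit next to each other, so a
value computed once at queue setup can be written into every mbuf with a
single 64-bit store. A minimal standalone sketch of the idea, using a
stand-in struct rather than the real rte_mbuf (field values illustrative):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the rte_mbuf "rearm" region; the real layout is what the
 * RTE_BUILD_BUG_ON checks in the patch below assert. */
struct mini_mbuf {
        uint16_t data_off;
        uint16_t refcnt;
        uint16_t nb_segs;
        uint16_t port;
};

_Static_assert(sizeof(struct mini_mbuf) == sizeof(uint64_t), "8 bytes");

/* Done once at queue setup, like otx_ep_set_rearm_data(). */
static uint64_t
make_rearm_data(uint16_t data_off, uint16_t port)
{
        struct mini_mbuf tmpl = {
                .data_off = data_off, .refcnt = 1, .nb_segs = 1, .port = port,
        };
        uint64_t word;

        memcpy(&word, &tmpl, sizeof(word));
        return word;
}

int
main(void)
{
        /* Fast path: one 8-byte store instead of four field writes. */
        uint64_t rearm = make_rearm_data(128 + 8, 3);
        struct mini_mbuf m;

        memcpy(&m, &rearm, sizeof(rearm));
        printf("data_off=%u refcnt=%u nb_segs=%u port=%u\n",
               m.data_off, m.refcnt, m.nb_segs, m.port);
        return 0;
}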
 drivers/net/octeon_ep/cnxk_ep_rx.c    | 12 ++++++++----
 drivers/net/octeon_ep/otx_ep_common.h |  3 +++
 drivers/net/octeon_ep/otx_ep_rxtx.c   | 27 +++++++++++++++++++++++++++
 drivers/net/octeon_ep/otx_ep_rxtx.h   |  2 +-
 4 files changed, 39 insertions(+), 5 deletions(-)

--
2.25.1

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.c b/drivers/net/octeon_ep/cnxk_ep_rx.c
index 74f0011283..75bb7225d2 100644
--- a/drivers/net/octeon_ep/cnxk_ep_rx.c
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.c
@@ -93,7 +93,7 @@ cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq)
 	new_pkts = val - droq->pkts_sent_ism_prev;
 	droq->pkts_sent_ism_prev = val;
 
-	if (val > (uint32_t)(1 << 31)) {
+	if (val > RTE_BIT32(31)) {
 		/* Only subtract the packet count in the HW counter
 		 * when count above halfway to saturation.
 		 */
@@ -128,7 +128,6 @@ cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq,
 {
 	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
 	uint32_t bytes_rsvd = 0, read_idx = droq->read_idx;
-	uint16_t port_id = droq->otx_ep_dev->port_id;
 	uint16_t nb_desc = droq->nb_desc;
 	uint16_t pkts;
 
@@ -137,14 +136,19 @@ cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq,
 		struct rte_mbuf *mbuf;
 		uint16_t pkt_len;
 
+		rte_prefetch0(recv_buf_list[otx_ep_incr_index(read_idx, 2, nb_desc)]);
+		rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[otx_ep_incr_index(read_idx,
+								2, nb_desc)],
+			      void *));
+
 		mbuf = recv_buf_list[read_idx];
 		info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *);
 		read_idx = otx_ep_incr_index(read_idx, 1, nb_desc);
 		pkt_len = rte_bswap16(info->length >> 48);
-		mbuf->data_off += OTX_EP_INFO_SIZE;
 		mbuf->pkt_len = pkt_len;
 		mbuf->data_len = pkt_len;
-		mbuf->port = port_id;
+
+		*(uint64_t *)&mbuf->rearm_data = droq->rearm_data;
 		rx_pkts[pkts] = mbuf;
 		bytes_rsvd += pkt_len;
 	}
diff --git a/drivers/net/octeon_ep/otx_ep_common.h b/drivers/net/octeon_ep/otx_ep_common.h
index 82e57520d3..299b5122d8 100644
--- a/drivers/net/octeon_ep/otx_ep_common.h
+++ b/drivers/net/octeon_ep/otx_ep_common.h
@@ -365,6 +365,9 @@ struct otx_ep_droq {
 	/* receive buffer list contains mbuf ptr list */
 	struct rte_mbuf **recv_buf_list;
 
+	/* Packet re-arm data. */
+	uint64_t rearm_data;
+
 	/* Packets pending to be processed */
 	uint64_t pkts_pending;
 
diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.c b/drivers/net/octeon_ep/otx_ep_rxtx.c
index c421ef0a1c..40c4a16a38 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.c
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.c
@@ -284,6 +284,32 @@ otx_ep_droq_setup_ring_buffers(struct otx_ep_droq *droq)
 	return 0;
 }
 
+static inline uint64_t
+otx_ep_set_rearm_data(struct otx_ep_device *otx_ep)
+{
+	uint16_t port_id = otx_ep->port_id;
+	struct rte_mbuf mb_def;
+	uint64_t *tmp;
+
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) % 8 != 0);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, refcnt) - offsetof(struct rte_mbuf, data_off) !=
+			 2);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, nb_segs) - offsetof(struct rte_mbuf, data_off) !=
+			 4);
+	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, port) - offsetof(struct rte_mbuf, data_off) !=
+			 6);
+	mb_def.nb_segs = 1;
+	mb_def.data_off = RTE_PKTMBUF_HEADROOM + OTX_EP_INFO_SIZE;
+	mb_def.port = port_id;
+	rte_mbuf_refcnt_set(&mb_def, 1);
+
+	/* Prevent compiler reordering: rearm_data covers previous fields */
+	rte_compiler_barrier();
+	tmp = (uint64_t *)&mb_def.rearm_data;
+
+	return *tmp;
+}
+
 /* OQ initialization */
 static int
 otx_ep_init_droq(struct otx_ep_device *otx_ep, uint32_t q_no,
@@ -340,6 +366,7 @@ otx_ep_init_droq(struct otx_ep_device *otx_ep, uint32_t q_no,
 		goto init_droq_fail;
 
 	droq->refill_threshold = c_refill_threshold;
+	droq->rearm_data = otx_ep_set_rearm_data(otx_ep);
 
 	/* Set up OQ registers */
 	ret = otx_ep->fn_list.setup_oq_regs(otx_ep, q_no);
diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h
index cb68ef3b41..b159c32cae 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.h
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.h
@@ -17,7 +17,7 @@
 #define OTX_EP_FSZ 28
 #define OTX2_EP_FSZ 24
 
-#define OTX_EP_MAX_INSTR 128
+#define OTX_EP_MAX_INSTR 256
 
 /* SDP_LENGTH_S specifies packet length and is of 8-byte size */
 #define OTX_EP_INFO_SIZE 8
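Patch context: the RTE_BIT32(31) hunk above touches cnxk_ep_check_rx_pkts(),
which batches reads of the hardware "packets sent" ISM counter and only
rewinds it once it passes half of its 32-bit range. A toy model of that
bookkeeping (plain variables stand in for the MMIO/ISM locations; the real
driver also issues the register writes shown in the diff):

#include <stdint.h>
#include <stdio.h>

static uint32_t hw_count;      /* stands in for *droq->pkts_sent_ism */
static uint32_t prev_snapshot; /* stands in for droq->pkts_sent_ism_prev */

static uint32_t
check_new_pkts(void)
{
        uint32_t val = hw_count;                 /* one counter read */
        uint32_t new_pkts = val - prev_snapshot; /* u32 math handles wrap */

        prev_snapshot = val;
        if (val > (UINT32_C(1) << 31)) {         /* i.e. val > RTE_BIT32(31) */
                /* The driver writes val back to pkts_sent_reg here so the
                 * HW counter restarts from zero well before it saturates. */
                hw_count -= val;
                prev_snapshot = 0;
        }
        return new_pkts;
}

int
main(void)
{
        hw_count = 5;
        printf("%u\n", check_new_pkts()); /* 5 new packets */
        hw_count = 12;
        printf("%u\n", check_new_pkts()); /* 7 new packets */
        return 0;
}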
From patchwork Wed Dec 6 17:24:18 2023
X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula
X-Patchwork-Id: 134888
X-Patchwork-Delegate: jerinj@marvell.com
Subject: [PATCH v3 2/3] net/octeon_ep: use SSE instructions for Rx routine
To: Vamsi Attunuru, Bruce Richardson, Konstantin Ananyev
Cc: Pavan Nikhilesh
Date: Wed, 6 Dec 2023 22:54:18 +0530
Message-ID: <20231206172419.878-2-pbhagavatula@marvell.com>
In-Reply-To: <20231206172419.878-1-pbhagavatula@marvell.com>
References: <20231125160349.2021-1-pbhagavatula@marvell.com>
 <20231206172419.878-1-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

From: Pavan Nikhilesh

Optimize Rx routine to use SSE instructions.

Signed-off-by: Pavan Nikhilesh
---
 drivers/net/octeon_ep/cnxk_ep_rx.c     | 159 +----------------------
 drivers/net/octeon_ep/cnxk_ep_rx.h     | 167 +++++++++++++++++++++++++
 drivers/net/octeon_ep/cnxk_ep_rx_sse.c | 130 +++++++++++++++++++
 drivers/net/octeon_ep/meson.build      |  11 ++
 drivers/net/octeon_ep/otx_ep_ethdev.c  |   7 ++
 drivers/net/octeon_ep/otx_ep_rxtx.h    |   6 +
 6 files changed, 322 insertions(+), 158 deletions(-)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx.h
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_sse.c

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.c b/drivers/net/octeon_ep/cnxk_ep_rx.c
index 75bb7225d2..f3e4fb27d1 100644
--- a/drivers/net/octeon_ep/cnxk_ep_rx.c
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.c
@@ -2,164 +2,7 @@
  * Copyright(C) 2023 Marvell.
*/ -#include "otx_ep_common.h" -#include "otx2_ep_vf.h" -#include "otx_ep_rxtx.h" - -static inline int -cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count) -{ - struct otx_ep_droq_desc *desc_ring = droq->desc_ring; - struct rte_mbuf **recv_buf_list = droq->recv_buf_list; - uint32_t refill_idx = droq->refill_idx; - struct rte_mbuf *buf; - uint32_t i; - int rc; - - rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count); - if (unlikely(rc)) { - droq->stats.rx_alloc_failure++; - return rc; - } - - for (i = 0; i < count; i++) { - buf = recv_buf_list[refill_idx]; - desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf); - refill_idx++; - } - - droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc); - droq->refill_count -= count; - - return 0; -} - -static inline void -cnxk_ep_rx_refill(struct otx_ep_droq *droq) -{ - uint32_t desc_refilled = 0, count; - uint32_t nb_desc = droq->nb_desc; - uint32_t refill_idx = droq->refill_idx; - int rc; - - if (unlikely(droq->read_idx == refill_idx)) - return; - - if (refill_idx < droq->read_idx) { - count = droq->read_idx - refill_idx; - rc = cnxk_ep_rx_refill_mbuf(droq, count); - if (unlikely(rc)) { - droq->stats.rx_alloc_failure++; - return; - } - desc_refilled = count; - } else { - count = nb_desc - refill_idx; - rc = cnxk_ep_rx_refill_mbuf(droq, count); - if (unlikely(rc)) { - droq->stats.rx_alloc_failure++; - return; - } - - desc_refilled = count; - count = droq->read_idx; - rc = cnxk_ep_rx_refill_mbuf(droq, count); - if (unlikely(rc)) { - droq->stats.rx_alloc_failure++; - return; - } - desc_refilled += count; - } - - /* Flush the droq descriptor data to memory to be sure - * that when we update the credits the data in memory is - * accurate. - */ - rte_io_wmb(); - rte_write32(desc_refilled, droq->pkts_credit_reg); -} - -static inline uint32_t -cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq) -{ - uint32_t new_pkts; - uint32_t val; - - /* Batch subtractions from the HW counter to reduce PCIe traffic - * This adds an extra local variable, but almost halves the - * number of PCIe writes. - */ - val = __atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED); - new_pkts = val - droq->pkts_sent_ism_prev; - droq->pkts_sent_ism_prev = val; - - if (val > RTE_BIT32(31)) { - /* Only subtract the packet count in the HW counter - * when count above halfway to saturation. 
- */ - rte_write64((uint64_t)val, droq->pkts_sent_reg); - rte_mb(); - - rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); - while (__atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED) >= val) { - rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); - rte_mb(); - } - - droq->pkts_sent_ism_prev = 0; - } - rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); - droq->pkts_pending += new_pkts; - - return new_pkts; -} - -static inline int16_t __rte_hot -cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts) -{ - if (droq->pkts_pending < nb_pkts) - cnxk_ep_check_rx_pkts(droq); - - return RTE_MIN(nb_pkts, droq->pkts_pending); -} - -static __rte_always_inline void -cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) -{ - struct rte_mbuf **recv_buf_list = droq->recv_buf_list; - uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; - uint16_t nb_desc = droq->nb_desc; - uint16_t pkts; - - for (pkts = 0; pkts < new_pkts; pkts++) { - struct otx_ep_droq_info *info; - struct rte_mbuf *mbuf; - uint16_t pkt_len; - - rte_prefetch0(recv_buf_list[otx_ep_incr_index(read_idx, 2, nb_desc)]); - rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[otx_ep_incr_index(read_idx, - 2, nb_desc)], - void *)); - - mbuf = recv_buf_list[read_idx]; - info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *); - read_idx = otx_ep_incr_index(read_idx, 1, nb_desc); - pkt_len = rte_bswap16(info->length >> 48); - mbuf->pkt_len = pkt_len; - mbuf->data_len = pkt_len; - - *(uint64_t *)&mbuf->rearm_data = droq->rearm_data; - rx_pkts[pkts] = mbuf; - bytes_rsvd += pkt_len; - } - droq->read_idx = read_idx; - - droq->refill_count += new_pkts; - droq->pkts_pending -= new_pkts; - /* Stats */ - droq->stats.pkts_received += new_pkts; - droq->stats.bytes_received += bytes_rsvd; -} +#include "cnxk_ep_rx.h" static __rte_always_inline void cnxk_ep_process_pkts_scalar_mseg(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.h b/drivers/net/octeon_ep/cnxk_ep_rx.h new file mode 100644 index 0000000000..e71fc0de5c --- /dev/null +++ b/drivers/net/octeon_ep/cnxk_ep_rx.h @@ -0,0 +1,167 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Marvell. 
+ */ + +#include + +#include "otx_ep_common.h" +#include "otx2_ep_vf.h" +#include "otx_ep_rxtx.h" + +#define CNXK_EP_OQ_DESC_PER_LOOP_SSE 4 +#define CNXK_EP_OQ_DESC_PER_LOOP_AVX 8 + +static inline int +cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count) +{ + struct otx_ep_droq_desc *desc_ring = droq->desc_ring; + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t refill_idx = droq->refill_idx; + struct rte_mbuf *buf; + uint32_t i; + int rc; + + rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return rc; + } + + for (i = 0; i < count; i++) { + buf = recv_buf_list[refill_idx]; + desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf); + refill_idx++; + } + + droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc); + droq->refill_count -= count; + + return 0; +} + +static inline void +cnxk_ep_rx_refill(struct otx_ep_droq *droq) +{ + uint32_t desc_refilled = 0, count; + uint32_t nb_desc = droq->nb_desc; + uint32_t refill_idx = droq->refill_idx; + int rc; + + if (unlikely(droq->read_idx == refill_idx)) + return; + + if (refill_idx < droq->read_idx) { + count = droq->read_idx - refill_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + desc_refilled = count; + } else { + count = nb_desc - refill_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + + desc_refilled = count; + count = droq->read_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + desc_refilled += count; + } + + /* Flush the droq descriptor data to memory to be sure + * that when we update the credits the data in memory is + * accurate. + */ + rte_io_wmb(); + rte_write32(desc_refilled, droq->pkts_credit_reg); +} + +static inline uint32_t +cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq) +{ + uint32_t new_pkts; + uint32_t val; + + /* Batch subtractions from the HW counter to reduce PCIe traffic + * This adds an extra local variable, but almost halves the + * number of PCIe writes. + */ + val = __atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED); + new_pkts = val - droq->pkts_sent_ism_prev; + droq->pkts_sent_ism_prev = val; + + if (val > RTE_BIT32(31)) { + /* Only subtract the packet count in the HW counter + * when count above halfway to saturation. 
+ */ + rte_write64((uint64_t)val, droq->pkts_sent_reg); + rte_mb(); + + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + while (__atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED) >= val) { + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + rte_mb(); + } + + droq->pkts_sent_ism_prev = 0; + } + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + droq->pkts_pending += new_pkts; + + return new_pkts; +} + +static inline int16_t __rte_hot +cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts) +{ + if (droq->pkts_pending < nb_pkts) + cnxk_ep_check_rx_pkts(droq); + + return RTE_MIN(nb_pkts, droq->pkts_pending); +} + +static __rte_always_inline void +cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts; + + for (pkts = 0; pkts < new_pkts; pkts++) { + struct otx_ep_droq_info *info; + struct rte_mbuf *mbuf; + uint16_t pkt_len; + + rte_prefetch0(recv_buf_list[otx_ep_incr_index(read_idx, 2, nb_desc)]); + rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[otx_ep_incr_index(read_idx, + 2, nb_desc)], + void *)); + + mbuf = recv_buf_list[read_idx]; + info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *); + read_idx = otx_ep_incr_index(read_idx, 1, nb_desc); + pkt_len = rte_bswap16(info->length >> 48); + mbuf->pkt_len = pkt_len; + mbuf->data_len = pkt_len; + + *(uint64_t *)&mbuf->rearm_data = droq->rearm_data; + rx_pkts[pkts] = mbuf; + bytes_rsvd += pkt_len; + } + droq->read_idx = read_idx; + + droq->refill_count += new_pkts; + droq->pkts_pending -= new_pkts; + /* Stats */ + droq->stats.pkts_received += new_pkts; + droq->stats.bytes_received += bytes_rsvd; +} diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_sse.c b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c new file mode 100644 index 0000000000..afa3616caa --- /dev/null +++ b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Marvell. + */ + +#include "cnxk_ep_rx.h" + +static __rte_always_inline uint32_t +hadd(__m128i x) +{ + __m128i hi64 = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); + __m128i sum64 = _mm_add_epi32(hi64, x); + __m128i hi32 = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2)); + __m128i sum32 = _mm_add_epi32(sum64, hi32); + return _mm_cvtsi128_si32(sum32); +} + +static __rte_always_inline void +cnxk_ep_process_pkts_vec_sse(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + uint32_t idx0, idx1, idx2, idx3; + struct rte_mbuf *m0, *m1, *m2, *m3; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts = 0; + + idx0 = read_idx; + while (pkts < new_pkts) { + const __m128i bswap_mask = _mm_set_epi8(0xFF, 0xFF, 12, 13, 0xFF, 0xFF, 8, 9, 0xFF, + 0xFF, 4, 5, 0xFF, 0xFF, 0, 1); + const __m128i cpy_mask = _mm_set_epi8(0xFF, 0xFF, 9, 8, 0xFF, 0xFF, 9, 8, 0xFF, + 0xFF, 1, 0, 0xFF, 0xFF, 1, 0); + __m128i s01, s23; + + idx1 = otx_ep_incr_index(idx0, 1, nb_desc); + idx2 = otx_ep_incr_index(idx1, 1, nb_desc); + idx3 = otx_ep_incr_index(idx2, 1, nb_desc); + + m0 = recv_buf_list[idx0]; + m1 = recv_buf_list[idx1]; + m2 = recv_buf_list[idx2]; + m3 = recv_buf_list[idx3]; + + /* Load packet size big-endian. 
+		 */
+		s01 = _mm_set_epi32(rte_pktmbuf_mtod(m3, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m1, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m2, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m0, struct otx_ep_droq_info *)->length >> 48);
+		/* Convert to little-endian. */
+		s01 = _mm_shuffle_epi8(s01, bswap_mask);
+		/* Horizontal add. */
+		bytes_rsvd += hadd(s01);
+		/* Segregate to packet length and data length. */
+		s23 = _mm_shuffle_epi32(s01, _MM_SHUFFLE(3, 3, 1, 1));
+		s01 = _mm_shuffle_epi8(s01, cpy_mask);
+		s23 = _mm_shuffle_epi8(s23, cpy_mask);
+
+		/* Store packet length and data length to mbuf. */
+		*(uint64_t *)&m0->pkt_len = ((rte_xmm_t)s01).u64[0];
+		*(uint64_t *)&m1->pkt_len = ((rte_xmm_t)s01).u64[1];
+		*(uint64_t *)&m2->pkt_len = ((rte_xmm_t)s23).u64[0];
+		*(uint64_t *)&m3->pkt_len = ((rte_xmm_t)s23).u64[1];
+
+		/* Reset rearm data. */
+		*(uint64_t *)&m0->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m1->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m2->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m3->rearm_data = droq->rearm_data;
+
+		rx_pkts[pkts++] = m0;
+		rx_pkts[pkts++] = m1;
+		rx_pkts[pkts++] = m2;
+		rx_pkts[pkts++] = m3;
+		idx0 = otx_ep_incr_index(idx3, 1, nb_desc);
+	}
+	droq->read_idx = idx0;
+
+	droq->refill_count += new_pkts;
+	droq->pkts_pending -= new_pkts;
+	/* Stats */
+	droq->stats.pkts_received += new_pkts;
+	droq->stats.bytes_received += bytes_rsvd;
+}
+
+uint16_t __rte_noinline __rte_hot
+cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_sse(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD)
+		cnxk_ep_rx_refill(droq);
+
+	return new_pkts;
+}
+
+uint16_t __rte_noinline __rte_hot
+cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_sse(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD) {
+		cnxk_ep_rx_refill(droq);
+	} else {
+		/* SDP output goes into DROP state when output doorbell count
+		 * goes below drop count. When doorbell count is written with
+		 * a value greater than drop count, SDP output should come out
+		 * of DROP state. Due to a race condition this is not
+		 * happening. Writing the doorbell register with 0 again may
+		 * make SDP output come out of this state.
+		 */
+		rte_write32(0, droq->pkts_credit_reg);
+	}
+
+	return new_pkts;
+}
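The hadd() helper above sums the four 32-bit packet lengths held in one
__m128i using two shuffle+add steps. A standalone check of that reduction
(SSE2 only; build with e.g. cc -msse2):

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t
hadd(__m128i x)
{
        /* Fold upper 64 bits onto lower: lanes become (a+c, b+d, ...). */
        __m128i hi64 = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
        __m128i sum64 = _mm_add_epi32(hi64, x);
        /* Fold lane 1 onto lane 0: lane 0 becomes (a+c)+(b+d). */
        __m128i hi32 = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2));
        __m128i sum32 = _mm_add_epi32(sum64, hi32);
        return (uint32_t)_mm_cvtsi128_si32(sum32);
}

int
main(void)
{
        /* Four fake packet lengths, as gathered per loop iteration in
         * cnxk_ep_process_pkts_vec_sse(). */
        __m128i lens = _mm_set_epi32(64, 128, 256, 512);

        printf("total bytes: %u\n", hadd(lens)); /* prints 960 */
        return 0;
}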
+ */ + + rte_write32(0, droq->pkts_credit_reg); + } + + return new_pkts; +} diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index 749776d70c..feba1fdf25 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -12,3 +12,14 @@ sources = files( 'cnxk_ep_rx.c', 'cnxk_ep_tx.c', ) + +if arch_subdir == 'x86' + sources += files('cnxk_ep_rx_sse.c') +endif + +extra_flags = ['-Wno-strict-aliasing'] +foreach flag: extra_flags + if cc.has_argument(flag) + cflags += flag + endif +endforeach diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c index 615cbbb648..51b34cdaa0 100644 --- a/drivers/net/octeon_ep/otx_ep_ethdev.c +++ b/drivers/net/octeon_ep/otx_ep_ethdev.c @@ -52,10 +52,17 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) if (otx_epvf->chip_gen == OTX_EP_CN10XX) { eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts; +#ifdef RTE_ARCH_X86 + eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse; +#endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg; } else if (otx_epvf->chip_gen == OTX_EP_CN9XX) { eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts; +#ifdef RTE_ARCH_X86 + eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse; +#endif + if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_mseg; } else { diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h index b159c32cae..efc41a8275 100644 --- a/drivers/net/octeon_ep/otx_ep_rxtx.h +++ b/drivers/net/octeon_ep/otx_ep_rxtx.h @@ -48,12 +48,18 @@ cnxk_ep_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **pkts, uint16_t nb_pkts) uint16_t cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); #endif /* _OTX_EP_RXTX_H_ */ From patchwork Wed Dec 6 17:24:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 134889 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E41F04368C; Wed, 6 Dec 2023 18:24:45 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5965042E9F; Wed, 6 Dec 2023 18:24:33 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 2541E42E94 for ; Wed, 6 Dec 2023 18:24:32 +0100 (CET) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B6EtBTY020279; Wed, 6 Dec 2023 09:24:31 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=f0lvcY2FpExn08ViT1KygH4LF8PxrlGerC4MNksyk+s=; 
b=XawWKdHLXmbSTlRZjGKo1EOzWqOqTEpkHqGrZIhf/IYfIqlkrNfpI6g0QbK1HkbY8tGy LxzSIgifKfG4lRUSVr2JCcXnV+PSlnU4TmlDH7CBxmKZb06/WgbrzWQSLZj1L5KhEma0 ydhpKsIJaCXetC1aqh4wtimB1V7Hm3TjtMVkV0Ljq0FVjeycCZYRCwIBM+sloh5dr5dm YLHYwzFXmHU3NJMMXRINh5q+BMvErhzgFj+INteKl6VjsT1UODvKgPH6/outPjXMBwML d7DqIuXmtgLxkKrgfiLW3PA3pYi2MLBD8rHv1sFpKggi+kU8OpOmvMcGXeUTrz2uDYnP KA== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3utu530mey-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Wed, 06 Dec 2023 09:24:31 -0800 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Wed, 6 Dec 2023 09:24:29 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Wed, 6 Dec 2023 09:24:29 -0800 Received: from MININT-80QBFE8.corp.innovium.com (MININT-80QBFE8.marvell.com [10.28.164.106]) by maili.marvell.com (Postfix) with ESMTP id D1FCF3F704E; Wed, 6 Dec 2023 09:24:26 -0800 (PST) From: To: , Bruce Richardson , Konstantin Ananyev , Vamsi Attunuru CC: , Pavan Nikhilesh Subject: [PATCH v3 3/3] net/octeon_ep: use AVX2 instructions for Rx Date: Wed, 6 Dec 2023 22:54:19 +0530 Message-ID: <20231206172419.878-3-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231206172419.878-1-pbhagavatula@marvell.com> References: <20231125160349.2021-1-pbhagavatula@marvell.com> <20231206172419.878-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: i61Jqk5MhINRLAyGsZ_hy-ZJEBrnSiMO X-Proofpoint-GUID: i61Jqk5MhINRLAyGsZ_hy-ZJEBrnSiMO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-06_15,2023-12-06_01,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Optimize Rx routine to use AVX2 instructions when underlying architecture supports it. Signed-off-by: Pavan Nikhilesh --- drivers/net/octeon_ep/cnxk_ep_rx_avx.c | 123 +++++++++++++++++++++++++ drivers/net/octeon_ep/meson.build | 12 +++ drivers/net/octeon_ep/otx_ep_ethdev.c | 10 ++ drivers/net/octeon_ep/otx_ep_rxtx.h | 6 ++ 4 files changed, 151 insertions(+) create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_avx.c diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_avx.c b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c new file mode 100644 index 0000000000..ae4615e6da --- /dev/null +++ b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c @@ -0,0 +1,123 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Marvell. + */ + +#include "cnxk_ep_rx.h" + +static __rte_always_inline void +cnxk_ep_process_pkts_vec_avx(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + const uint64_t rearm_data = droq->rearm_data; + struct rte_mbuf *m[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint32_t pidx[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint32_t idx[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts = 0; + uint8_t i; + + idx[0] = read_idx; + while (pkts < new_pkts) { + __m256i data[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + /* mask to shuffle from desc. 
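The mask-driven rearrangement in the diff below hinges on one AVX2 detail:
_mm256_shuffle_epi8 permutes bytes within each 128-bit lane independently,
and a mask byte with the high bit set (0xFF) zeroes the destination byte.
That is why indexes such as 20 and 21 in the patch address bytes of the
upper lane. A small demo of that lane-local behavior (build with -mavx2):

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
        uint8_t in[32], out[32];
        int i;

        for (i = 0; i < 32; i++)
                in[i] = (uint8_t)i;

        __m256i v = _mm256_loadu_si256((const __m256i *)in);
        /* Within each lane: byte0 <- lane byte1, byte1 <- lane byte0,
         * all other bytes zeroed (-1 == 0xFF). */
        const __m256i mask = _mm256_set_epi8(
                -1, -1, -1, -1, -1, -1, -1, -1,
                -1, -1, -1, -1, -1, -1, 0, 1,   /* high lane */
                -1, -1, -1, -1, -1, -1, -1, -1,
                -1, -1, -1, -1, -1, -1, 0, 1);  /* low lane */
        __m256i r = _mm256_shuffle_epi8(v, mask);
        _mm256_storeu_si256((__m256i *)out, r);

        /* Lane-local indexing: prints "1 0 17 16". */
        printf("%u %u %u %u\n", out[0], out[1], out[16], out[17]);
        return 0;
}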
diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_avx.c b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c
new file mode 100644
index 0000000000..ae4615e6da
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include "cnxk_ep_rx.h"
+
+static __rte_always_inline void
+cnxk_ep_process_pkts_vec_avx(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts)
+{
+	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+	uint32_t bytes_rsvd = 0, read_idx = droq->read_idx;
+	const uint64_t rearm_data = droq->rearm_data;
+	struct rte_mbuf *m[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint32_t pidx[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint32_t idx[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint16_t nb_desc = droq->nb_desc;
+	uint16_t pkts = 0;
+	uint8_t i;
+
+	idx[0] = read_idx;
+	while (pkts < new_pkts) {
+		__m256i data[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+		/* Mask to shuffle from desc. to mbuf (2 descriptors). */
+		const __m256i mask =
+			_mm256_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 20, 21, 0xFF, 0xFF, 20,
+					21, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 7, 6, 5, 4, 3, 2, 1, 0);
+
+		/* Load indexes. */
+		for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			idx[i] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+
+		/* Prefetch next indexes. */
+		if (new_pkts - pkts > 8) {
+			pidx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+			for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+				pidx[i] = otx_ep_incr_index(pidx[i - 1], 1, nb_desc);
+
+			for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) {
+				rte_prefetch0(recv_buf_list[pidx[i]]);
+				rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[pidx[i]], void *));
+			}
+		}
+
+		/* Load mbuf array. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			m[i] = recv_buf_list[idx[i]];
+
+		/* Load rearm data and packet length for shuffle. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			data[i] = _mm256_set_epi64x(0,
+				rte_pktmbuf_mtod(m[i], struct otx_ep_droq_info *)->length >> 16,
+				0, rearm_data);
+
+		/* Shuffle data to its place and sum the packet length. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) {
+			data[i] = _mm256_shuffle_epi8(data[i], mask);
+			bytes_rsvd += _mm256_extract_epi16(data[i], 10);
+		}
+
+		/* Store the 256bit data to the mbuf. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			_mm256_storeu_si256((__m256i *)&m[i]->rearm_data, data[i]);
+
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			rx_pkts[pkts++] = m[i];
+		idx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+	}
+	droq->read_idx = idx[0];
+
+	droq->refill_count += new_pkts;
+	droq->pkts_pending -= new_pkts;
+	/* Stats */
+	droq->stats.pkts_received += new_pkts;
+	droq->stats.bytes_received += bytes_rsvd;
+}
+
+uint16_t __rte_noinline __rte_hot
+cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX);
+	cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD)
+		cnxk_ep_rx_refill(droq);
+
+	return new_pkts;
+}
+
+uint16_t __rte_noinline __rte_hot
+cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX);
+	cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD) {
+		cnxk_ep_rx_refill(droq);
+	} else {
+		/* SDP output goes into DROP state when output doorbell count
+		 * goes below drop count. When doorbell count is written with
+		 * a value greater than drop count, SDP output should come out
+		 * of DROP state. Due to a race condition this is not
+		 * happening. Writing the doorbell register with 0 again may
+		 * make SDP output come out of this state.
+		 */
+		rte_write32(0, droq->pkts_credit_reg);
+	}
+
+	return new_pkts;
+}
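Both vector routines walk the descriptor ring through otx_ep_incr_index().
A sketch of the usual DPDK idiom behind such a helper, assuming (as the
real ring does) a power-of-two descriptor count so the wrap is a mask
rather than a modulo; this mirrors, but is not copied from, the driver's
definition in otx_ep_rxtx.h:

#include <stdint.h>
#include <stdio.h>

static inline uint32_t
incr_index(uint32_t idx, uint32_t count, uint32_t nb_desc)
{
        return (idx + count) & (nb_desc - 1); /* wraps at nb_desc */
}

int
main(void)
{
        uint32_t nb_desc = 1024, idx = 1022;
        int i;

        /* Stepping one descriptor at a time wraps 1022 -> 1023 -> 0 -> 1. */
        for (i = 0; i < 4; i++) {
                printf("%u ", idx);
                idx = incr_index(idx, 1, nb_desc);
        }
        printf("\n");
        return 0;
}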
+ */ + + rte_write32(0, droq->pkts_credit_reg); + } + + return new_pkts; +} diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index feba1fdf25..e8ae56018d 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -15,6 +15,18 @@ sources = files( if arch_subdir == 'x86' sources += files('cnxk_ep_rx_sse.c') + if cc.get_define('__AVX2__', args: machine_args) != '' + cflags += ['-DCC_AVX2_SUPPORT'] + sources += files('cnxk_ep_rx_avx.c') + elif cc.has_argument('-mavx2') + cflags += ['-DCC_AVX2_SUPPORT'] + otx_ep_avx2_lib = static_library('otx_ep_avx2_lib', + 'cnxk_ep_rx_avx.c', + dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci], + include_directories: includes, + c_args: [cflags, '-mavx2']) + objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c') + endif endif extra_flags = ['-Wno-strict-aliasing'] diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c index 51b34cdaa0..42a97ea110 100644 --- a/drivers/net/octeon_ep/otx_ep_ethdev.c +++ b/drivers/net/octeon_ep/otx_ep_ethdev.c @@ -54,6 +54,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse; +#ifdef CC_AVX2_SUPPORT + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_avx; +#endif #endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg; @@ -61,6 +66,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse; +#ifdef CC_AVX2_SUPPORT + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_avx; +#endif #endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h index efc41a8275..0adcbc7814 100644 --- a/drivers/net/octeon_ep/otx_ep_rxtx.h +++ b/drivers/net/octeon_ep/otx_ep_rxtx.h @@ -51,6 +51,9 @@ cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); @@ -60,6 +63,9 @@ cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); #endif /* _OTX_EP_RXTX_H_ */