From patchwork Mon Dec 11 13:43:16 2023
X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula
X-Patchwork-Id: 135025
X-Patchwork-Delegate: jerinj@marvell.com
Subject: [PATCH v5 3/3] net/octeon_ep: use AVX2 instructions for Rx
Date: Mon, 11 Dec 2023 19:13:16 +0530
To: Bruce Richardson, Konstantin Ananyev, Vamsi Attunuru
Cc: Pavan Nikhilesh
Message-ID: <20231211134316.2986-3-pbhagavatula@marvell.com>
In-Reply-To: <20231211134316.2986-1-pbhagavatula@marvell.com>
References: <20231207064941.1256-1-pbhagavatula@marvell.com>
 <20231211134316.2986-1-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

From: Pavan Nikhilesh

Optimize the Rx routine to use AVX2 instructions when the underlying
architecture supports them.
Signed-off-by: Pavan Nikhilesh
---
 doc/guides/rel_notes/release_24_03.rst |   1 +
 drivers/net/octeon_ep/cnxk_ep_rx_avx.c | 123 +++++++++++++++++++++++++
 drivers/net/octeon_ep/meson.build      |  12 +++
 drivers/net/octeon_ep/otx_ep_ethdev.c  |  10 ++
 drivers/net/octeon_ep/otx_ep_rxtx.h    |   6 ++
 5 files changed, 152 insertions(+)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_avx.c

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 2767d2a91b..b392a4f30a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -60,6 +60,7 @@ New Features
   * Optimize mbuf rearm sequence.
   * Updated Tx queue mbuf free thresholds from 128 to 256 for better performance.
   * Added optimized SSE Rx routines.
+  * Added optimized AVX2 Rx routines.


 Removed Items
diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_avx.c b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c
new file mode 100644
index 0000000000..ae4615e6da
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include "cnxk_ep_rx.h"
+
+static __rte_always_inline void
+cnxk_ep_process_pkts_vec_avx(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts)
+{
+	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+	uint32_t bytes_rsvd = 0, read_idx = droq->read_idx;
+	const uint64_t rearm_data = droq->rearm_data;
+	struct rte_mbuf *m[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint32_t pidx[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint32_t idx[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+	uint16_t nb_desc = droq->nb_desc;
+	uint16_t pkts = 0;
+	uint8_t i;
+
+	idx[0] = read_idx;
+	while (pkts < new_pkts) {
+		__m256i data[CNXK_EP_OQ_DESC_PER_LOOP_AVX];
+		/* mask to shuffle from desc. to mbuf (2 descriptors) */
+		const __m256i mask =
+			_mm256_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 20, 21, 0xFF, 0xFF, 20,
+					21, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+					0xFF, 0xFF, 0xFF, 7, 6, 5, 4, 3, 2, 1, 0);
+
+		/* Load indexes. */
+		for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			idx[i] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+
+		/* Prefetch next indexes. */
+		if (new_pkts - pkts > 8) {
+			pidx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+			for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+				pidx[i] = otx_ep_incr_index(pidx[i - 1], 1, nb_desc);
+
+			for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) {
+				rte_prefetch0(recv_buf_list[pidx[i]]);
+				rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[pidx[i]], void *));
+			}
+		}
+
+		/* Load mbuf array. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			m[i] = recv_buf_list[idx[i]];
+
+		/* Load rearm data and packet length for shuffle. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			data[i] = _mm256_set_epi64x(0,
+				rte_pktmbuf_mtod(m[i], struct otx_ep_droq_info *)->length >> 16,
+				0, rearm_data);
+
+		/* Shuffle data to its place and sum the packet length. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) {
+			data[i] = _mm256_shuffle_epi8(data[i], mask);
+			bytes_rsvd += _mm256_extract_epi16(data[i], 10);
+		}
+
+		/* Store the 256bit data to the mbuf. */
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			_mm256_storeu_si256((__m256i *)&m[i]->rearm_data, data[i]);
+
+		for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++)
+			rx_pkts[pkts++] = m[i];
+		idx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc);
+	}
+	droq->read_idx = idx[0];
+
+	droq->refill_count += new_pkts;
+	droq->pkts_pending -= new_pkts;
+	/* Stats */
+	droq->stats.pkts_received += new_pkts;
+	droq->stats.bytes_received += bytes_rsvd;
+}
+
+uint16_t __rte_noinline __rte_hot
+cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX);
+	cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD)
+		cnxk_ep_rx_refill(droq);
+
+	return new_pkts;
+}
+
+uint16_t __rte_noinline __rte_hot
+cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX);
+	cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD) {
+		cnxk_ep_rx_refill(droq);
+	} else {
+		/* SDP output goes into DROP state when output doorbell count
+		 * goes below drop count. When doorbell count is written with
+		 * a value greater than drop count SDP output should come out
+		 * of DROP state. Due to a race condition this is not happening.
+		 * Writing doorbell register with 0 again may make SDP output
+		 * come out of this state.
+		 */
+		rte_write32(0, droq->pkts_credit_reg);
+	}
+
+	return new_pkts;
+}
diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build
index feba1fdf25..e8ae56018d 100644
--- a/drivers/net/octeon_ep/meson.build
+++ b/drivers/net/octeon_ep/meson.build
@@ -15,6 +15,18 @@ sources = files(

 if arch_subdir == 'x86'
     sources += files('cnxk_ep_rx_sse.c')
+    if cc.get_define('__AVX2__', args: machine_args) != ''
+        cflags += ['-DCC_AVX2_SUPPORT']
+        sources += files('cnxk_ep_rx_avx.c')
+    elif cc.has_argument('-mavx2')
+        cflags += ['-DCC_AVX2_SUPPORT']
+        otx_ep_avx2_lib = static_library('otx_ep_avx2_lib',
+                'cnxk_ep_rx_avx.c',
+                dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci],
+                include_directories: includes,
+                c_args: [cflags, '-mavx2'])
+        objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c')
+    endif
 endif

 extra_flags = ['-Wno-strict-aliasing']
diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c
index 51b34cdaa0..42a97ea110 100644
--- a/drivers/net/octeon_ep/otx_ep_ethdev.c
+++ b/drivers/net/octeon_ep/otx_ep_ethdev.c
@@ -54,6 +54,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts;
 #ifdef RTE_ARCH_X86
 		eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse;
+#ifdef CC_AVX2_SUPPORT
+		if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 &&
+		    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1)
+			eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_avx;
+#endif
 #endif
 		if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
 			eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg;
@@ -61,6 +66,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev)
 		eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts;
 #ifdef RTE_ARCH_X86
 		eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse;
+#ifdef CC_AVX2_SUPPORT
+		if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 &&
+		    rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1)
+			eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_avx;
+#endif
 #endif
 		if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER)
diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h
index efc41a8275..0adcbc7814 100644
--- a/drivers/net/octeon_ep/otx_ep_rxtx.h
+++ b/drivers/net/octeon_ep/otx_ep_rxtx.h
@@ -51,6 +51,9 @@ cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 uint16_t
 cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);

+uint16_t
+cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
+
 uint16_t
 cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);

@@ -60,6 +63,9 @@ cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 uint16_t
 cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);

+uint16_t
+cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
+
 uint16_t
 cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
 #endif /* _OTX_EP_RXTX_H_ */