From patchwork Wed Dec 6 17:24:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 134889 X-Patchwork-Delegate: jerinj@marvell.com Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E41F04368C; Wed, 6 Dec 2023 18:24:45 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5965042E9F; Wed, 6 Dec 2023 18:24:33 +0100 (CET) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 2541E42E94 for ; Wed, 6 Dec 2023 18:24:32 +0100 (CET) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3B6EtBTY020279; Wed, 6 Dec 2023 09:24:31 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=f0lvcY2FpExn08ViT1KygH4LF8PxrlGerC4MNksyk+s=; b=XawWKdHLXmbSTlRZjGKo1EOzWqOqTEpkHqGrZIhf/IYfIqlkrNfpI6g0QbK1HkbY8tGy LxzSIgifKfG4lRUSVr2JCcXnV+PSlnU4TmlDH7CBxmKZb06/WgbrzWQSLZj1L5KhEma0 ydhpKsIJaCXetC1aqh4wtimB1V7Hm3TjtMVkV0Ljq0FVjeycCZYRCwIBM+sloh5dr5dm YLHYwzFXmHU3NJMMXRINh5q+BMvErhzgFj+INteKl6VjsT1UODvKgPH6/outPjXMBwML d7DqIuXmtgLxkKrgfiLW3PA3pYi2MLBD8rHv1sFpKggi+kU8OpOmvMcGXeUTrz2uDYnP KA== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0b-0016f401.pphosted.com (PPS) with ESMTPS id 3utu530mey-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT); Wed, 06 Dec 2023 09:24:31 -0800 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.48; Wed, 6 Dec 2023 09:24:29 -0800 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.48 via Frontend Transport; Wed, 6 Dec 2023 09:24:29 -0800 Received: from MININT-80QBFE8.corp.innovium.com (MININT-80QBFE8.marvell.com [10.28.164.106]) by maili.marvell.com (Postfix) with ESMTP id D1FCF3F704E; Wed, 6 Dec 2023 09:24:26 -0800 (PST) From: To: , Bruce Richardson , Konstantin Ananyev , Vamsi Attunuru CC: , Pavan Nikhilesh Subject: [PATCH v3 3/3] net/octeon_ep: use AVX2 instructions for Rx Date: Wed, 6 Dec 2023 22:54:19 +0530 Message-ID: <20231206172419.878-3-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231206172419.878-1-pbhagavatula@marvell.com> References: <20231125160349.2021-1-pbhagavatula@marvell.com> <20231206172419.878-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: i61Jqk5MhINRLAyGsZ_hy-ZJEBrnSiMO X-Proofpoint-GUID: i61Jqk5MhINRLAyGsZ_hy-ZJEBrnSiMO X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2023-12-06_15,2023-12-06_01,2023-05-22_02 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Pavan Nikhilesh Optimize Rx routine to use AVX2 instructions when underlying architecture supports it. Signed-off-by: Pavan Nikhilesh --- drivers/net/octeon_ep/cnxk_ep_rx_avx.c | 123 +++++++++++++++++++++++++ drivers/net/octeon_ep/meson.build | 12 +++ drivers/net/octeon_ep/otx_ep_ethdev.c | 10 ++ drivers/net/octeon_ep/otx_ep_rxtx.h | 6 ++ 4 files changed, 151 insertions(+) create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_avx.c diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_avx.c b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c new file mode 100644 index 0000000000..ae4615e6da --- /dev/null +++ b/drivers/net/octeon_ep/cnxk_ep_rx_avx.c @@ -0,0 +1,123 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Marvell. + */ + +#include "cnxk_ep_rx.h" + +static __rte_always_inline void +cnxk_ep_process_pkts_vec_avx(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + const uint64_t rearm_data = droq->rearm_data; + struct rte_mbuf *m[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint32_t pidx[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint32_t idx[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts = 0; + uint8_t i; + + idx[0] = read_idx; + while (pkts < new_pkts) { + __m256i data[CNXK_EP_OQ_DESC_PER_LOOP_AVX]; + /* mask to shuffle from desc. to mbuf (2 descriptors)*/ + const __m256i mask = + _mm256_set_epi8(0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 20, 21, 0xFF, 0xFF, 20, + 21, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, + 0xFF, 0xFF, 0xFF, 7, 6, 5, 4, 3, 2, 1, 0); + + /* Load indexes. */ + for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + idx[i] = otx_ep_incr_index(idx[i - 1], 1, nb_desc); + + /* Prefetch next indexes. */ + if (new_pkts - pkts > 8) { + pidx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc); + for (i = 1; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + pidx[i] = otx_ep_incr_index(pidx[i - 1], 1, nb_desc); + + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) { + rte_prefetch0(recv_buf_list[pidx[i]]); + rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[pidx[i]], void *)); + } + } + + /* Load mbuf array. */ + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + m[i] = recv_buf_list[idx[i]]; + + /* Load rearm data and packet length for shuffle. */ + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + data[i] = _mm256_set_epi64x(0, + rte_pktmbuf_mtod(m[i], struct otx_ep_droq_info *)->length >> 16, + 0, rearm_data); + + /* Shuffle data to its place and sum the packet length. */ + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) { + data[i] = _mm256_shuffle_epi8(data[i], mask); + bytes_rsvd += _mm256_extract_epi16(data[i], 10); + } + + /* Store the 256bit data to the mbuf. */ + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + _mm256_storeu_si256((__m256i *)&m[i]->rearm_data, data[i]); + + for (i = 0; i < CNXK_EP_OQ_DESC_PER_LOOP_AVX; i++) + rx_pkts[pkts++] = m[i]; + idx[0] = otx_ep_incr_index(idx[i - 1], 1, nb_desc); + } + droq->read_idx = idx[0]; + + droq->refill_count += new_pkts; + droq->pkts_pending -= new_pkts; + /* Stats */ + droq->stats.pkts_received += new_pkts; + droq->stats.bytes_received += bytes_rsvd; +} + +uint16_t __rte_noinline __rte_hot +cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) +{ + struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue; + uint16_t new_pkts, vpkts; + + new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts); + vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX); + cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts); + cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts); + + /* Refill RX buffers */ + if (droq->refill_count >= DROQ_REFILL_THRESHOLD) + cnxk_ep_rx_refill(droq); + + return new_pkts; +} + +uint16_t __rte_noinline __rte_hot +cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts) +{ + struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue; + uint16_t new_pkts, vpkts; + + new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts); + vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_AVX); + cnxk_ep_process_pkts_vec_avx(rx_pkts, droq, vpkts); + cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts); + + /* Refill RX buffers */ + if (droq->refill_count >= DROQ_REFILL_THRESHOLD) { + cnxk_ep_rx_refill(droq); + } else { + /* SDP output goes into DROP state when output doorbell count + * goes below drop count. When door bell count is written with + * a value greater than drop count SDP output should come out + * of DROP state. Due to a race condition this is not happening. + * Writing doorbell register with 0 again may make SDP output + * come out of this state. + */ + + rte_write32(0, droq->pkts_credit_reg); + } + + return new_pkts; +} diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index feba1fdf25..e8ae56018d 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -15,6 +15,18 @@ sources = files( if arch_subdir == 'x86' sources += files('cnxk_ep_rx_sse.c') + if cc.get_define('__AVX2__', args: machine_args) != '' + cflags += ['-DCC_AVX2_SUPPORT'] + sources += files('cnxk_ep_rx_avx.c') + elif cc.has_argument('-mavx2') + cflags += ['-DCC_AVX2_SUPPORT'] + otx_ep_avx2_lib = static_library('otx_ep_avx2_lib', + 'cnxk_ep_rx_avx.c', + dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci], + include_directories: includes, + c_args: [cflags, '-mavx2']) + objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c') + endif endif extra_flags = ['-Wno-strict-aliasing'] diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c index 51b34cdaa0..42a97ea110 100644 --- a/drivers/net/octeon_ep/otx_ep_ethdev.c +++ b/drivers/net/octeon_ep/otx_ep_ethdev.c @@ -54,6 +54,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse; +#ifdef CC_AVX2_SUPPORT + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_avx; +#endif #endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg; @@ -61,6 +66,11 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts; #ifdef RTE_ARCH_X86 eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse; +#ifdef CC_AVX2_SUPPORT + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_avx; +#endif #endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h index efc41a8275..0adcbc7814 100644 --- a/drivers/net/octeon_ep/otx_ep_rxtx.h +++ b/drivers/net/octeon_ep/otx_ep_rxtx.h @@ -51,6 +51,9 @@ cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cnxk_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); @@ -60,6 +63,9 @@ cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cn9k_ep_recv_pkts_avx(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); #endif /* _OTX_EP_RXTX_H_ */