From patchwork Wed Dec  6 17:24:18 2023
X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula
X-Patchwork-Id: 134888
X-Patchwork-Delegate: jerinj@marvell.com
Subject: [PATCH v3 2/3] net/octeon_ep: use SSE instructions for Rx routine
Date: Wed, 6 Dec 2023 22:54:18 +0530
Message-ID: <20231206172419.878-2-pbhagavatula@marvell.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20231206172419.878-1-pbhagavatula@marvell.com>
References: <20231125160349.2021-1-pbhagavatula@marvell.com>
 <20231206172419.878-1-pbhagavatula@marvell.com>
List-Id: DPDK patches and discussions

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Optimize the Rx routine to use SSE instructions.
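The scalar Rx helpers move into a shared header (cnxk_ep_rx.h) so the new
SSE implementation (cnxk_ep_rx_sse.c) can reuse them; the vector loop
parses four Rx descriptors per iteration. In outline, each new burst
function splits a burst between the vector and scalar paths as follows
(condensed from the code below, not a verbatim excerpt):

	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
	/* Round down to a multiple of 4 (CNXK_EP_OQ_DESC_PER_LOOP_SSE). */
	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
	/* Four packets per iteration using 128-bit shuffles. */
	cnxk_ep_process_pkts_vec_sse(rx_pkts, droq, vpkts);
	/* The 0-3 leftover packets take the existing scalar path. */
	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);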
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 drivers/net/octeon_ep/cnxk_ep_rx.c     | 159 +----------------------
 drivers/net/octeon_ep/cnxk_ep_rx.h     | 167 +++++++++++++++++++++++++
 drivers/net/octeon_ep/cnxk_ep_rx_sse.c | 130 +++++++++++++++++++
 drivers/net/octeon_ep/meson.build      |  11 ++
 drivers/net/octeon_ep/otx_ep_ethdev.c  |   7 ++
 drivers/net/octeon_ep/otx_ep_rxtx.h    |   6 +
 6 files changed, 322 insertions(+), 158 deletions(-)
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx.h
 create mode 100644 drivers/net/octeon_ep/cnxk_ep_rx_sse.c

diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.c b/drivers/net/octeon_ep/cnxk_ep_rx.c
index 75bb7225d2..f3e4fb27d1 100644
--- a/drivers/net/octeon_ep/cnxk_ep_rx.c
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.c
@@ -2,164 +2,7 @@
  * Copyright(C) 2023 Marvell.
  */
 
-#include "otx_ep_common.h"
-#include "otx2_ep_vf.h"
-#include "otx_ep_rxtx.h"
-
-static inline int
-cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count)
-{
-	struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
-	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
-	uint32_t refill_idx = droq->refill_idx;
-	struct rte_mbuf *buf;
-	uint32_t i;
-	int rc;
-
-	rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count);
-	if (unlikely(rc)) {
-		droq->stats.rx_alloc_failure++;
-		return rc;
-	}
-
-	for (i = 0; i < count; i++) {
-		buf = recv_buf_list[refill_idx];
-		desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf);
-		refill_idx++;
-	}
-
-	droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc);
-	droq->refill_count -= count;
-
-	return 0;
-}
-
-static inline void
-cnxk_ep_rx_refill(struct otx_ep_droq *droq)
-{
-	uint32_t desc_refilled = 0, count;
-	uint32_t nb_desc = droq->nb_desc;
-	uint32_t refill_idx = droq->refill_idx;
-	int rc;
-
-	if (unlikely(droq->read_idx == refill_idx))
-		return;
-
-	if (refill_idx < droq->read_idx) {
-		count = droq->read_idx - refill_idx;
-		rc = cnxk_ep_rx_refill_mbuf(droq, count);
-		if (unlikely(rc)) {
-			droq->stats.rx_alloc_failure++;
-			return;
-		}
-		desc_refilled = count;
-	} else {
-		count = nb_desc - refill_idx;
-		rc = cnxk_ep_rx_refill_mbuf(droq, count);
-		if (unlikely(rc)) {
-			droq->stats.rx_alloc_failure++;
-			return;
-		}
-
-		desc_refilled = count;
-		count = droq->read_idx;
-		rc = cnxk_ep_rx_refill_mbuf(droq, count);
-		if (unlikely(rc)) {
-			droq->stats.rx_alloc_failure++;
-			return;
-		}
-		desc_refilled += count;
-	}
-
-	/* Flush the droq descriptor data to memory to be sure
-	 * that when we update the credits the data in memory is
-	 * accurate.
-	 */
-	rte_io_wmb();
-	rte_write32(desc_refilled, droq->pkts_credit_reg);
-}
-
-static inline uint32_t
-cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq)
-{
-	uint32_t new_pkts;
-	uint32_t val;
-
-	/* Batch subtractions from the HW counter to reduce PCIe traffic
-	 * This adds an extra local variable, but almost halves the
-	 * number of PCIe writes.
-	 */
-	val = __atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED);
-	new_pkts = val - droq->pkts_sent_ism_prev;
-	droq->pkts_sent_ism_prev = val;
-
-	if (val > RTE_BIT32(31)) {
-		/* Only subtract the packet count in the HW counter
-		 * when count above halfway to saturation.
-		 */
-		rte_write64((uint64_t)val, droq->pkts_sent_reg);
-		rte_mb();
-
-		rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
-		while (__atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED) >= val) {
-			rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
-			rte_mb();
-		}
-
-		droq->pkts_sent_ism_prev = 0;
-	}
-	rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg);
-	droq->pkts_pending += new_pkts;
-
-	return new_pkts;
-}
-
-static inline int16_t __rte_hot
-cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts)
-{
-	if (droq->pkts_pending < nb_pkts)
-		cnxk_ep_check_rx_pkts(droq);
-
-	return RTE_MIN(nb_pkts, droq->pkts_pending);
-}
-
-static __rte_always_inline void
-cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts)
-{
-	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
-	uint32_t bytes_rsvd = 0, read_idx = droq->read_idx;
-	uint16_t nb_desc = droq->nb_desc;
-	uint16_t pkts;
-
-	for (pkts = 0; pkts < new_pkts; pkts++) {
-		struct otx_ep_droq_info *info;
-		struct rte_mbuf *mbuf;
-		uint16_t pkt_len;
-
-		rte_prefetch0(recv_buf_list[otx_ep_incr_index(read_idx, 2, nb_desc)]);
-		rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[otx_ep_incr_index(read_idx,
-									       2, nb_desc)],
-			      void *));
-
-		mbuf = recv_buf_list[read_idx];
-		info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *);
-		read_idx = otx_ep_incr_index(read_idx, 1, nb_desc);
-		pkt_len = rte_bswap16(info->length >> 48);
-		mbuf->pkt_len = pkt_len;
-		mbuf->data_len = pkt_len;
-
-		*(uint64_t *)&mbuf->rearm_data = droq->rearm_data;
-		rx_pkts[pkts] = mbuf;
-		bytes_rsvd += pkt_len;
-	}
-	droq->read_idx = read_idx;
-
-	droq->refill_count += new_pkts;
-	droq->pkts_pending -= new_pkts;
-	/* Stats */
-	droq->stats.pkts_received += new_pkts;
-	droq->stats.bytes_received += bytes_rsvd;
-}
+#include "cnxk_ep_rx.h"
 
 static __rte_always_inline void
 cnxk_ep_process_pkts_scalar_mseg(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq,
diff --git a/drivers/net/octeon_ep/cnxk_ep_rx.h b/drivers/net/octeon_ep/cnxk_ep_rx.h
new file mode 100644
index 0000000000..e71fc0de5c
--- /dev/null
+++ b/drivers/net/octeon_ep/cnxk_ep_rx.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(C) 2023 Marvell.
+ */
+
+#include <rte_vect.h>
+
+#include "otx_ep_common.h"
+#include "otx2_ep_vf.h"
+#include "otx_ep_rxtx.h"
+
+#define CNXK_EP_OQ_DESC_PER_LOOP_SSE 4
+#define CNXK_EP_OQ_DESC_PER_LOOP_AVX 8
+
+static inline int
+cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count)
+{
+	struct otx_ep_droq_desc *desc_ring = droq->desc_ring;
+	struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
+	uint32_t refill_idx = droq->refill_idx;
+	struct rte_mbuf *buf;
+	uint32_t i;
+	int rc;
+
+	rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count);
+	if (unlikely(rc)) {
+		droq->stats.rx_alloc_failure++;
+		return rc;
+	}
+
+	for (i = 0; i < count; i++) {
+		buf = recv_buf_list[refill_idx];
+		desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf);
+		refill_idx++;
+	}
+
+	droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc);
+	droq->refill_count -= count;
+
+	return 0;
+}
+
+static inline void
+cnxk_ep_rx_refill(struct otx_ep_droq *droq)
+{
+	uint32_t desc_refilled = 0, count;
+	uint32_t nb_desc = droq->nb_desc;
+	uint32_t refill_idx = droq->refill_idx;
+	int rc;
+
+	if (unlikely(droq->read_idx == refill_idx))
+		return;
+
+	if (refill_idx < droq->read_idx) {
+		count = droq->read_idx - refill_idx;
+		rc = cnxk_ep_rx_refill_mbuf(droq, count);
+		if (unlikely(rc)) {
+			droq->stats.rx_alloc_failure++;
+			return;
+		}
+		desc_refilled = count;
+	} else {
+		count = nb_desc - refill_idx;
+		rc = cnxk_ep_rx_refill_mbuf(droq, count);
+		if (unlikely(rc)) {
+			droq->stats.rx_alloc_failure++;
+			return;
+		}
+
+		desc_refilled = count;
+		count = droq->read_idx;
+		rc = cnxk_ep_rx_refill_mbuf(droq, count);
+		if (unlikely(rc)) {
+			droq->stats.rx_alloc_failure++;
+			return;
+		}
+		desc_refilled += count;
+	}
+
+	/* Flush the droq descriptor data to memory to be sure
+	 * that when we update the credits the data in memory is
+	 * accurate.
+	 */
+	rte_io_wmb();
+	rte_write32(desc_refilled, droq->pkts_credit_reg);
+}
+ */ + +#include + +#include "otx_ep_common.h" +#include "otx2_ep_vf.h" +#include "otx_ep_rxtx.h" + +#define CNXK_EP_OQ_DESC_PER_LOOP_SSE 4 +#define CNXK_EP_OQ_DESC_PER_LOOP_AVX 8 + +static inline int +cnxk_ep_rx_refill_mbuf(struct otx_ep_droq *droq, uint32_t count) +{ + struct otx_ep_droq_desc *desc_ring = droq->desc_ring; + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t refill_idx = droq->refill_idx; + struct rte_mbuf *buf; + uint32_t i; + int rc; + + rc = rte_pktmbuf_alloc_bulk(droq->mpool, &recv_buf_list[refill_idx], count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return rc; + } + + for (i = 0; i < count; i++) { + buf = recv_buf_list[refill_idx]; + desc_ring[refill_idx].buffer_ptr = rte_mbuf_data_iova_default(buf); + refill_idx++; + } + + droq->refill_idx = otx_ep_incr_index(droq->refill_idx, count, droq->nb_desc); + droq->refill_count -= count; + + return 0; +} + +static inline void +cnxk_ep_rx_refill(struct otx_ep_droq *droq) +{ + uint32_t desc_refilled = 0, count; + uint32_t nb_desc = droq->nb_desc; + uint32_t refill_idx = droq->refill_idx; + int rc; + + if (unlikely(droq->read_idx == refill_idx)) + return; + + if (refill_idx < droq->read_idx) { + count = droq->read_idx - refill_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + desc_refilled = count; + } else { + count = nb_desc - refill_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + + desc_refilled = count; + count = droq->read_idx; + rc = cnxk_ep_rx_refill_mbuf(droq, count); + if (unlikely(rc)) { + droq->stats.rx_alloc_failure++; + return; + } + desc_refilled += count; + } + + /* Flush the droq descriptor data to memory to be sure + * that when we update the credits the data in memory is + * accurate. + */ + rte_io_wmb(); + rte_write32(desc_refilled, droq->pkts_credit_reg); +} + +static inline uint32_t +cnxk_ep_check_rx_pkts(struct otx_ep_droq *droq) +{ + uint32_t new_pkts; + uint32_t val; + + /* Batch subtractions from the HW counter to reduce PCIe traffic + * This adds an extra local variable, but almost halves the + * number of PCIe writes. + */ + val = __atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED); + new_pkts = val - droq->pkts_sent_ism_prev; + droq->pkts_sent_ism_prev = val; + + if (val > RTE_BIT32(31)) { + /* Only subtract the packet count in the HW counter + * when count above halfway to saturation. 
+ */ + rte_write64((uint64_t)val, droq->pkts_sent_reg); + rte_mb(); + + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + while (__atomic_load_n(droq->pkts_sent_ism, __ATOMIC_RELAXED) >= val) { + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + rte_mb(); + } + + droq->pkts_sent_ism_prev = 0; + } + rte_write64(OTX2_SDP_REQUEST_ISM, droq->pkts_sent_reg); + droq->pkts_pending += new_pkts; + + return new_pkts; +} + +static inline int16_t __rte_hot +cnxk_ep_rx_pkts_to_process(struct otx_ep_droq *droq, uint16_t nb_pkts) +{ + if (droq->pkts_pending < nb_pkts) + cnxk_ep_check_rx_pkts(droq); + + return RTE_MIN(nb_pkts, droq->pkts_pending); +} + +static __rte_always_inline void +cnxk_ep_process_pkts_scalar(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts; + + for (pkts = 0; pkts < new_pkts; pkts++) { + struct otx_ep_droq_info *info; + struct rte_mbuf *mbuf; + uint16_t pkt_len; + + rte_prefetch0(recv_buf_list[otx_ep_incr_index(read_idx, 2, nb_desc)]); + rte_prefetch0(rte_pktmbuf_mtod(recv_buf_list[otx_ep_incr_index(read_idx, + 2, nb_desc)], + void *)); + + mbuf = recv_buf_list[read_idx]; + info = rte_pktmbuf_mtod(mbuf, struct otx_ep_droq_info *); + read_idx = otx_ep_incr_index(read_idx, 1, nb_desc); + pkt_len = rte_bswap16(info->length >> 48); + mbuf->pkt_len = pkt_len; + mbuf->data_len = pkt_len; + + *(uint64_t *)&mbuf->rearm_data = droq->rearm_data; + rx_pkts[pkts] = mbuf; + bytes_rsvd += pkt_len; + } + droq->read_idx = read_idx; + + droq->refill_count += new_pkts; + droq->pkts_pending -= new_pkts; + /* Stats */ + droq->stats.pkts_received += new_pkts; + droq->stats.bytes_received += bytes_rsvd; +} diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_sse.c b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c new file mode 100644 index 0000000000..afa3616caa --- /dev/null +++ b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c @@ -0,0 +1,130 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Marvell. + */ + +#include "cnxk_ep_rx.h" + +static __rte_always_inline uint32_t +hadd(__m128i x) +{ + __m128i hi64 = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)); + __m128i sum64 = _mm_add_epi32(hi64, x); + __m128i hi32 = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2)); + __m128i sum32 = _mm_add_epi32(sum64, hi32); + return _mm_cvtsi128_si32(sum32); +} + +static __rte_always_inline void +cnxk_ep_process_pkts_vec_sse(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts) +{ + struct rte_mbuf **recv_buf_list = droq->recv_buf_list; + uint32_t bytes_rsvd = 0, read_idx = droq->read_idx; + uint32_t idx0, idx1, idx2, idx3; + struct rte_mbuf *m0, *m1, *m2, *m3; + uint16_t nb_desc = droq->nb_desc; + uint16_t pkts = 0; + + idx0 = read_idx; + while (pkts < new_pkts) { + const __m128i bswap_mask = _mm_set_epi8(0xFF, 0xFF, 12, 13, 0xFF, 0xFF, 8, 9, 0xFF, + 0xFF, 4, 5, 0xFF, 0xFF, 0, 1); + const __m128i cpy_mask = _mm_set_epi8(0xFF, 0xFF, 9, 8, 0xFF, 0xFF, 9, 8, 0xFF, + 0xFF, 1, 0, 0xFF, 0xFF, 1, 0); + __m128i s01, s23; + + idx1 = otx_ep_incr_index(idx0, 1, nb_desc); + idx2 = otx_ep_incr_index(idx1, 1, nb_desc); + idx3 = otx_ep_incr_index(idx2, 1, nb_desc); + + m0 = recv_buf_list[idx0]; + m1 = recv_buf_list[idx1]; + m2 = recv_buf_list[idx2]; + m3 = recv_buf_list[idx3]; + + /* Load packet size big-endian. 
+		const __m128i bswap_mask = _mm_set_epi8(0xFF, 0xFF, 12, 13, 0xFF, 0xFF, 8, 9, 0xFF,
+							0xFF, 4, 5, 0xFF, 0xFF, 0, 1);
+		const __m128i cpy_mask = _mm_set_epi8(0xFF, 0xFF, 9, 8, 0xFF, 0xFF, 9, 8, 0xFF,
+						      0xFF, 1, 0, 0xFF, 0xFF, 1, 0);
+		__m128i s01, s23;
+
+		idx1 = otx_ep_incr_index(idx0, 1, nb_desc);
+		idx2 = otx_ep_incr_index(idx1, 1, nb_desc);
+		idx3 = otx_ep_incr_index(idx2, 1, nb_desc);
+
+		m0 = recv_buf_list[idx0];
+		m1 = recv_buf_list[idx1];
+		m2 = recv_buf_list[idx2];
+		m3 = recv_buf_list[idx3];
+
+		/* Load packet size big-endian. */
+		s01 = _mm_set_epi32(rte_pktmbuf_mtod(m3, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m1, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m2, struct otx_ep_droq_info *)->length >> 48,
+				    rte_pktmbuf_mtod(m0, struct otx_ep_droq_info *)->length >> 48);
+		/* Convert to little-endian. */
+		s01 = _mm_shuffle_epi8(s01, bswap_mask);
+		/* Horizontal add. */
+		bytes_rsvd += hadd(s01);
+		/* Segregate into packet length and data length. */
+		s23 = _mm_shuffle_epi32(s01, _MM_SHUFFLE(3, 3, 1, 1));
+		s01 = _mm_shuffle_epi8(s01, cpy_mask);
+		s23 = _mm_shuffle_epi8(s23, cpy_mask);
+
+		/* Store packet length and data length to mbuf. */
+		*(uint64_t *)&m0->pkt_len = ((rte_xmm_t)s01).u64[0];
+		*(uint64_t *)&m1->pkt_len = ((rte_xmm_t)s01).u64[1];
+		*(uint64_t *)&m2->pkt_len = ((rte_xmm_t)s23).u64[0];
+		*(uint64_t *)&m3->pkt_len = ((rte_xmm_t)s23).u64[1];
+
+		/* Reset rearm data. */
+		*(uint64_t *)&m0->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m1->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m2->rearm_data = droq->rearm_data;
+		*(uint64_t *)&m3->rearm_data = droq->rearm_data;
+
+		rx_pkts[pkts++] = m0;
+		rx_pkts[pkts++] = m1;
+		rx_pkts[pkts++] = m2;
+		rx_pkts[pkts++] = m3;
+		idx0 = otx_ep_incr_index(idx3, 1, nb_desc);
+	}
+	droq->read_idx = idx0;
+
+	droq->refill_count += new_pkts;
+	droq->pkts_pending -= new_pkts;
+	/* Stats */
+	droq->stats.pkts_received += new_pkts;
+	droq->stats.bytes_received += bytes_rsvd;
+}
+
+uint16_t __rte_noinline __rte_hot
+cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_sse(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD)
+		cnxk_ep_rx_refill(droq);
+
+	return new_pkts;
+}
+
+uint16_t __rte_noinline __rte_hot
+cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
+{
+	struct otx_ep_droq *droq = (struct otx_ep_droq *)rx_queue;
+	uint16_t new_pkts, vpkts;
+
+	new_pkts = cnxk_ep_rx_pkts_to_process(droq, nb_pkts);
+	vpkts = RTE_ALIGN_FLOOR(new_pkts, CNXK_EP_OQ_DESC_PER_LOOP_SSE);
+	cnxk_ep_process_pkts_vec_sse(rx_pkts, droq, vpkts);
+	cnxk_ep_process_pkts_scalar(&rx_pkts[vpkts], droq, new_pkts - vpkts);
+
+	/* Refill RX buffers */
+	if (droq->refill_count >= DROQ_REFILL_THRESHOLD) {
+		cnxk_ep_rx_refill(droq);
+	} else {
+		/* SDP output goes into DROP state when the output doorbell
+		 * count goes below the drop count. When the doorbell count
+		 * is written with a value greater than the drop count, SDP
+		 * output should come out of DROP state. Due to a race
+		 * condition this does not always happen. Writing the
+		 * doorbell register with 0 again may bring SDP output out
+		 * of this state.
+ */ + + rte_write32(0, droq->pkts_credit_reg); + } + + return new_pkts; +} diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index 749776d70c..feba1fdf25 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -12,3 +12,14 @@ sources = files( 'cnxk_ep_rx.c', 'cnxk_ep_tx.c', ) + +if arch_subdir == 'x86' + sources += files('cnxk_ep_rx_sse.c') +endif + +extra_flags = ['-Wno-strict-aliasing'] +foreach flag: extra_flags + if cc.has_argument(flag) + cflags += flag + endif +endforeach diff --git a/drivers/net/octeon_ep/otx_ep_ethdev.c b/drivers/net/octeon_ep/otx_ep_ethdev.c index 615cbbb648..51b34cdaa0 100644 --- a/drivers/net/octeon_ep/otx_ep_ethdev.c +++ b/drivers/net/octeon_ep/otx_ep_ethdev.c @@ -52,10 +52,17 @@ otx_ep_set_rx_func(struct rte_eth_dev *eth_dev) if (otx_epvf->chip_gen == OTX_EP_CN10XX) { eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts; +#ifdef RTE_ARCH_X86 + eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_sse; +#endif if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cnxk_ep_recv_pkts_mseg; } else if (otx_epvf->chip_gen == OTX_EP_CN9XX) { eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts; +#ifdef RTE_ARCH_X86 + eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_sse; +#endif + if (otx_epvf->rx_offloads & RTE_ETH_RX_OFFLOAD_SCATTER) eth_dev->rx_pkt_burst = &cn9k_ep_recv_pkts_mseg; } else { diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h index b159c32cae..efc41a8275 100644 --- a/drivers/net/octeon_ep/otx_ep_rxtx.h +++ b/drivers/net/octeon_ep/otx_ep_rxtx.h @@ -48,12 +48,18 @@ cnxk_ep_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **pkts, uint16_t nb_pkts) uint16_t cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); uint16_t cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); +uint16_t +cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); + uint16_t cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget); #endif /* _OTX_EP_RXTX_H_ */