From patchwork Mon Jun 28 19:41:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94927 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 98FD0A0A0C; Mon, 28 Jun 2021 21:41:55 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 102D44068A; Mon, 28 Jun 2021 21:41:55 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id D10204003F for ; Mon, 28 Jun 2021 21:41:53 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJesAV014608 for ; Mon, 28 Jun 2021 12:41:53 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=OcqGXaVfXWvaN6wCVsCNg5T7DHRmo2zJVJQrRYVhnds=; b=BNzNTeGSIFtqlho1+ZwRQL1Mdp49STGr/tazRfBsZ+1C0i3qwiulz9hWxu+L71AtxN81 NiQWRY3T9RPAJqtSl2SmKfDxVU6cYoPnVSFjzQNBV2suicap7P31fawHS5pyCHfDyIR9 /f9hRvGb1qpHVhb7PYplyr9UhefIR4iPRBhV406MqHMj0n6+rpBADUhgSWTrgHPIuxfN RS3fyoJf0pYgt7tIdDaugf+jte5oS1KgfJQC3iNbpDTS5AOGO6dWLGzISwQyVaLcx1mg uSf6t8rsluinJcccwwgCkqpHdQzbqlRMuhNdsMxP9UGkYHtGtFMucKqm2HtivdAbEbDQ 9w== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0b-0016f401.pphosted.com with ESMTP id 39f964agp9-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:41:52 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:41:50 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:41:50 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id E89913F7055; Mon, 28 Jun 2021 12:41:47 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:38 +0530 Message-ID: <20210628194144.637-1-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210620202906.10974-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: lvLc-QBDxwp0GeyUkoGPt1MZ14HryMGh X-Proofpoint-GUID: lvLc-QBDxwp0GeyUkoGPt1MZ14HryMGh X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 1/6] net/cnxk: add multi seg Rx vector routine X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Add multi-segment Rx vector routine, form the primary mbufs using vector path switch to scalar path when extracting segments. Signed-off-by: Pavan Nikhilesh --- v4 Changes: - Split patches for easier merge. - Rebase on dpdk-next-net-mrvl. v3 Changes: - Spell check. drivers/net/cnxk/cn10k_rx.c | 31 +++++++++++------ drivers/net/cnxk/cn10k_rx.h | 51 +++++++++++++++++++++------- drivers/net/cnxk/cn10k_rx_vec_mseg.c | 17 ++++++++++ drivers/net/cnxk/cn9k_rx.c | 31 +++++++++++------ drivers/net/cnxk/cn9k_rx.h | 51 +++++++++++++++++++++------- drivers/net/cnxk/cn9k_rx_vec_mseg.c | 18 ++++++++++ drivers/net/cnxk/meson.build | 2 ++ 7 files changed, 157 insertions(+), 44 deletions(-) create mode 100644 drivers/net/cnxk/cn10k_rx_vec_mseg.c create mode 100644 drivers/net/cnxk/cn9k_rx_vec_mseg.c -- 2.17.1 diff --git a/drivers/net/cnxk/cn10k_rx.c b/drivers/net/cnxk/cn10k_rx.c index 5c956c06b..3a9fd7130 100644 --- a/drivers/net/cnxk/cn10k_rx.c +++ b/drivers/net/cnxk/cn10k_rx.c @@ -29,6 +29,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev, [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_CHECKSUM_F)] [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_PTYPE_F)] [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_RSS_F)]; + + rte_atomic_thread_fence(__ATOMIC_RELEASE); } void @@ -60,20 +62,29 @@ cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev) #undef R }; - /* For PTP enabled, scalar rx function should be chosen as most of the - * PTP apps are implemented to rx burst 1 pkt. - */ - if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) - pick_rx_func(eth_dev, nix_eth_rx_burst); - else - pick_rx_func(eth_dev, nix_eth_rx_vec_burst); + const eth_rx_burst_t nix_eth_rx_vec_burst_mseg[2][2][2][2][2][2] = { +#define R(name, f5, f4, f3, f2, f1, f0, flags) \ + [f5][f4][f3][f2][f1][f0] = cn10k_nix_recv_pkts_vec_mseg_##name, - if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) - pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); + NIX_RX_FASTPATH_MODES +#undef R + }; /* Copy multi seg version with no offload for tear down sequence */ if (rte_eal_process_type() == RTE_PROC_PRIMARY) dev->rx_pkt_burst_no_offload = nix_eth_rx_burst_mseg[0][0][0][0][0][0]; - rte_mb(); + + /* For PTP enabled, scalar rx function should be chosen as most of the + * PTP apps are implemented to rx burst 1 pkt. + */ + if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) { + if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) + return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); + return pick_rx_func(eth_dev, nix_eth_rx_burst); + } + + if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) + return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg); + return pick_rx_func(eth_dev, nix_eth_rx_vec_burst); } diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 1cc37cbaa..5926ff7f4 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -119,8 +119,15 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, sg = *(const uint64_t *)(rx + 1); nb_segs = (sg >> 48) & 0x3; - mbuf->nb_segs = nb_segs; + + if (nb_segs == 1) { + mbuf->next = NULL; + return; + } + + mbuf->pkt_len = rx->pkt_lenm1 + 1; mbuf->data_len = sg & 0xFFFF; + mbuf->nb_segs = nb_segs; sg = sg >> 16; eol = ((const rte_iova_t *)(rx + 1) + ((rx->desc_sizem1 + 1) << 1)); @@ -195,15 +202,14 @@ cn10k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, ol_flags = nix_update_match_id(rx->match_id, ol_flags, mbuf); mbuf->ol_flags = ol_flags; - *(uint64_t *)(&mbuf->rearm_data) = val; mbuf->pkt_len = len; + mbuf->data_len = len; + *(uint64_t *)(&mbuf->rearm_data) = val; - if (flag & NIX_RX_MULTI_SEG_F) { + if (flag & NIX_RX_MULTI_SEG_F) nix_cqe_xtract_mseg(rx, mbuf, val); - } else { - mbuf->data_len = len; + else mbuf->next = NULL; - } } static inline uint16_t @@ -481,16 +487,34 @@ cn10k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2); vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3); - /* Update that no more segments */ - mbuf0->next = NULL; - mbuf1->next = NULL; - mbuf2->next = NULL; - mbuf3->next = NULL; - /* Store the mbufs to rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[packets], mbuf01); vst1q_u64((uint64_t *)&rx_pkts[packets + 2], mbuf23); + if (flags & NIX_RX_MULTI_SEG_F) { + /* Multi segment is enable build mseg list for + * individual mbufs in scalar mode. + */ + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(0) + 8), mbuf0, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(1) + 8), mbuf1, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(2) + 8), mbuf2, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(3) + 8), mbuf3, + mbuf_initializer); + } else { + /* Update that no more segments */ + mbuf0->next = NULL; + mbuf1->next = NULL; + mbuf2->next = NULL; + mbuf3->next = NULL; + } + /* Prefetch mbufs */ roc_prefetch_store_keep(mbuf0); roc_prefetch_store_keep(mbuf1); @@ -645,6 +669,9 @@ R(vlan_ts_mark_cksum_ptype_rss, 1, 1, 1, 1, 1, 1, \ void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); \ \ uint16_t __rte_noinline __rte_hot cn10k_nix_recv_pkts_vec_##name( \ + void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); \ + \ + uint16_t __rte_noinline __rte_hot cn10k_nix_recv_pkts_vec_mseg_##name( \ void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); NIX_RX_FASTPATH_MODES diff --git a/drivers/net/cnxk/cn10k_rx_vec_mseg.c b/drivers/net/cnxk/cn10k_rx_vec_mseg.c new file mode 100644 index 000000000..04d1e46c8 --- /dev/null +++ b/drivers/net/cnxk/cn10k_rx_vec_mseg.c @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. + */ + +#include "cn10k_ethdev.h" +#include "cn10k_rx.h" + +#define R(name, f5, f4, f3, f2, f1, f0, flags) \ + uint16_t __rte_noinline __rte_hot cn10k_nix_recv_pkts_vec_mseg_##name( \ + void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts) \ + { \ + return cn10k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, \ + (flags) | NIX_RX_MULTI_SEG_F); \ + } + +NIX_RX_FASTPATH_MODES +#undef R diff --git a/drivers/net/cnxk/cn9k_rx.c b/drivers/net/cnxk/cn9k_rx.c index 0acedd0a1..d293d4eac 100644 --- a/drivers/net/cnxk/cn9k_rx.c +++ b/drivers/net/cnxk/cn9k_rx.c @@ -29,6 +29,8 @@ pick_rx_func(struct rte_eth_dev *eth_dev, [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_CHECKSUM_F)] [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_PTYPE_F)] [!!(dev->rx_offload_flags & NIX_RX_OFFLOAD_RSS_F)]; + + rte_atomic_thread_fence(__ATOMIC_RELEASE); } void @@ -60,20 +62,29 @@ cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev) #undef R }; - /* For PTP enabled, scalar rx function should be chosen as most of the - * PTP apps are implemented to rx burst 1 pkt. - */ - if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) - pick_rx_func(eth_dev, nix_eth_rx_burst); - else - pick_rx_func(eth_dev, nix_eth_rx_vec_burst); + const eth_rx_burst_t nix_eth_rx_vec_burst_mseg[2][2][2][2][2][2] = { +#define R(name, f5, f4, f3, f2, f1, f0, flags) \ + [f5][f4][f3][f2][f1][f0] = cn9k_nix_recv_pkts_vec_mseg_##name, - if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) - pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); + NIX_RX_FASTPATH_MODES +#undef R + }; /* Copy multi seg version with no offload for tear down sequence */ if (rte_eal_process_type() == RTE_PROC_PRIMARY) dev->rx_pkt_burst_no_offload = nix_eth_rx_burst_mseg[0][0][0][0][0][0]; - rte_mb(); + + /* For PTP enabled, scalar rx function should be chosen as most of the + * PTP apps are implemented to rx burst 1 pkt. + */ + if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) { + if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) + return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); + return pick_rx_func(eth_dev, nix_eth_rx_burst); + } + + if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) + return pick_rx_func(eth_dev, nix_eth_rx_vec_burst_mseg); + return pick_rx_func(eth_dev, nix_eth_rx_vec_burst); } diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h index 10ef5c690..5ae9e8195 100644 --- a/drivers/net/cnxk/cn9k_rx.h +++ b/drivers/net/cnxk/cn9k_rx.h @@ -120,8 +120,15 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, sg = *(const uint64_t *)(rx + 1); nb_segs = (sg >> 48) & 0x3; - mbuf->nb_segs = nb_segs; + + if (nb_segs == 1) { + mbuf->next = NULL; + return; + } + + mbuf->pkt_len = rx->pkt_lenm1 + 1; mbuf->data_len = sg & 0xFFFF; + mbuf->nb_segs = nb_segs; sg = sg >> 16; eol = ((const rte_iova_t *)(rx + 1) + @@ -198,15 +205,14 @@ cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, nix_update_match_id(rx->cn9k.match_id, ol_flags, mbuf); mbuf->ol_flags = ol_flags; - *(uint64_t *)(&mbuf->rearm_data) = val; mbuf->pkt_len = len; + mbuf->data_len = len; + *(uint64_t *)(&mbuf->rearm_data) = val; - if (flag & NIX_RX_MULTI_SEG_F) { + if (flag & NIX_RX_MULTI_SEG_F) nix_cqe_xtract_mseg(rx, mbuf, val); - } else { - mbuf->data_len = len; + else mbuf->next = NULL; - } } static inline uint16_t @@ -484,16 +490,34 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, vst1q_u64((uint64_t *)mbuf2->rearm_data, rearm2); vst1q_u64((uint64_t *)mbuf3->rearm_data, rearm3); - /* Update that no more segments */ - mbuf0->next = NULL; - mbuf1->next = NULL; - mbuf2->next = NULL; - mbuf3->next = NULL; - /* Store the mbufs to rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[packets], mbuf01); vst1q_u64((uint64_t *)&rx_pkts[packets + 2], mbuf23); + if (flags & NIX_RX_MULTI_SEG_F) { + /* Multi segment is enable build mseg list for + * individual mbufs in scalar mode. + */ + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(0) + 8), mbuf0, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(1) + 8), mbuf1, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(2) + 8), mbuf2, + mbuf_initializer); + nix_cqe_xtract_mseg((union nix_rx_parse_u *) + (cq0 + CQE_SZ(3) + 8), mbuf3, + mbuf_initializer); + } else { + /* Update that no more segments */ + mbuf0->next = NULL; + mbuf1->next = NULL; + mbuf2->next = NULL; + mbuf3->next = NULL; + } + /* Prefetch mbufs */ roc_prefetch_store_keep(mbuf0); roc_prefetch_store_keep(mbuf1); @@ -647,6 +671,9 @@ R(vlan_ts_mark_cksum_ptype_rss, 1, 1, 1, 1, 1, 1, \ void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); \ \ uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_vec_##name( \ + void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); \ + \ + uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_vec_mseg_##name( \ void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts); NIX_RX_FASTPATH_MODES diff --git a/drivers/net/cnxk/cn9k_rx_vec_mseg.c b/drivers/net/cnxk/cn9k_rx_vec_mseg.c new file mode 100644 index 000000000..e46d8a474 --- /dev/null +++ b/drivers/net/cnxk/cn9k_rx_vec_mseg.c @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. + */ + +#include "cn9k_ethdev.h" +#include "cn9k_rx.h" + +#define R(name, f5, f4, f3, f2, f1, f0, flags) \ + uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_vec_mseg_##name( \ + void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts) \ + { \ + return cn9k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, \ + (flags) | \ + NIX_RX_MULTI_SEG_F); \ + } + +NIX_RX_FASTPATH_MODES +#undef R diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build index 2071d0dcb..aa8c7253f 100644 --- a/drivers/net/cnxk/meson.build +++ b/drivers/net/cnxk/meson.build @@ -23,6 +23,7 @@ sources += files('cn9k_ethdev.c', 'cn9k_rx.c', 'cn9k_rx_mseg.c', 'cn9k_rx_vec.c', + 'cn9k_rx_vec_mseg.c', 'cn9k_tx.c', 'cn9k_tx_mseg.c', 'cn9k_tx_vec.c') @@ -32,6 +33,7 @@ sources += files('cn10k_ethdev.c', 'cn10k_rx.c', 'cn10k_rx_mseg.c', 'cn10k_rx_vec.c', + 'cn10k_rx_vec_mseg.c', 'cn10k_tx.c', 'cn10k_tx_mseg.c', 'cn10k_tx_vec.c') From patchwork Mon Jun 28 19:41:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94928 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8DE55A0A0C; Mon, 28 Jun 2021 21:42:02 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id AFEF341157; Mon, 28 Jun 2021 21:42:00 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 6108B4115E for ; Mon, 28 Jun 2021 21:41:58 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJevi9014623 for ; Mon, 28 Jun 2021 12:41:57 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=N33NrxhM1vZDC2yGYsN3MbgVDlS4eezY/a1Cxck6W64=; b=YpYhYdPyjA80IfWfKRNb9GYZvStbC5ebY6b7Q+pL/j8Oa3YqB7R2xD3FOYQQKqZkN4Jl ed+OvqDpk7JZ/EWdup+qEu0A6ENTMPWn2au+guUmenOx5B0iAjET1R7Giw62+xIcaVt2 sXpAp6UzeVnqwp74B53MWLpD6EHa+7cYgtnRudsIsTVomVykzybTTq5b/rfxz7HkM0KH 51AmjqNOL/cucOCYZ+CZ8OW918OB1wtbXndtwSHIBmEQVQVQziOGw0u2PPc8nJf8SpOb VFVItPpo5d6+EOSbdHEI5tqUEATa5B90SzY1FG4Zhal+fTs0te5OAGX1B2Nmjb2MuQGs xw== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com with ESMTP id 39f964agpc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:41:56 -0700 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:41:54 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:41:54 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 1F7A43F7055; Mon, 28 Jun 2021 12:41:51 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:39 +0530 Message-ID: <20210628194144.637-2-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210628194144.637-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> <20210628194144.637-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: 8rniDyn5OEfWDPtEO4GgTJUqtl8c01O7 X-Proofpoint-GUID: 8rniDyn5OEfWDPtEO4GgTJUqtl8c01O7 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 2/6] net/cnxk: enable ptp processing in vector Rx X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Enable PTP offload in vector Rx burst function, use vector path for processing mbufs and finally switch to scalar when extracting timestamp. Signed-off-by: Pavan Nikhilesh --- drivers/net/cnxk/cn10k_ethdev.c | 1 - drivers/net/cnxk/cn10k_rx.c | 5 +- drivers/net/cnxk/cn10k_rx.h | 124 ++++++++++++++++++++++++++++---- drivers/net/cnxk/cn10k_rx_vec.c | 3 - drivers/net/cnxk/cn9k_ethdev.c | 1 - drivers/net/cnxk/cn9k_rx.c | 5 +- drivers/net/cnxk/cn9k_rx.h | 124 ++++++++++++++++++++++++++++---- drivers/net/cnxk/cn9k_rx_vec.c | 3 - drivers/net/cnxk/cnxk_ethdev.h | 19 ++--- 9 files changed, 232 insertions(+), 53 deletions(-) diff --git a/drivers/net/cnxk/cn10k_ethdev.c b/drivers/net/cnxk/cn10k_ethdev.c index b079edbd3..7caec6cf1 100644 --- a/drivers/net/cnxk/cn10k_ethdev.c +++ b/drivers/net/cnxk/cn10k_ethdev.c @@ -301,7 +301,6 @@ nix_ptp_enable_vf(struct rte_eth_dev *eth_dev) if (nix_recalc_mtu(eth_dev)) plt_err("Failed to set MTU size for ptp"); - dev->scalar_ena = true; dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F; /* Setting up the function pointers as per new offload flags */ diff --git a/drivers/net/cnxk/cn10k_rx.c b/drivers/net/cnxk/cn10k_rx.c index 3a9fd7130..69e767ac3 100644 --- a/drivers/net/cnxk/cn10k_rx.c +++ b/drivers/net/cnxk/cn10k_rx.c @@ -75,10 +75,7 @@ cn10k_eth_set_rx_function(struct rte_eth_dev *eth_dev) dev->rx_pkt_burst_no_offload = nix_eth_rx_burst_mseg[0][0][0][0][0][0]; - /* For PTP enabled, scalar rx function should be chosen as most of the - * PTP apps are implemented to rx burst 1 pkt. - */ - if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) { + if (dev->scalar_ena) { if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); return pick_rx_func(eth_dev, nix_eth_rx_burst); diff --git a/drivers/net/cnxk/cn10k_rx.h b/drivers/net/cnxk/cn10k_rx.h index 5926ff7f4..d9572b19e 100644 --- a/drivers/net/cnxk/cn10k_rx.h +++ b/drivers/net/cnxk/cn10k_rx.h @@ -109,7 +109,7 @@ nix_update_match_id(const uint16_t match_id, uint64_t ol_flags, static __rte_always_inline void nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, - uint64_t rearm) + uint64_t rearm, const uint16_t flags) { const rte_iova_t *iova_list; struct rte_mbuf *head; @@ -125,8 +125,10 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, return; } - mbuf->pkt_len = rx->pkt_lenm1 + 1; - mbuf->data_len = sg & 0xFFFF; + mbuf->pkt_len = (rx->pkt_lenm1 + 1) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? + CNXK_NIX_TIMESYNC_RX_OFFSET : 0); + mbuf->data_len = (sg & 0xFFFF) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? + CNXK_NIX_TIMESYNC_RX_OFFSET : 0); mbuf->nb_segs = nb_segs; sg = sg >> 16; @@ -207,7 +209,7 @@ cn10k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, *(uint64_t *)(&mbuf->rearm_data) = val; if (flag & NIX_RX_MULTI_SEG_F) - nix_cqe_xtract_mseg(rx, mbuf, val); + nix_cqe_xtract_mseg(rx, mbuf, val, flag); else mbuf->next = NULL; } @@ -272,8 +274,9 @@ cn10k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, flags); cnxk_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F), - (uint64_t *)((uint8_t *)mbuf + data_off) - ); + (flags & NIX_RX_MULTI_SEG_F), + (uint64_t *)((uint8_t *)mbuf + + data_off)); rx_pkts[packets++] = mbuf; roc_prefetch_store_keep(mbuf); head++; @@ -469,6 +472,99 @@ cn10k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, mbuf3); } + if (flags & NIX_RX_OFFLOAD_TSTAMP_F) { + const uint16x8_t len_off = { + 0, /* ptype 0:15 */ + 0, /* ptype 16:32 */ + CNXK_NIX_TIMESYNC_RX_OFFSET, /* pktlen 0:15*/ + 0, /* pktlen 16:32 */ + CNXK_NIX_TIMESYNC_RX_OFFSET, /* datalen 0:15 */ + 0, + 0, + 0}; + const uint32x4_t ptype = {RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC}; + const uint64_t ts_olf = PKT_RX_IEEE1588_PTP | + PKT_RX_IEEE1588_TMST | + rxq->tstamp->rx_tstamp_dynflag; + const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8}; + uint64x2_t ts01, ts23, mask; + uint64_t ts[4]; + uint8_t res; + + /* Subtract timesync length from total pkt length. */ + f0 = vsubq_u16(f0, len_off); + f1 = vsubq_u16(f1, len_off); + f2 = vsubq_u16(f2, len_off); + f3 = vsubq_u16(f3, len_off); + + /* Get the address of actual timestamp. */ + ts01 = vaddq_u64(mbuf01, data_off); + ts23 = vaddq_u64(mbuf23, data_off); + /* Load timestamp from address. */ + ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, + 0), + ts01, 0); + ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, + 1), + ts01, 1); + ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, + 0), + ts23, 0); + ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, + 1), + ts23, 1); + /* Convert from be to cpu byteorder. */ + ts01 = vrev64q_u8(ts01); + ts23 = vrev64q_u8(ts23); + /* Store timestamp into scalar for later use. */ + ts[0] = vgetq_lane_u64(ts01, 0); + ts[1] = vgetq_lane_u64(ts01, 1); + ts[2] = vgetq_lane_u64(ts23, 0); + ts[3] = vgetq_lane_u64(ts23, 1); + + /* Store timestamp into dynfield. */ + *cnxk_nix_timestamp_dynfield(mbuf0, rxq->tstamp) = + ts[0]; + *cnxk_nix_timestamp_dynfield(mbuf1, rxq->tstamp) = + ts[1]; + *cnxk_nix_timestamp_dynfield(mbuf2, rxq->tstamp) = + ts[2]; + *cnxk_nix_timestamp_dynfield(mbuf3, rxq->tstamp) = + ts[3]; + + /* Generate ptype mask to filter L2 ether timesync */ + mask = vdupq_n_u32(vgetq_lane_u32(f0, 0)); + mask = vsetq_lane_u32(vgetq_lane_u32(f1, 0), mask, 1); + mask = vsetq_lane_u32(vgetq_lane_u32(f2, 0), mask, 2); + mask = vsetq_lane_u32(vgetq_lane_u32(f3, 0), mask, 3); + + /* Match against L2 ether timesync. */ + mask = vceqq_u32(mask, ptype); + /* Convert from vector from scalar mask */ + res = vaddvq_u32(vandq_u32(mask, and_mask)); + res &= 0xF; + + if (res) { + /* Fill in the ol_flags for any packets that + * matched. + */ + ol_flags0 |= ((res & 0x1) ? ts_olf : 0); + ol_flags1 |= ((res & 0x2) ? ts_olf : 0); + ol_flags2 |= ((res & 0x4) ? ts_olf : 0); + ol_flags3 |= ((res & 0x8) ? ts_olf : 0); + + /* Update Rxq timestamp with the latest + * timestamp. + */ + rxq->tstamp->rx_ready = 1; + rxq->tstamp->rx_tstamp = + ts[31 - __builtin_clz(res)]; + } + } + /* Form rearm_data with ol_flags */ rearm0 = vsetq_lane_u64(ol_flags0, rearm0, 1); rearm1 = vsetq_lane_u64(ol_flags1, rearm1, 1); @@ -496,17 +592,17 @@ cn10k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, * individual mbufs in scalar mode. */ nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(0) + 8), mbuf0, - mbuf_initializer); + (cq0 + CQE_SZ(0) + 8), mbuf0, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(1) + 8), mbuf1, - mbuf_initializer); + (cq0 + CQE_SZ(1) + 8), mbuf1, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(2) + 8), mbuf2, - mbuf_initializer); + (cq0 + CQE_SZ(2) + 8), mbuf2, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(3) + 8), mbuf3, - mbuf_initializer); + (cq0 + CQE_SZ(3) + 8), mbuf3, + mbuf_initializer, flags); } else { /* Update that no more segments */ mbuf0->next = NULL; diff --git a/drivers/net/cnxk/cn10k_rx_vec.c b/drivers/net/cnxk/cn10k_rx_vec.c index 65ffa9784..93528a44f 100644 --- a/drivers/net/cnxk/cn10k_rx_vec.c +++ b/drivers/net/cnxk/cn10k_rx_vec.c @@ -11,9 +11,6 @@ struct rte_mbuf **rx_pkts, \ uint16_t pkts) \ { \ - /* TSTMP is not supported by vector */ \ - if ((flags) & NIX_RX_OFFLOAD_TSTAMP_F) \ - return 0; \ return cn10k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, \ (flags)); \ } diff --git a/drivers/net/cnxk/cn9k_ethdev.c b/drivers/net/cnxk/cn9k_ethdev.c index 994fdb7c3..115e67891 100644 --- a/drivers/net/cnxk/cn9k_ethdev.c +++ b/drivers/net/cnxk/cn9k_ethdev.c @@ -309,7 +309,6 @@ nix_ptp_enable_vf(struct rte_eth_dev *eth_dev) if (nix_recalc_mtu(eth_dev)) plt_err("Failed to set MTU size for ptp"); - dev->scalar_ena = true; dev->rx_offload_flags |= NIX_RX_OFFLOAD_TSTAMP_F; /* Setting up the function pointers as per new offload flags */ diff --git a/drivers/net/cnxk/cn9k_rx.c b/drivers/net/cnxk/cn9k_rx.c index d293d4eac..7d9f1bd61 100644 --- a/drivers/net/cnxk/cn9k_rx.c +++ b/drivers/net/cnxk/cn9k_rx.c @@ -75,10 +75,7 @@ cn9k_eth_set_rx_function(struct rte_eth_dev *eth_dev) dev->rx_pkt_burst_no_offload = nix_eth_rx_burst_mseg[0][0][0][0][0][0]; - /* For PTP enabled, scalar rx function should be chosen as most of the - * PTP apps are implemented to rx burst 1 pkt. - */ - if (dev->scalar_ena || dev->rx_offloads & DEV_RX_OFFLOAD_TIMESTAMP) { + if (dev->scalar_ena) { if (dev->rx_offloads & DEV_RX_OFFLOAD_SCATTER) return pick_rx_func(eth_dev, nix_eth_rx_burst_mseg); return pick_rx_func(eth_dev, nix_eth_rx_burst); diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h index 5ae9e8195..beb52f39d 100644 --- a/drivers/net/cnxk/cn9k_rx.h +++ b/drivers/net/cnxk/cn9k_rx.h @@ -110,7 +110,7 @@ nix_update_match_id(const uint16_t match_id, uint64_t ol_flags, static __rte_always_inline void nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, - uint64_t rearm) + uint64_t rearm, const uint16_t flags) { const rte_iova_t *iova_list; struct rte_mbuf *head; @@ -126,8 +126,10 @@ nix_cqe_xtract_mseg(const union nix_rx_parse_u *rx, struct rte_mbuf *mbuf, return; } - mbuf->pkt_len = rx->pkt_lenm1 + 1; - mbuf->data_len = sg & 0xFFFF; + mbuf->pkt_len = (rx->pkt_lenm1 + 1) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? + CNXK_NIX_TIMESYNC_RX_OFFSET : 0); + mbuf->data_len = (sg & 0xFFFF) - (flags & NIX_RX_OFFLOAD_TSTAMP_F ? + CNXK_NIX_TIMESYNC_RX_OFFSET : 0); mbuf->nb_segs = nb_segs; sg = sg >> 16; @@ -210,7 +212,7 @@ cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag, *(uint64_t *)(&mbuf->rearm_data) = val; if (flag & NIX_RX_MULTI_SEG_F) - nix_cqe_xtract_mseg(rx, mbuf, val); + nix_cqe_xtract_mseg(rx, mbuf, val, flag); else mbuf->next = NULL; } @@ -275,8 +277,9 @@ cn9k_nix_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts, flags); cnxk_nix_mbuf_to_tstamp(mbuf, rxq->tstamp, (flags & NIX_RX_OFFLOAD_TSTAMP_F), - (uint64_t *)((uint8_t *)mbuf + data_off) - ); + (flags & NIX_RX_MULTI_SEG_F), + (uint64_t *)((uint8_t *)mbuf + + data_off)); rx_pkts[packets++] = mbuf; roc_prefetch_store_keep(mbuf); head++; @@ -472,6 +475,99 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, mbuf3); } + if (flags & NIX_RX_OFFLOAD_TSTAMP_F) { + const uint16x8_t len_off = { + 0, /* ptype 0:15 */ + 0, /* ptype 16:32 */ + CNXK_NIX_TIMESYNC_RX_OFFSET, /* pktlen 0:15*/ + 0, /* pktlen 16:32 */ + CNXK_NIX_TIMESYNC_RX_OFFSET, /* datalen 0:15 */ + 0, + 0, + 0}; + const uint32x4_t ptype = {RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC, + RTE_PTYPE_L2_ETHER_TIMESYNC}; + const uint64_t ts_olf = PKT_RX_IEEE1588_PTP | + PKT_RX_IEEE1588_TMST | + rxq->tstamp->rx_tstamp_dynflag; + const uint32x4_t and_mask = {0x1, 0x2, 0x4, 0x8}; + uint64x2_t ts01, ts23, mask; + uint64_t ts[4]; + uint8_t res; + + /* Subtract timesync length from total pkt length. */ + f0 = vsubq_u16(f0, len_off); + f1 = vsubq_u16(f1, len_off); + f2 = vsubq_u16(f2, len_off); + f3 = vsubq_u16(f3, len_off); + + /* Get the address of actual timestamp. */ + ts01 = vaddq_u64(mbuf01, data_off); + ts23 = vaddq_u64(mbuf23, data_off); + /* Load timestamp from address. */ + ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, + 0), + ts01, 0); + ts01 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts01, + 1), + ts01, 1); + ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, + 0), + ts23, 0); + ts23 = vsetq_lane_u64(*(uint64_t *)vgetq_lane_u64(ts23, + 1), + ts23, 1); + /* Convert from be to cpu byteorder. */ + ts01 = vrev64q_u8(ts01); + ts23 = vrev64q_u8(ts23); + /* Store timestamp into scalar for later use. */ + ts[0] = vgetq_lane_u64(ts01, 0); + ts[1] = vgetq_lane_u64(ts01, 1); + ts[2] = vgetq_lane_u64(ts23, 0); + ts[3] = vgetq_lane_u64(ts23, 1); + + /* Store timestamp into dynfield. */ + *cnxk_nix_timestamp_dynfield(mbuf0, rxq->tstamp) = + ts[0]; + *cnxk_nix_timestamp_dynfield(mbuf1, rxq->tstamp) = + ts[1]; + *cnxk_nix_timestamp_dynfield(mbuf2, rxq->tstamp) = + ts[2]; + *cnxk_nix_timestamp_dynfield(mbuf3, rxq->tstamp) = + ts[3]; + + /* Generate ptype mask to filter L2 ether timesync */ + mask = vdupq_n_u32(vgetq_lane_u32(f0, 0)); + mask = vsetq_lane_u32(vgetq_lane_u32(f1, 0), mask, 1); + mask = vsetq_lane_u32(vgetq_lane_u32(f2, 0), mask, 2); + mask = vsetq_lane_u32(vgetq_lane_u32(f3, 0), mask, 3); + + /* Match against L2 ether timesync. */ + mask = vceqq_u32(mask, ptype); + /* Convert from vector from scalar mask */ + res = vaddvq_u32(vandq_u32(mask, and_mask)); + res &= 0xF; + + if (res) { + /* Fill in the ol_flags for any packets that + * matched. + */ + ol_flags0 |= ((res & 0x1) ? ts_olf : 0); + ol_flags1 |= ((res & 0x2) ? ts_olf : 0); + ol_flags2 |= ((res & 0x4) ? ts_olf : 0); + ol_flags3 |= ((res & 0x8) ? ts_olf : 0); + + /* Update Rxq timestamp with the latest + * timestamp. + */ + rxq->tstamp->rx_ready = 1; + rxq->tstamp->rx_tstamp = + ts[31 - __builtin_clz(res)]; + } + } + /* Form rearm_data with ol_flags */ rearm0 = vsetq_lane_u64(ol_flags0, rearm0, 1); rearm1 = vsetq_lane_u64(ol_flags1, rearm1, 1); @@ -499,17 +595,17 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts, * individual mbufs in scalar mode. */ nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(0) + 8), mbuf0, - mbuf_initializer); + (cq0 + CQE_SZ(0) + 8), mbuf0, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(1) + 8), mbuf1, - mbuf_initializer); + (cq0 + CQE_SZ(1) + 8), mbuf1, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(2) + 8), mbuf2, - mbuf_initializer); + (cq0 + CQE_SZ(2) + 8), mbuf2, + mbuf_initializer, flags); nix_cqe_xtract_mseg((union nix_rx_parse_u *) - (cq0 + CQE_SZ(3) + 8), mbuf3, - mbuf_initializer); + (cq0 + CQE_SZ(3) + 8), mbuf3, + mbuf_initializer, flags); } else { /* Update that no more segments */ mbuf0->next = NULL; diff --git a/drivers/net/cnxk/cn9k_rx_vec.c b/drivers/net/cnxk/cn9k_rx_vec.c index e61c2225c..ef5f771ef 100644 --- a/drivers/net/cnxk/cn9k_rx_vec.c +++ b/drivers/net/cnxk/cn9k_rx_vec.c @@ -9,9 +9,6 @@ uint16_t __rte_noinline __rte_hot cn9k_nix_recv_pkts_vec_##name( \ void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t pkts) \ { \ - /* TSTMP is not supported by vector */ \ - if ((flags) & NIX_RX_OFFLOAD_TSTAMP_F) \ - return 0; \ return cn9k_nix_recv_pkts_vector(rx_queue, rx_pkts, pkts, \ (flags)); \ } diff --git a/drivers/net/cnxk/cnxk_ethdev.h b/drivers/net/cnxk/cnxk_ethdev.h index 67b1f4253..4eead0390 100644 --- a/drivers/net/cnxk/cnxk_ethdev.h +++ b/drivers/net/cnxk/cnxk_ethdev.h @@ -136,13 +136,12 @@ struct cnxk_eth_qconf { }; struct cnxk_timesync_info { + uint8_t rx_ready; + uint64_t rx_tstamp; uint64_t rx_tstamp_dynflag; + int tstamp_dynfield_offset; rte_iova_t tx_tstamp_iova; uint64_t *tx_tstamp; - uint64_t rx_tstamp; - int tstamp_dynfield_offset; - uint8_t tx_ready; - uint8_t rx_ready; } __plt_cache_aligned; struct cnxk_eth_dev { @@ -465,13 +464,15 @@ cnxk_nix_timestamp_dynfield(struct rte_mbuf *mbuf, static __rte_always_inline void cnxk_nix_mbuf_to_tstamp(struct rte_mbuf *mbuf, - struct cnxk_timesync_info *tstamp, bool ts_enable, + struct cnxk_timesync_info *tstamp, + const uint8_t ts_enable, const uint8_t mseg_enable, uint64_t *tstamp_ptr) { - if (ts_enable && - (mbuf->data_off == - RTE_PKTMBUF_HEADROOM + CNXK_NIX_TIMESYNC_RX_OFFSET)) { - mbuf->pkt_len -= CNXK_NIX_TIMESYNC_RX_OFFSET; + if (ts_enable) { + if (!mseg_enable) { + mbuf->pkt_len -= CNXK_NIX_TIMESYNC_RX_OFFSET; + mbuf->data_len -= CNXK_NIX_TIMESYNC_RX_OFFSET; + } /* Reading the rx timestamp inserted by CGX, viz at * starting of the packet data. From patchwork Mon Jun 28 19:41:40 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94929 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C75D0A0A0C; Mon, 28 Jun 2021 21:42:08 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id D444D4114C; Mon, 28 Jun 2021 21:42:03 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 5C4C741159 for ; Mon, 28 Jun 2021 21:42:01 +0200 (CEST) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJebfI014744 for ; Mon, 28 Jun 2021 12:42:00 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=Dj/j50Eoe35v1D6tjoUV/9RxjBcorcUBIdeNHdlsp6M=; b=C5ppgf0e/0hd1WINu+xXvF0gvDcFXlZRujjAUkzGxdlSqBD706D14B2VmQlTlHSG3h5s 6//LX1k9Tu5w8uZqlOQcaHTAVgFqMfKvAsaz3qPiDNQckpfZQnfdDpiQzSsAJNJgdR0r 5XO0wt19WvK9hV3n28wUeqe5kIFkWRR27xRl9dAlE+FoV3jeKuwq2Gc60U18b79AjjSg h602o7vvEQ1p4HJrkFLryoD4LGcqh4+sp+1tX/Hc59aPsETkxTBgn5hidH0zYT3waaDJ ETfQ7jkFfOo1wKeZBzJX4ELCsv7cck84d8NEMceVSeRNuQno4eNi0vwgl5I/dQ+hgwm6 7A== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0a-0016f401.pphosted.com with ESMTP id 39f11y3umh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:42:00 -0700 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:41:58 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:41:58 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 392193F7055; Mon, 28 Jun 2021 12:41:55 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:40 +0530 Message-ID: <20210628194144.637-3-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210628194144.637-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> <20210628194144.637-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-GUID: njXAGMBKjymjofjXeoOj76d9Snnp-c4b X-Proofpoint-ORIG-GUID: njXAGMBKjymjofjXeoOj76d9Snnp-c4b X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 3/6] net/cnxk: enable VLAN processing in vector Tx X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Enable VLAN offload in vector Tx burst function. Signed-off-by: Pavan Nikhilesh --- drivers/net/cnxk/cn10k_tx.c | 3 +- drivers/net/cnxk/cn10k_tx.h | 125 +++++++++++++++++++++++++++---- drivers/net/cnxk/cn10k_tx_vec.c | 3 +- drivers/net/cnxk/cn9k_tx.c | 3 +- drivers/net/cnxk/cn9k_tx.h | 128 ++++++++++++++++++++++++++++---- drivers/net/cnxk/cn9k_tx_vec.c | 3 +- 6 files changed, 227 insertions(+), 38 deletions(-) diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c index 18694dc70..05bc163a4 100644 --- a/drivers/net/cnxk/cn10k_tx.c +++ b/drivers/net/cnxk/cn10k_tx.c @@ -69,8 +69,7 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev) if (dev->scalar_ena || (dev->tx_offload_flags & - (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | - NIX_TX_OFFLOAD_TSO_F))) + (NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F))) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 8b1446f25..1e1697858 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -62,9 +62,14 @@ cn10k_nix_tx_ext_subs(const uint16_t flags) static __rte_always_inline uint8_t cn10k_nix_pkts_per_vec_brst(const uint16_t flags) { - RTE_SET_USED(flags); - /* We can pack up to 4 packets per LMTLINE if there are no offloads. */ - return 4 << ROC_LMT_LINES_PER_CORE_LOG2; + return ((flags & NIX_TX_NEED_EXT_HDR) ? 2 : 4) + << ROC_LMT_LINES_PER_CORE_LOG2; +} + +static __rte_always_inline uint8_t +cn10k_nix_tx_dwords_per_line(const uint16_t flags) +{ + return (flags & NIX_TX_NEED_EXT_HDR) ? 6 : 8; } static __rte_always_inline uint64_t @@ -98,10 +103,9 @@ cn10k_nix_tx_steor_data(const uint16_t flags) static __rte_always_inline uint64_t cn10k_nix_tx_steor_vec_data(const uint16_t flags) { - const uint64_t dw_m1 = 0x7; + const uint64_t dw_m1 = cn10k_nix_tx_dwords_per_line(flags) - 1; uint64_t data; - RTE_SET_USED(flags); /* This will be moved to addr area */ data = dw_m1; /* 15 vector sizes for single seg */ @@ -690,11 +694,14 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, { uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3; uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3; - uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP]; + uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], + cmd2[NIX_DESCS_PER_LOOP]; uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, data, pa; uint64x2_t senddesc01_w0, senddesc23_w0; uint64x2_t senddesc01_w1, senddesc23_w1; uint16_t left, scalar, burst, i, lmt_id; + uint64x2_t sendext01_w0, sendext23_w0; + uint64x2_t sendext01_w1, sendext23_w1; uint64x2_t sgdesc01_w0, sgdesc23_w0; uint64x2_t sgdesc01_w1, sgdesc23_w1; struct cn10k_eth_txq *txq = tx_queue; @@ -720,6 +727,14 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sgdesc01_w0 = vld1q_dup_u64(&txq->sg_w0); sgdesc23_w0 = sgdesc01_w0; + /* Load command defaults into vector variables. */ + if (flags & NIX_TX_NEED_EXT_HDR) { + sendext01_w0 = vld1q_dup_u64(&txq->cmd[0]); + sendext23_w0 = sendext01_w0; + sendext01_w1 = vdupq_n_u64(12 | 12U << 24); + sendext23_w1 = sendext01_w1; + } + /* Get LMT base address and LMT ID as lcore id */ ROC_LMT_BASE_ID_GET(laddr, lmt_id); left = pkts; @@ -738,6 +753,13 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, senddesc23_w0 = senddesc01_w0; sgdesc23_w0 = sgdesc01_w0; + /* Clear vlan enables. */ + if (flags & NIX_TX_NEED_EXT_HDR) { + sendext01_w1 = vbicq_u64(sendext01_w1, + vdupq_n_u64(0x3FFFF00FFFF00)); + sendext23_w1 = sendext01_w1; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1303,6 +1325,52 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01); senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23); + if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) { + /* Tx ol_flag for vlan. */ + const uint64x2_t olv = {PKT_TX_VLAN, PKT_TX_VLAN}; + /* Bit enable for VLAN1 */ + const uint64x2_t mlv = {BIT_ULL(49), BIT_ULL(49)}; + /* Tx ol_flag for QnQ. */ + const uint64x2_t olq = {PKT_TX_QINQ, PKT_TX_QINQ}; + /* Bit enable for VLAN0 */ + const uint64x2_t mlq = {BIT_ULL(48), BIT_ULL(48)}; + /* Load vlan values from packet. outer is VLAN 0 */ + uint64x2_t ext01 = { + ((uint32_t)tx_pkts[0]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[0]->vlan_tci) << 32, + ((uint32_t)tx_pkts[1]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[1]->vlan_tci) << 32, + }; + uint64x2_t ext23 = { + ((uint32_t)tx_pkts[2]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[2]->vlan_tci) << 32, + ((uint32_t)tx_pkts[3]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[3]->vlan_tci) << 32, + }; + + /* Get ol_flags of the packets. */ + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* ORR vlan outer/inner values into cmd. */ + sendext01_w1 = vorrq_u64(sendext01_w1, ext01); + sendext23_w1 = vorrq_u64(sendext23_w1, ext23); + + /* Test for offload enable bits and generate masks. */ + xtmp128 = vorrq_u64(vandq_u64(vtstq_u64(xtmp128, olv), + mlv), + vandq_u64(vtstq_u64(xtmp128, olq), + mlq)); + ytmp128 = vorrq_u64(vandq_u64(vtstq_u64(ytmp128, olv), + mlv), + vandq_u64(vtstq_u64(ytmp128, olq), + mlq)); + + /* Set vlan enable bits into cmd based on mask. */ + sendext01_w1 = vorrq_u64(sendext01_w1, xtmp128); + sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); @@ -1381,16 +1449,41 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd1[2] = vzip1q_u64(sgdesc23_w0, sgdesc23_w1); cmd1[3] = vzip2q_u64(sgdesc23_w0, sgdesc23_w1); - /* Store the prepared send desc to LMT lines */ - vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); - vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd1[0]); - vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd0[1]); - vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd1[1]); - vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[2]); - vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[2]); - vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd0[3]); - vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd1[3]); - lnum += 1; + if (flags & NIX_TX_NEED_EXT_HDR) { + cmd2[0] = vzip1q_u64(sendext01_w0, sendext01_w1); + cmd2[1] = vzip2q_u64(sendext01_w0, sendext01_w1); + cmd2[2] = vzip1q_u64(sendext23_w0, sendext23_w1); + cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1); + } + + if (flags & NIX_TX_NEED_EXT_HDR) { + /* Store the prepared send desc to LMT lines */ + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]); + lnum += 1; + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]); + lnum += 1; + } else { + /* Store the prepared send desc to LMT lines */ + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd1[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd0[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd1[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd0[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd1[3]); + lnum += 1; + } tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } diff --git a/drivers/net/cnxk/cn10k_tx_vec.c b/drivers/net/cnxk/cn10k_tx_vec.c index 7453f3bc9..beb5c649b 100644 --- a/drivers/net/cnxk/cn10k_tx_vec.c +++ b/drivers/net/cnxk/cn10k_tx_vec.c @@ -14,8 +14,7 @@ uint64_t cmd[sz]; \ \ /* VLAN, TSTMP, TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_VLAN_QINQ_F || \ - (flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ + if ((flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ (flags) & NIX_TX_OFFLOAD_TSO_F) \ return 0; \ return cn10k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd,\ diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c index b80260607..4b43cdaff 100644 --- a/drivers/net/cnxk/cn9k_tx.c +++ b/drivers/net/cnxk/cn9k_tx.c @@ -68,8 +68,7 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev) if (dev->scalar_ena || (dev->tx_offload_flags & - (NIX_TX_OFFLOAD_VLAN_QINQ_F | NIX_TX_OFFLOAD_TSTAMP_F | - NIX_TX_OFFLOAD_TSO_F))) + (NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F))) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index 1899d6670..d5715bb52 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -552,10 +552,13 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, { uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3; uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3; - uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP]; + uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], + cmd2[NIX_DESCS_PER_LOOP]; uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3; uint64x2_t senddesc01_w0, senddesc23_w0; uint64x2_t senddesc01_w1, senddesc23_w1; + uint64x2_t sendext01_w0, sendext23_w0; + uint64x2_t sendext01_w1, sendext23_w1; uint64x2_t sgdesc01_w0, sgdesc23_w0; uint64x2_t sgdesc01_w1, sgdesc23_w1; struct cn9k_eth_txq *txq = tx_queue; @@ -585,8 +588,19 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, senddesc23_w0 = senddesc01_w0; senddesc01_w1 = vdupq_n_u64(0); senddesc23_w1 = senddesc01_w1; - sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[2]); - sgdesc23_w0 = sgdesc01_w0; + + /* Load command defaults into vector variables. */ + if (flags & NIX_TX_NEED_EXT_HDR) { + sendext01_w0 = vld1q_dup_u64(&txq->cmd[2]); + sendext23_w0 = sendext01_w0; + sendext01_w1 = vdupq_n_u64(12 | 12U << 24); + sendext23_w1 = sendext01_w1; + sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[4]); + sgdesc23_w0 = sgdesc01_w0; + } else { + sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[2]); + sgdesc23_w0 = sgdesc01_w0; + } for (i = 0; i < pkts; i += NIX_DESCS_PER_LOOP) { /* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */ @@ -597,6 +611,13 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, senddesc23_w0 = senddesc01_w0; sgdesc23_w0 = sgdesc01_w0; + /* Clear vlan enables. */ + if (flags & NIX_TX_NEED_EXT_HDR) { + sendext01_w1 = vbicq_u64(sendext01_w1, + vdupq_n_u64(0x3FFFF00FFFF00)); + sendext23_w1 = sendext01_w1; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1162,6 +1183,52 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01); senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23); + if (flags & NIX_TX_OFFLOAD_VLAN_QINQ_F) { + /* Tx ol_flag for vlan. */ + const uint64x2_t olv = {PKT_TX_VLAN, PKT_TX_VLAN}; + /* Bit enable for VLAN1 */ + const uint64x2_t mlv = {BIT_ULL(49), BIT_ULL(49)}; + /* Tx ol_flag for QnQ. */ + const uint64x2_t olq = {PKT_TX_QINQ, PKT_TX_QINQ}; + /* Bit enable for VLAN0 */ + const uint64x2_t mlq = {BIT_ULL(48), BIT_ULL(48)}; + /* Load vlan values from packet. outer is VLAN 0 */ + uint64x2_t ext01 = { + ((uint32_t)tx_pkts[0]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[0]->vlan_tci) << 32, + ((uint32_t)tx_pkts[1]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[1]->vlan_tci) << 32, + }; + uint64x2_t ext23 = { + ((uint32_t)tx_pkts[2]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[2]->vlan_tci) << 32, + ((uint32_t)tx_pkts[3]->vlan_tci_outer) << 8 | + ((uint64_t)tx_pkts[3]->vlan_tci) << 32, + }; + + /* Get ol_flags of the packets. */ + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* ORR vlan outer/inner values into cmd. */ + sendext01_w1 = vorrq_u64(sendext01_w1, ext01); + sendext23_w1 = vorrq_u64(sendext23_w1, ext23); + + /* Test for offload enable bits and generate masks. */ + xtmp128 = vorrq_u64(vandq_u64(vtstq_u64(xtmp128, olv), + mlv), + vandq_u64(vtstq_u64(xtmp128, olq), + mlq)); + ytmp128 = vorrq_u64(vandq_u64(vtstq_u64(ytmp128, olv), + mlv), + vandq_u64(vtstq_u64(ytmp128, olq), + mlq)); + + /* Set vlan enable bits into cmd based on mask. */ + sendext01_w1 = vorrq_u64(sendext01_w1, xtmp128); + sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); @@ -1247,17 +1314,50 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd1[2] = vzip1q_u64(sgdesc23_w0, sgdesc23_w1); cmd1[3] = vzip2q_u64(sgdesc23_w0, sgdesc23_w1); - do { - vst1q_u64(lmt_addr, cmd0[0]); - vst1q_u64(lmt_addr + 2, cmd1[0]); - vst1q_u64(lmt_addr + 4, cmd0[1]); - vst1q_u64(lmt_addr + 6, cmd1[1]); - vst1q_u64(lmt_addr + 8, cmd0[2]); - vst1q_u64(lmt_addr + 10, cmd1[2]); - vst1q_u64(lmt_addr + 12, cmd0[3]); - vst1q_u64(lmt_addr + 14, cmd1[3]); - lmt_status = roc_lmt_submit_ldeor(io_addr); - } while (lmt_status == 0); + if (flags & NIX_TX_NEED_EXT_HDR) { + cmd2[0] = vzip1q_u64(sendext01_w0, sendext01_w1); + cmd2[1] = vzip2q_u64(sendext01_w0, sendext01_w1); + cmd2[2] = vzip1q_u64(sendext23_w0, sendext23_w1); + cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1); + } + + if (flags & NIX_TX_NEED_EXT_HDR) { + /* With ext header in the command we can no longer send + * all 4 packets together since LMTLINE is 128bytes. + * Split and Tx twice. + */ + do { + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd2[0]); + vst1q_u64(lmt_addr + 4, cmd1[0]); + vst1q_u64(lmt_addr + 6, cmd0[1]); + vst1q_u64(lmt_addr + 8, cmd2[1]); + vst1q_u64(lmt_addr + 10, cmd1[1]); + lmt_status = roc_lmt_submit_ldeor(io_addr); + } while (lmt_status == 0); + + do { + vst1q_u64(lmt_addr, cmd0[2]); + vst1q_u64(lmt_addr + 2, cmd2[2]); + vst1q_u64(lmt_addr + 4, cmd1[2]); + vst1q_u64(lmt_addr + 6, cmd0[3]); + vst1q_u64(lmt_addr + 8, cmd2[3]); + vst1q_u64(lmt_addr + 10, cmd1[3]); + lmt_status = roc_lmt_submit_ldeor(io_addr); + } while (lmt_status == 0); + } else { + do { + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd1[0]); + vst1q_u64(lmt_addr + 4, cmd0[1]); + vst1q_u64(lmt_addr + 6, cmd1[1]); + vst1q_u64(lmt_addr + 8, cmd0[2]); + vst1q_u64(lmt_addr + 10, cmd1[2]); + vst1q_u64(lmt_addr + 12, cmd0[3]); + vst1q_u64(lmt_addr + 14, cmd1[3]); + lmt_status = roc_lmt_submit_ldeor(io_addr); + } while (lmt_status == 0); + } tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } diff --git a/drivers/net/cnxk/cn9k_tx_vec.c b/drivers/net/cnxk/cn9k_tx_vec.c index a6e7c9e54..5842facb5 100644 --- a/drivers/net/cnxk/cn9k_tx_vec.c +++ b/drivers/net/cnxk/cn9k_tx_vec.c @@ -14,8 +14,7 @@ uint64_t cmd[sz]; \ \ /* VLAN, TSTMP, TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_VLAN_QINQ_F || \ - (flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ + if ((flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ (flags) & NIX_TX_OFFLOAD_TSO_F) \ return 0; \ return cn9k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd, \ From patchwork Mon Jun 28 19:41:41 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94930 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id B628EA0A0C; Mon, 28 Jun 2021 21:42:16 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 741AC41178; Mon, 28 Jun 2021 21:42:06 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 1CD1B4116A for ; Mon, 28 Jun 2021 21:42:05 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJeqX8014516 for ; Mon, 28 Jun 2021 12:42:04 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=PdXqoUzs8KK7NqI6OocT55ywtotxq3dzfS4UUEMViCw=; b=aaqPiJJKcCzVwvoZcC2lWFuL0eLbdi3BG9rpYWM9CauQMYIHbIjKXini4Xus29cuFkuT 9PgDmNSHctOHQnJYwSXK0ZlBdJ/FMOJNmXdjENqqREFEIszIe8kseczFiorkh7WYgexz vjIsrHTGFBuKXG2s1FlVbSZa1bK7JZCJVaiMIxzD04DhIUDr+7Ewaw6a2aUEKDTSiFDq 0vz1DpyEz7OIfr5o8i7dfZ3eeW/F+4YDJqfspGpYJRMosyYGhcXfMAFqkHnQ/PjQ1ZcS 1FqKUhZ/3xy5O2RlX9OyBGfhpkkoRBSakttqI690OW6bBh1yy/4BRowpDuZiVD/zFGws Jw== Received: from dc5-exch01.marvell.com ([199.233.59.181]) by mx0b-0016f401.pphosted.com with ESMTP id 39f964agpr-3 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:42:04 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:42:02 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:42:02 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 55AC73F7055; Mon, 28 Jun 2021 12:42:00 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:41 +0530 Message-ID: <20210628194144.637-4-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210628194144.637-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> <20210628194144.637-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: jbr-Vi-KJKTEpNXIBQbTHrCQDdWpr6lc X-Proofpoint-GUID: jbr-Vi-KJKTEpNXIBQbTHrCQDdWpr6lc X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 4/6] net/cnxk: enable ptp processing in vector Tx X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Enable PTP offload in vector Tx burst function. Since, we can no-longer use a single LMT line for burst of 4, split the LMT into two and transmit twice. Signed-off-by: Pavan Nikhilesh --- drivers/net/cnxk/cn10k_tx.c | 4 +- drivers/net/cnxk/cn10k_tx.h | 109 +++++++++++++++++++++++++++----- drivers/net/cnxk/cn10k_tx_vec.c | 5 +- drivers/net/cnxk/cn9k_tx.c | 4 +- drivers/net/cnxk/cn9k_tx.h | 105 ++++++++++++++++++++++++++---- drivers/net/cnxk/cn9k_tx_vec.c | 5 +- 6 files changed, 192 insertions(+), 40 deletions(-) diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c index 05bc163a4..c4c3e6570 100644 --- a/drivers/net/cnxk/cn10k_tx.c +++ b/drivers/net/cnxk/cn10k_tx.c @@ -67,9 +67,7 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena || - (dev->tx_offload_flags & - (NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F))) + if (dev->scalar_ena || (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSO_F)) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 1e1697858..8af6799ff 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -69,7 +69,9 @@ cn10k_nix_pkts_per_vec_brst(const uint16_t flags) static __rte_always_inline uint8_t cn10k_nix_tx_dwords_per_line(const uint16_t flags) { - return (flags & NIX_TX_NEED_EXT_HDR) ? 6 : 8; + return (flags & NIX_TX_NEED_EXT_HDR) ? + ((flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6) : + 8; } static __rte_always_inline uint64_t @@ -695,13 +697,15 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3; uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3; uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], - cmd2[NIX_DESCS_PER_LOOP]; + cmd2[NIX_DESCS_PER_LOOP], cmd3[NIX_DESCS_PER_LOOP]; uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, data, pa; uint64x2_t senddesc01_w0, senddesc23_w0; uint64x2_t senddesc01_w1, senddesc23_w1; uint16_t left, scalar, burst, i, lmt_id; uint64x2_t sendext01_w0, sendext23_w0; uint64x2_t sendext01_w1, sendext23_w1; + uint64x2_t sendmem01_w0, sendmem23_w0; + uint64x2_t sendmem01_w1, sendmem23_w1; uint64x2_t sgdesc01_w0, sgdesc23_w0; uint64x2_t sgdesc01_w1, sgdesc23_w1; struct cn10k_eth_txq *txq = tx_queue; @@ -733,6 +737,12 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w0 = sendext01_w0; sendext01_w1 = vdupq_n_u64(12 | 12U << 24); sendext23_w1 = sendext01_w1; + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + sendmem01_w0 = vld1q_dup_u64(&txq->cmd[2]); + sendmem23_w0 = sendmem01_w0; + sendmem01_w1 = vld1q_dup_u64(&txq->cmd[3]); + sendmem23_w1 = sendmem01_w1; + } } /* Get LMT base address and LMT ID as lcore id */ @@ -760,6 +770,17 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w1 = sendext01_w1; } + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + /* Reset send mem alg to SETTSTMP from SUB*/ + sendmem01_w0 = vbicq_u64(sendmem01_w0, + vdupq_n_u64(BIT_ULL(59))); + /* Reset send mem address to default. */ + sendmem01_w1 = + vbicq_u64(sendmem01_w1, vdupq_n_u64(0xF)); + sendmem23_w0 = sendmem01_w0; + sendmem23_w1 = sendmem01_w1; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1371,6 +1392,44 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128); } + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + /* Tx ol_flag for timestam. */ + const uint64x2_t olf = {PKT_TX_IEEE1588_TMST, + PKT_TX_IEEE1588_TMST}; + /* Set send mem alg to SUB. */ + const uint64x2_t alg = {BIT_ULL(59), BIT_ULL(59)}; + /* Increment send mem address by 8. */ + const uint64x2_t addr = {0x8, 0x8}; + + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* Check if timestamp is requested and generate inverted + * mask as we need not make any changes to default cmd + * value. + */ + xtmp128 = vmvnq_u32(vtstq_u64(olf, xtmp128)); + ytmp128 = vmvnq_u32(vtstq_u64(olf, ytmp128)); + + /* Change send mem address to an 8 byte offset when + * TSTMP is disabled. + */ + sendmem01_w1 = vaddq_u64(sendmem01_w1, + vandq_u64(xtmp128, addr)); + sendmem23_w1 = vaddq_u64(sendmem23_w1, + vandq_u64(ytmp128, addr)); + /* Change send mem alg to SUB when TSTMP is disabled. */ + sendmem01_w0 = vorrq_u64(sendmem01_w0, + vandq_u64(xtmp128, alg)); + sendmem23_w0 = vorrq_u64(sendmem23_w0, + vandq_u64(ytmp128, alg)); + + cmd3[0] = vzip1q_u64(sendmem01_w0, sendmem01_w1); + cmd3[1] = vzip2q_u64(sendmem01_w0, sendmem01_w1); + cmd3[2] = vzip1q_u64(sendmem23_w0, sendmem23_w1); + cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); @@ -1458,19 +1517,39 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, if (flags & NIX_TX_NEED_EXT_HDR) { /* Store the prepared send desc to LMT lines */ - vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); - vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]); - vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]); - vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]); - vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]); - vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]); - lnum += 1; - vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]); - vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]); - vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]); - vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]); - vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]); - vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]); + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[1]); + lnum += 1; + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd3[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd0[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd2[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 96), cmd1[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 112), cmd3[3]); + } else { + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[0]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[1]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[1]); + lnum += 1; + vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 16), cmd2[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 32), cmd1[2]); + vst1q_u64(LMT_OFF(laddr, lnum, 48), cmd0[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 64), cmd2[3]); + vst1q_u64(LMT_OFF(laddr, lnum, 80), cmd1[3]); + } lnum += 1; } else { /* Store the prepared send desc to LMT lines */ diff --git a/drivers/net/cnxk/cn10k_tx_vec.c b/drivers/net/cnxk/cn10k_tx_vec.c index beb5c649b..0b4a4c7ba 100644 --- a/drivers/net/cnxk/cn10k_tx_vec.c +++ b/drivers/net/cnxk/cn10k_tx_vec.c @@ -13,9 +13,8 @@ { \ uint64_t cmd[sz]; \ \ - /* VLAN, TSTMP, TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ - (flags) & NIX_TX_OFFLOAD_TSO_F) \ + /* TSO is not supported by vec */ \ + if ((flags) & NIX_TX_OFFLOAD_TSO_F) \ return 0; \ return cn10k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd,\ (flags)); \ diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c index 4b43cdaff..c32681ed4 100644 --- a/drivers/net/cnxk/cn9k_tx.c +++ b/drivers/net/cnxk/cn9k_tx.c @@ -66,9 +66,7 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena || - (dev->tx_offload_flags & - (NIX_TX_OFFLOAD_TSTAMP_F | NIX_TX_OFFLOAD_TSO_F))) + if (dev->scalar_ena || (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSO_F)) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index d5715bb52..cb574a1c1 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -553,12 +553,14 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64x2_t dataoff_iova0, dataoff_iova1, dataoff_iova2, dataoff_iova3; uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3; uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], - cmd2[NIX_DESCS_PER_LOOP]; + cmd2[NIX_DESCS_PER_LOOP], cmd3[NIX_DESCS_PER_LOOP]; uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3; uint64x2_t senddesc01_w0, senddesc23_w0; uint64x2_t senddesc01_w1, senddesc23_w1; uint64x2_t sendext01_w0, sendext23_w0; uint64x2_t sendext01_w1, sendext23_w1; + uint64x2_t sendmem01_w0, sendmem23_w0; + uint64x2_t sendmem01_w1, sendmem23_w1; uint64x2_t sgdesc01_w0, sgdesc23_w0; uint64x2_t sgdesc01_w1, sgdesc23_w1; struct cn9k_eth_txq *txq = tx_queue; @@ -597,6 +599,12 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w1 = sendext01_w1; sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[4]); sgdesc23_w0 = sgdesc01_w0; + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + sendmem01_w0 = vld1q_dup_u64(&txq->cmd[6]); + sendmem23_w0 = sendmem01_w0; + sendmem01_w1 = vld1q_dup_u64(&txq->cmd[7]); + sendmem23_w1 = sendmem01_w1; + } } else { sgdesc01_w0 = vld1q_dup_u64(&txq->cmd[2]); sgdesc23_w0 = sgdesc01_w0; @@ -618,6 +626,17 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w1 = sendext01_w1; } + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + /* Reset send mem alg to SETTSTMP from SUB*/ + sendmem01_w0 = vbicq_u64(sendmem01_w0, + vdupq_n_u64(BIT_ULL(59))); + /* Reset send mem address to default. */ + sendmem01_w1 = + vbicq_u64(sendmem01_w1, vdupq_n_u64(0xF)); + sendmem23_w0 = sendmem01_w0; + sendmem23_w1 = sendmem01_w1; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1229,6 +1248,44 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w1 = vorrq_u64(sendext23_w1, ytmp128); } + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + /* Tx ol_flag for timestam. */ + const uint64x2_t olf = {PKT_TX_IEEE1588_TMST, + PKT_TX_IEEE1588_TMST}; + /* Set send mem alg to SUB. */ + const uint64x2_t alg = {BIT_ULL(59), BIT_ULL(59)}; + /* Increment send mem address by 8. */ + const uint64x2_t addr = {0x8, 0x8}; + + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* Check if timestamp is requested and generate inverted + * mask as we need not make any changes to default cmd + * value. + */ + xtmp128 = vmvnq_u32(vtstq_u64(olf, xtmp128)); + ytmp128 = vmvnq_u32(vtstq_u64(olf, ytmp128)); + + /* Change send mem address to an 8 byte offset when + * TSTMP is disabled. + */ + sendmem01_w1 = vaddq_u64(sendmem01_w1, + vandq_u64(xtmp128, addr)); + sendmem23_w1 = vaddq_u64(sendmem23_w1, + vandq_u64(ytmp128, addr)); + /* Change send mem alg to SUB when TSTMP is disabled. */ + sendmem01_w0 = vorrq_u64(sendmem01_w0, + vandq_u64(xtmp128, alg)); + sendmem23_w0 = vorrq_u64(sendmem23_w0, + vandq_u64(ytmp128, alg)); + + cmd3[0] = vzip1q_u64(sendmem01_w0, sendmem01_w1); + cmd3[1] = vzip2q_u64(sendmem01_w0, sendmem01_w1); + cmd3[2] = vzip1q_u64(sendmem23_w0, sendmem23_w1); + cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); @@ -1327,22 +1384,44 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, * Split and Tx twice. */ do { - vst1q_u64(lmt_addr, cmd0[0]); - vst1q_u64(lmt_addr + 2, cmd2[0]); - vst1q_u64(lmt_addr + 4, cmd1[0]); - vst1q_u64(lmt_addr + 6, cmd0[1]); - vst1q_u64(lmt_addr + 8, cmd2[1]); - vst1q_u64(lmt_addr + 10, cmd1[1]); + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd2[0]); + vst1q_u64(lmt_addr + 4, cmd1[0]); + vst1q_u64(lmt_addr + 6, cmd3[0]); + vst1q_u64(lmt_addr + 8, cmd0[1]); + vst1q_u64(lmt_addr + 10, cmd2[1]); + vst1q_u64(lmt_addr + 12, cmd1[1]); + vst1q_u64(lmt_addr + 14, cmd3[1]); + } else { + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd2[0]); + vst1q_u64(lmt_addr + 4, cmd1[0]); + vst1q_u64(lmt_addr + 6, cmd0[1]); + vst1q_u64(lmt_addr + 8, cmd2[1]); + vst1q_u64(lmt_addr + 10, cmd1[1]); + } lmt_status = roc_lmt_submit_ldeor(io_addr); } while (lmt_status == 0); do { - vst1q_u64(lmt_addr, cmd0[2]); - vst1q_u64(lmt_addr + 2, cmd2[2]); - vst1q_u64(lmt_addr + 4, cmd1[2]); - vst1q_u64(lmt_addr + 6, cmd0[3]); - vst1q_u64(lmt_addr + 8, cmd2[3]); - vst1q_u64(lmt_addr + 10, cmd1[3]); + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + vst1q_u64(lmt_addr, cmd0[2]); + vst1q_u64(lmt_addr + 2, cmd2[2]); + vst1q_u64(lmt_addr + 4, cmd1[2]); + vst1q_u64(lmt_addr + 6, cmd3[2]); + vst1q_u64(lmt_addr + 8, cmd0[3]); + vst1q_u64(lmt_addr + 10, cmd2[3]); + vst1q_u64(lmt_addr + 12, cmd1[3]); + vst1q_u64(lmt_addr + 14, cmd3[3]); + } else { + vst1q_u64(lmt_addr, cmd0[2]); + vst1q_u64(lmt_addr + 2, cmd2[2]); + vst1q_u64(lmt_addr + 4, cmd1[2]); + vst1q_u64(lmt_addr + 6, cmd0[3]); + vst1q_u64(lmt_addr + 8, cmd2[3]); + vst1q_u64(lmt_addr + 10, cmd1[3]); + } lmt_status = roc_lmt_submit_ldeor(io_addr); } while (lmt_status == 0); } else { diff --git a/drivers/net/cnxk/cn9k_tx_vec.c b/drivers/net/cnxk/cn9k_tx_vec.c index 5842facb5..9ade66db2 100644 --- a/drivers/net/cnxk/cn9k_tx_vec.c +++ b/drivers/net/cnxk/cn9k_tx_vec.c @@ -13,9 +13,8 @@ { \ uint64_t cmd[sz]; \ \ - /* VLAN, TSTMP, TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_TSTAMP_F || \ - (flags) & NIX_TX_OFFLOAD_TSO_F) \ + /* TSO is not supported by vec */ \ + if ((flags) & NIX_TX_OFFLOAD_TSO_F) \ return 0; \ return cn9k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd, \ (flags)); \ From patchwork Mon Jun 28 19:41:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94931 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E15B6A0A0C; Mon, 28 Jun 2021 21:42:22 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 9A76F41151; Mon, 28 Jun 2021 21:42:10 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0b-0016f401.pphosted.com [67.231.156.173]) by mails.dpdk.org (Postfix) with ESMTP id 71B5941150 for ; Mon, 28 Jun 2021 21:42:09 +0200 (CEST) Received: from pps.filterd (m0045851.ppops.net [127.0.0.1]) by mx0b-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJeqX9014516 for ; Mon, 28 Jun 2021 12:42:08 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=ihHkvENedz6HBF8VQpfXzXHTdBaCmtr9x913TNnIpu8=; b=P4umt7yPgjGbW8CKH2j8kh/sFgBckP5QLkA7Dyn5HH8swWDLd6kybnj3bp8YUHg71Zk7 m/dYBCwwD+DTtLrWaUG8D/N6PDh0ApGRWAUsxs25yxoKcPhwkKoeYWKryRuCeInSa0H7 vNN/Y4SyIKQi0pEUWGMFD9to2E3QYb1wsVq1z7UsgR8eS9oNKf0i8x9kbN4234aRCSJu FflDMwI7fCxI+PXCkBwtfuXjr2pCwnh7eKgdtKKEqMZ36aQ9s0ueiNKw/e2i9gM0zx3d MQIAqp6/pR4DwBEtx4+2R+XkgZe80JD2/Qt+RKZVeWLrRVAyBxIoM/e75ft7sqNBynJE Ag== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0b-0016f401.pphosted.com with ESMTP id 39f964agpv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:42:08 -0700 Received: from DC5-EXCH02.marvell.com (10.69.176.39) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:42:06 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:42:06 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 427EF3F7055; Mon, 28 Jun 2021 12:42:03 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:42 +0530 Message-ID: <20210628194144.637-5-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210628194144.637-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> <20210628194144.637-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-ORIG-GUID: mJMgNN3U-7z5NLVV2ilPY6_uNyCEhVDi X-Proofpoint-GUID: mJMgNN3U-7z5NLVV2ilPY6_uNyCEhVDi X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 5/6] net/cnxk: enable TSO processing in vector Tx X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Enable TSO offload in vector Tx burst function. Signed-off-by: Pavan Nikhilesh --- drivers/net/cnxk/cn10k_tx.c | 2 +- drivers/net/cnxk/cn10k_tx.h | 97 +++++++++++++++++++++++++++++++++ drivers/net/cnxk/cn10k_tx_vec.c | 5 +- drivers/net/cnxk/cn9k_tx.c | 2 +- drivers/net/cnxk/cn9k_tx.h | 94 ++++++++++++++++++++++++++++++++ drivers/net/cnxk/cn9k_tx_vec.c | 5 +- 6 files changed, 199 insertions(+), 6 deletions(-) diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c index c4c3e6570..d06879163 100644 --- a/drivers/net/cnxk/cn10k_tx.c +++ b/drivers/net/cnxk/cn10k_tx.c @@ -67,7 +67,7 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena || (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSO_F)) + if (dev->scalar_ena) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 8af6799ff..26797581e 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -689,6 +689,46 @@ cn10k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts, #if defined(RTE_ARCH_ARM64) +static __rte_always_inline void +cn10k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, + union nix_send_ext_w0_u *w0, uint64_t ol_flags, + const uint64_t flags, const uint64_t lso_tun_fmt) +{ + uint16_t lso_sb; + uint64_t mask; + + if (!(ol_flags & PKT_TX_TCP_SEG)) + return; + + mask = -(!w1->il3type); + lso_sb = (mask & w1->ol4ptr) + (~mask & w1->il4ptr) + m->l4_len; + + w0->u |= BIT(14); + w0->lso_sb = lso_sb; + w0->lso_mps = m->tso_segsz; + w0->lso_format = NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & PKT_TX_IPV6); + w1->ol4type = NIX_SENDL4TYPE_TCP_CKSUM; + + /* Handle tunnel tso */ + if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && + (ol_flags & PKT_TX_TUNNEL_MASK)) { + const uint8_t is_udp_tun = + (CNXK_NIX_UDP_TUN_BITMASK >> + ((ol_flags & PKT_TX_TUNNEL_MASK) >> 45)) & + 0x1; + uint8_t shift = is_udp_tun ? 32 : 0; + + shift += (!!(ol_flags & PKT_TX_OUTER_IPV6) << 4); + shift += (!!(ol_flags & PKT_TX_IPV6) << 3); + + w1->il4type = NIX_SENDL4TYPE_TCP_CKSUM; + w1->ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0; + /* Update format for UDP tunneled packet */ + + w0->lso_format = (lso_tun_fmt >> shift); + } +} + #define NIX_DESCS_PER_LOOP 4 static __rte_always_inline uint16_t cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, @@ -723,6 +763,11 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, /* Reduce the cached count */ txq->fc_cache_pkts -= pkts; + /* Perform header writes before barrier for TSO */ + if (flags & NIX_TX_OFFLOAD_TSO_F) { + for (i = 0; i < pkts; i++) + cn10k_nix_xmit_prepare_tso(tx_pkts[i], flags); + } senddesc01_w0 = vld1q_dup_u64(&txq->send_hdr_w0); senddesc23_w0 = senddesc01_w0; @@ -781,6 +826,13 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendmem23_w1 = sendmem01_w1; } + if (flags & NIX_TX_OFFLOAD_TSO_F) { + /* Clear the LSO enable bit. */ + sendext01_w0 = vbicq_u64(sendext01_w0, + vdupq_n_u64(BIT_ULL(14))); + sendext23_w0 = sendext01_w0; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1430,6 +1482,51 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1); } + if (flags & NIX_TX_OFFLOAD_TSO_F) { + const uint64_t lso_fmt = txq->lso_tun_fmt; + uint64_t sx_w0[NIX_DESCS_PER_LOOP]; + uint64_t sd_w1[NIX_DESCS_PER_LOOP]; + + /* Extract SD W1 as we need to set L4 types. */ + vst1q_u64(sd_w1, senddesc01_w1); + vst1q_u64(sd_w1 + 2, senddesc23_w1); + + /* Extract SX W0 as we need to set LSO fields. */ + vst1q_u64(sx_w0, sendext01_w0); + vst1q_u64(sx_w0 + 2, sendext23_w0); + + /* Extract ol_flags. */ + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* Prepare individual mbufs. */ + cn10k_nix_prepare_tso(tx_pkts[0], + (union nix_send_hdr_w1_u *)&sd_w1[0], + (union nix_send_ext_w0_u *)&sx_w0[0], + vgetq_lane_u64(xtmp128, 0), flags, lso_fmt); + + cn10k_nix_prepare_tso(tx_pkts[1], + (union nix_send_hdr_w1_u *)&sd_w1[1], + (union nix_send_ext_w0_u *)&sx_w0[1], + vgetq_lane_u64(xtmp128, 1), flags, lso_fmt); + + cn10k_nix_prepare_tso(tx_pkts[2], + (union nix_send_hdr_w1_u *)&sd_w1[2], + (union nix_send_ext_w0_u *)&sx_w0[2], + vgetq_lane_u64(ytmp128, 0), flags, lso_fmt); + + cn10k_nix_prepare_tso(tx_pkts[3], + (union nix_send_hdr_w1_u *)&sd_w1[3], + (union nix_send_ext_w0_u *)&sx_w0[3], + vgetq_lane_u64(ytmp128, 1), flags, lso_fmt); + + senddesc01_w1 = vld1q_u64(sd_w1); + senddesc23_w1 = vld1q_u64(sd_w1 + 2); + + sendext01_w0 = vld1q_u64(sx_w0); + sendext23_w0 = vld1q_u64(sx_w0 + 2); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); diff --git a/drivers/net/cnxk/cn10k_tx_vec.c b/drivers/net/cnxk/cn10k_tx_vec.c index 0b4a4c7ba..34e373750 100644 --- a/drivers/net/cnxk/cn10k_tx_vec.c +++ b/drivers/net/cnxk/cn10k_tx_vec.c @@ -13,8 +13,9 @@ { \ uint64_t cmd[sz]; \ \ - /* TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_TSO_F) \ + /* For TSO inner checksum is a must */ \ + if (((flags) & NIX_TX_OFFLOAD_TSO_F) && \ + !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) \ return 0; \ return cn10k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd,\ (flags)); \ diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c index c32681ed4..735e21cc6 100644 --- a/drivers/net/cnxk/cn9k_tx.c +++ b/drivers/net/cnxk/cn9k_tx.c @@ -66,7 +66,7 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena || (dev->tx_offload_flags & NIX_TX_OFFLOAD_TSO_F)) + if (dev->scalar_ena) pick_tx_func(eth_dev, nix_eth_tx_burst); else pick_tx_func(eth_dev, nix_eth_tx_vec_burst); diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index cb574a1c1..dca732a9f 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -545,6 +545,43 @@ cn9k_nix_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **tx_pkts, #if defined(RTE_ARCH_ARM64) +static __rte_always_inline void +cn9k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, + union nix_send_ext_w0_u *w0, uint64_t ol_flags, + uint64_t flags) +{ + uint16_t lso_sb; + uint64_t mask; + + if (!(ol_flags & PKT_TX_TCP_SEG)) + return; + + mask = -(!w1->il3type); + lso_sb = (mask & w1->ol4ptr) + (~mask & w1->il4ptr) + m->l4_len; + + w0->u |= BIT(14); + w0->lso_sb = lso_sb; + w0->lso_mps = m->tso_segsz; + w0->lso_format = NIX_LSO_FORMAT_IDX_TSOV4 + !!(ol_flags & PKT_TX_IPV6); + w1->ol4type = NIX_SENDL4TYPE_TCP_CKSUM; + + /* Handle tunnel tso */ + if ((flags & NIX_TX_OFFLOAD_OL3_OL4_CSUM_F) && + (ol_flags & PKT_TX_TUNNEL_MASK)) { + const uint8_t is_udp_tun = + (CNXK_NIX_UDP_TUN_BITMASK >> + ((ol_flags & PKT_TX_TUNNEL_MASK) >> 45)) & + 0x1; + + w1->il4type = NIX_SENDL4TYPE_TCP_CKSUM; + w1->ol4type = is_udp_tun ? NIX_SENDL4TYPE_UDP_CKSUM : 0; + /* Update format for UDP tunneled packet */ + w0->lso_format += is_udp_tun ? 2 : 6; + + w0->lso_format += !!(ol_flags & PKT_TX_OUTER_IPV6) << 1; + } +} + #define NIX_DESCS_PER_LOOP 4 static __rte_always_inline uint16_t cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, @@ -580,6 +617,12 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, /* Reduce the cached count */ txq->fc_cache_pkts -= pkts; + /* Perform header writes before barrier for TSO */ + if (flags & NIX_TX_OFFLOAD_TSO_F) { + for (i = 0; i < pkts; i++) + cn9k_nix_xmit_prepare_tso(tx_pkts[i], flags); + } + /* Lets commit any changes in the packet here as no further changes * to the packet will be done unless no fast free is enabled. */ @@ -637,6 +680,13 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendmem23_w1 = sendmem01_w1; } + if (flags & NIX_TX_OFFLOAD_TSO_F) { + /* Clear the LSO enable bit. */ + sendext01_w0 = vbicq_u64(sendext01_w0, + vdupq_n_u64(BIT_ULL(14))); + sendext23_w0 = sendext01_w0; + } + /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1286,6 +1336,50 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd3[3] = vzip2q_u64(sendmem23_w0, sendmem23_w1); } + if (flags & NIX_TX_OFFLOAD_TSO_F) { + uint64_t sx_w0[NIX_DESCS_PER_LOOP]; + uint64_t sd_w1[NIX_DESCS_PER_LOOP]; + + /* Extract SD W1 as we need to set L4 types. */ + vst1q_u64(sd_w1, senddesc01_w1); + vst1q_u64(sd_w1 + 2, senddesc23_w1); + + /* Extract SX W0 as we need to set LSO fields. */ + vst1q_u64(sx_w0, sendext01_w0); + vst1q_u64(sx_w0 + 2, sendext23_w0); + + /* Extract ol_flags. */ + xtmp128 = vzip1q_u64(len_olflags0, len_olflags1); + ytmp128 = vzip1q_u64(len_olflags2, len_olflags3); + + /* Prepare individual mbufs. */ + cn9k_nix_prepare_tso(tx_pkts[0], + (union nix_send_hdr_w1_u *)&sd_w1[0], + (union nix_send_ext_w0_u *)&sx_w0[0], + vgetq_lane_u64(xtmp128, 0), flags); + + cn9k_nix_prepare_tso(tx_pkts[1], + (union nix_send_hdr_w1_u *)&sd_w1[1], + (union nix_send_ext_w0_u *)&sx_w0[1], + vgetq_lane_u64(xtmp128, 1), flags); + + cn9k_nix_prepare_tso(tx_pkts[2], + (union nix_send_hdr_w1_u *)&sd_w1[2], + (union nix_send_ext_w0_u *)&sx_w0[2], + vgetq_lane_u64(ytmp128, 0), flags); + + cn9k_nix_prepare_tso(tx_pkts[3], + (union nix_send_hdr_w1_u *)&sd_w1[3], + (union nix_send_ext_w0_u *)&sx_w0[3], + vgetq_lane_u64(ytmp128, 1), flags); + + senddesc01_w1 = vld1q_u64(sd_w1); + senddesc23_w1 = vld1q_u64(sd_w1 + 2); + + sendext01_w0 = vld1q_u64(sx_w0); + sendext23_w0 = vld1q_u64(sx_w0 + 2); + } + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); diff --git a/drivers/net/cnxk/cn9k_tx_vec.c b/drivers/net/cnxk/cn9k_tx_vec.c index 9ade66db2..56a3e2514 100644 --- a/drivers/net/cnxk/cn9k_tx_vec.c +++ b/drivers/net/cnxk/cn9k_tx_vec.c @@ -13,8 +13,9 @@ { \ uint64_t cmd[sz]; \ \ - /* TSO is not supported by vec */ \ - if ((flags) & NIX_TX_OFFLOAD_TSO_F) \ + /* For TSO inner checksum is a must */ \ + if (((flags) & NIX_TX_OFFLOAD_TSO_F) && \ + !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) \ return 0; \ return cn9k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd, \ (flags)); \ From patchwork Mon Jun 28 19:41:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pavan Nikhilesh Bhagavatula X-Patchwork-Id: 94932 Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E9D16A0A0C; Mon, 28 Jun 2021 21:42:30 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 2F2B141159; Mon, 28 Jun 2021 21:42:16 +0200 (CEST) Received: from mx0b-0016f401.pphosted.com (mx0a-0016f401.pphosted.com [67.231.148.174]) by mails.dpdk.org (Postfix) with ESMTP id 80F5C4003F for ; Mon, 28 Jun 2021 21:42:13 +0200 (CEST) Received: from pps.filterd (m0045849.ppops.net [127.0.0.1]) by mx0a-0016f401.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 15SJeccs014748 for ; Mon, 28 Jun 2021 12:42:12 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=marvell.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-transfer-encoding : content-type; s=pfpt0220; bh=ueqP0mtBsfjM9/A/Ri1+3XC5ET/B0es54l+aPLJYack=; b=gvIZmCdwmbmIGQ4BudHpFbxqMiATsHnVIZH/28gGA/ERwejVyT60uVaaenbEY/30xBKb Fy+m9Zp6K105l1EgVcBtp0sq6z3i8MfxbCT51c7rMbjK4KbdwLBzIdnZPbbuaS/ZKfrc MCz8KuOIF2KgO8RO7mBjVGr5lOZ/wlcMiz0B6v9OL+PC1psXEupliM5KlEG/RvCcGv6t ayk5rzXW4ybSqYPabhuUIL3EdukpTi3ZZXxTQtszA3kqA0r1A9Gd0IRR3EOX6nthqh2d z2veJmrNskcb0rVEe9KQzCI6tti5ECeSaxmQGDqTPIozVEY+dCg5zIi3+rjCvFmUZQE+ iQ== Received: from dc5-exch02.marvell.com ([199.233.59.182]) by mx0a-0016f401.pphosted.com with ESMTP id 39f11y3unc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Mon, 28 Jun 2021 12:42:12 -0700 Received: from DC5-EXCH01.marvell.com (10.69.176.38) by DC5-EXCH02.marvell.com (10.69.176.39) with Microsoft SMTP Server (TLS) id 15.0.1497.18; Mon, 28 Jun 2021 12:42:10 -0700 Received: from maili.marvell.com (10.69.176.80) by DC5-EXCH01.marvell.com (10.69.176.38) with Microsoft SMTP Server id 15.0.1497.18 via Frontend Transport; Mon, 28 Jun 2021 12:42:10 -0700 Received: from BG-LT7430.marvell.com (BG-LT7430.marvell.com [10.28.177.176]) by maili.marvell.com (Postfix) with ESMTP id 385EA3F7055; Mon, 28 Jun 2021 12:42:07 -0700 (PDT) From: To: , Nithin Dabilpuram , "Kiran Kumar K" , Sunil Kumar Kori , Satha Rao CC: , Pavan Nikhilesh Date: Tue, 29 Jun 2021 01:11:43 +0530 Message-ID: <20210628194144.637-6-pbhagavatula@marvell.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20210628194144.637-1-pbhagavatula@marvell.com> References: <20210620202906.10974-1-pbhagavatula@marvell.com> <20210628194144.637-1-pbhagavatula@marvell.com> MIME-Version: 1.0 X-Proofpoint-GUID: t2sX6mGjUlGofRKXXFEgrEbRD8bAGwK2 X-Proofpoint-ORIG-GUID: t2sX6mGjUlGofRKXXFEgrEbRD8bAGwK2 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391, 18.0.790 definitions=2021-06-28_14:2021-06-25, 2021-06-28 signatures=0 Subject: [dpdk-dev] [PATCH v4 6/6] net/cnxk: add multi seg Tx vector routine X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Pavan Nikhilesh Add multi segment Tx vector routine. Signed-off-by: Pavan Nikhilesh --- drivers/net/cnxk/cn10k_tx.c | 20 +- drivers/net/cnxk/cn10k_tx.h | 388 +++++++++++++++++++++++++-- drivers/net/cnxk/cn10k_tx_vec_mseg.c | 24 ++ drivers/net/cnxk/cn9k_tx.c | 20 +- drivers/net/cnxk/cn9k_tx.h | 272 ++++++++++++++++++- drivers/net/cnxk/cn9k_tx_vec_mseg.c | 24 ++ drivers/net/cnxk/meson.build | 6 +- 7 files changed, 709 insertions(+), 45 deletions(-) create mode 100644 drivers/net/cnxk/cn10k_tx_vec_mseg.c create mode 100644 drivers/net/cnxk/cn9k_tx_vec_mseg.c diff --git a/drivers/net/cnxk/cn10k_tx.c b/drivers/net/cnxk/cn10k_tx.c index d06879163..1f30bab59 100644 --- a/drivers/net/cnxk/cn10k_tx.c +++ b/drivers/net/cnxk/cn10k_tx.c @@ -67,13 +67,23 @@ cn10k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena) + const eth_tx_burst_t nix_eth_tx_vec_burst_mseg[2][2][2][2][2][2] = { +#define T(name, f5, f4, f3, f2, f1, f0, sz, flags) \ + [f5][f4][f3][f2][f1][f0] = cn10k_nix_xmit_pkts_vec_mseg_##name, + + NIX_TX_FASTPATH_MODES +#undef T + }; + + if (dev->scalar_ena) { pick_tx_func(eth_dev, nix_eth_tx_burst); - else + if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) + pick_tx_func(eth_dev, nix_eth_tx_burst_mseg); + } else { pick_tx_func(eth_dev, nix_eth_tx_vec_burst); - - if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) - pick_tx_func(eth_dev, nix_eth_tx_burst_mseg); + if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) + pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg); + } rte_mb(); } diff --git a/drivers/net/cnxk/cn10k_tx.h b/drivers/net/cnxk/cn10k_tx.h index 26797581e..532b53b31 100644 --- a/drivers/net/cnxk/cn10k_tx.h +++ b/drivers/net/cnxk/cn10k_tx.h @@ -42,6 +42,13 @@ } \ } while (0) +/* Encoded number of segments to number of dwords macro, each value of nb_segs + * is encoded as 4bits. + */ +#define NIX_SEGDW_MAGIC 0x76654432210ULL + +#define NIX_NB_SEGS_TO_SEGDW(x) ((NIX_SEGDW_MAGIC >> ((x) << 2)) & 0xF) + #define LMT_OFF(lmt_addr, lmt_num, offset) \ (void *)((lmt_addr) + ((lmt_num) << ROC_LMT_LINE_SIZE_LOG2) + (offset)) @@ -102,6 +109,14 @@ cn10k_nix_tx_steor_data(const uint16_t flags) return data; } +static __rte_always_inline uint8_t +cn10k_nix_tx_dwords_per_line_seg(const uint16_t flags) +{ + return ((flags & NIX_TX_NEED_EXT_HDR) ? + (flags & NIX_TX_OFFLOAD_TSTAMP_F) ? 8 : 6 : + 4); +} + static __rte_always_inline uint64_t cn10k_nix_tx_steor_vec_data(const uint16_t flags) { @@ -729,7 +744,244 @@ cn10k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, } } +static __rte_always_inline void +cn10k_nix_prepare_mseg_vec_list(struct rte_mbuf *m, uint64_t *cmd, + union nix_send_hdr_w0_u *sh, + union nix_send_sg_s *sg, const uint32_t flags) +{ + struct rte_mbuf *m_next; + uint64_t *slist, sg_u; + uint16_t nb_segs; + int i = 1; + + sh->total = m->pkt_len; + /* Clear sg->u header before use */ + sg->u &= 0xFC00000000000000; + sg_u = sg->u; + slist = &cmd[0]; + + sg_u = sg_u | ((uint64_t)m->data_len); + + nb_segs = m->nb_segs - 1; + m_next = m->next; + + /* Set invert df if buffer is not to be freed by H/W */ + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) + sg_u |= (cnxk_nix_prefree_seg(m) << 55); + /* Mark mempool object as "put" since it is freed by NIX */ +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + if (!(sg_u & (1ULL << 55))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + + m = m_next; + /* Fill mbuf segments */ + do { + m_next = m->next; + sg_u = sg_u | ((uint64_t)m->data_len << (i << 4)); + *slist = rte_mbuf_data_iova(m); + /* Set invert df if buffer is not to be freed by H/W */ + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) + sg_u |= (cnxk_nix_prefree_seg(m) << (i + 55)); + /* Mark mempool object as "put" since it is freed by NIX + */ +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + if (!(sg_u & (1ULL << (i + 55)))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + slist++; + i++; + nb_segs--; + if (i > 2 && nb_segs) { + i = 0; + /* Next SG subdesc */ + *(uint64_t *)slist = sg_u & 0xFC00000000000000; + sg->u = sg_u; + sg->segs = 3; + sg = (union nix_send_sg_s *)slist; + sg_u = sg->u; + slist++; + } + m = m_next; + } while (nb_segs); + + sg->u = sg_u; + sg->segs = i; +} + +static __rte_always_inline void +cn10k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, + uint64x2_t *cmd1, const uint8_t segdw, + const uint32_t flags) +{ + union nix_send_hdr_w0_u sh; + union nix_send_sg_s sg; + + if (m->nb_segs == 1) { + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { + sg.u = vgetq_lane_u64(cmd1[0], 0); + sg.u |= (cnxk_nix_prefree_seg(m) << 55); + cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0); + } + +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + sg.u = vgetq_lane_u64(cmd1[0], 0); + if (!(sg.u & (1ULL << 55))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + return; + } + + sh.u = vgetq_lane_u64(cmd0[0], 0); + sg.u = vgetq_lane_u64(cmd1[0], 0); + + cn10k_nix_prepare_mseg_vec_list(m, cmd, &sh, &sg, flags); + + sh.sizem1 = segdw - 1; + cmd0[0] = vsetq_lane_u64(sh.u, cmd0[0], 0); + cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0); +} + #define NIX_DESCS_PER_LOOP 4 + +static __rte_always_inline uint8_t +cn10k_nix_prep_lmt_mseg_vector(struct rte_mbuf **mbufs, uint64x2_t *cmd0, + uint64x2_t *cmd1, uint64x2_t *cmd2, + uint64x2_t *cmd3, uint8_t *segdw, + uint64_t *lmt_addr, __uint128_t *data128, + uint8_t *shift, const uint16_t flags) +{ + uint8_t j, off, lmt_used; + + if (!(flags & NIX_TX_NEED_EXT_HDR) && + !(flags & NIX_TX_OFFLOAD_TSTAMP_F)) { + /* No segments in 4 consecutive packets. */ + if ((segdw[0] + segdw[1] + segdw[2] + segdw[3]) <= 8) { + for (j = 0; j < NIX_DESCS_PER_LOOP; j++) + cn10k_nix_prepare_mseg_vec(mbufs[j], NULL, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd1[0]); + vst1q_u64(lmt_addr + 4, cmd0[1]); + vst1q_u64(lmt_addr + 6, cmd1[1]); + vst1q_u64(lmt_addr + 8, cmd0[2]); + vst1q_u64(lmt_addr + 10, cmd1[2]); + vst1q_u64(lmt_addr + 12, cmd0[3]); + vst1q_u64(lmt_addr + 14, cmd1[3]); + + *data128 |= ((__uint128_t)7) << *shift; + shift += 3; + + return 1; + } + } + + lmt_used = 0; + for (j = 0; j < NIX_DESCS_PER_LOOP;) { + /* Fit consecutive packets in same LMTLINE. */ + if ((segdw[j] + segdw[j + 1]) <= 8) { + if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { + cn10k_nix_prepare_mseg_vec(mbufs[j], NULL, + &cmd0[j], &cmd1[j], + segdw[j], flags); + cn10k_nix_prepare_mseg_vec(mbufs[j + 1], NULL, + &cmd0[j + 1], + &cmd1[j + 1], + segdw[j + 1], flags); + /* TSTAMP takes 4 each, no segs. */ + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + vst1q_u64(lmt_addr + 6, cmd3[j]); + + vst1q_u64(lmt_addr + 8, cmd0[j + 1]); + vst1q_u64(lmt_addr + 10, cmd2[j + 1]); + vst1q_u64(lmt_addr + 12, cmd1[j + 1]); + vst1q_u64(lmt_addr + 14, cmd3[j + 1]); + } else if (flags & NIX_TX_NEED_EXT_HDR) { + /* EXT header take 3 each, space for 2 segs.*/ + cn10k_nix_prepare_mseg_vec(mbufs[j], + lmt_addr + 6, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + off = segdw[j] - 3; + off <<= 1; + cn10k_nix_prepare_mseg_vec(mbufs[j + 1], + lmt_addr + 12 + off, + &cmd0[j + 1], + &cmd1[j + 1], + segdw[j + 1], flags); + vst1q_u64(lmt_addr + 6 + off, cmd0[j + 1]); + vst1q_u64(lmt_addr + 8 + off, cmd2[j + 1]); + vst1q_u64(lmt_addr + 10 + off, cmd1[j + 1]); + } else { + cn10k_nix_prepare_mseg_vec(mbufs[j], + lmt_addr + 4, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd1[j]); + off = segdw[j] - 2; + off <<= 1; + cn10k_nix_prepare_mseg_vec(mbufs[j + 1], + lmt_addr + 8 + off, + &cmd0[j + 1], + &cmd1[j + 1], + segdw[j + 1], flags); + vst1q_u64(lmt_addr + 4 + off, cmd0[j + 1]); + vst1q_u64(lmt_addr + 6 + off, cmd1[j + 1]); + } + *data128 |= ((__uint128_t)(segdw[j] + segdw[j + 1]) - 1) + << *shift; + *shift += 3; + j += 2; + } else { + if ((flags & NIX_TX_NEED_EXT_HDR) && + (flags & NIX_TX_OFFLOAD_TSTAMP_F)) { + cn10k_nix_prepare_mseg_vec(mbufs[j], + lmt_addr + 6, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + off = segdw[j] - 4; + off <<= 1; + vst1q_u64(lmt_addr + 6 + off, cmd3[j]); + } else if (flags & NIX_TX_NEED_EXT_HDR) { + cn10k_nix_prepare_mseg_vec(mbufs[j], + lmt_addr + 6, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + } else { + cn10k_nix_prepare_mseg_vec(mbufs[j], + lmt_addr + 4, + &cmd0[j], &cmd1[j], + segdw[j], flags); + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd1[j]); + } + *data128 |= ((__uint128_t)(segdw[j]) - 1) << *shift; + *shift += 3; + j++; + } + lmt_used++; + lmt_addr += 16; + } + + return lmt_used; +} + static __rte_always_inline uint16_t cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts, uint64_t *cmd, const uint16_t flags) @@ -738,7 +990,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64x2_t len_olflags0, len_olflags1, len_olflags2, len_olflags3; uint64x2_t cmd0[NIX_DESCS_PER_LOOP], cmd1[NIX_DESCS_PER_LOOP], cmd2[NIX_DESCS_PER_LOOP], cmd3[NIX_DESCS_PER_LOOP]; - uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, data, pa; + uint64_t *mbuf0, *mbuf1, *mbuf2, *mbuf3, pa; uint64x2_t senddesc01_w0, senddesc23_w0; uint64x2_t senddesc01_w1, senddesc23_w1; uint16_t left, scalar, burst, i, lmt_id; @@ -746,6 +998,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64x2_t sendext01_w1, sendext23_w1; uint64x2_t sendmem01_w0, sendmem23_w0; uint64x2_t sendmem01_w1, sendmem23_w1; + uint8_t segdw[NIX_DESCS_PER_LOOP + 1]; uint64x2_t sgdesc01_w0, sgdesc23_w0; uint64x2_t sgdesc01_w1, sgdesc23_w1; struct cn10k_eth_txq *txq = tx_queue; @@ -754,7 +1007,11 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint64x2_t ltypes01, ltypes23; uint64x2_t xtmp128, ytmp128; uint64x2_t xmask01, xmask23; - uint8_t lnum; + uint8_t lnum, shift; + union wdata { + __uint128_t data128; + uint64_t data[2]; + } wd; NIX_XMIT_FC_OR_RETURN(txq, pkts); @@ -798,8 +1055,43 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, burst = left > cn10k_nix_pkts_per_vec_brst(flags) ? cn10k_nix_pkts_per_vec_brst(flags) : left; + if (flags & NIX_TX_MULTI_SEG_F) { + wd.data128 = 0; + shift = 16; + } lnum = 0; + for (i = 0; i < burst; i += NIX_DESCS_PER_LOOP) { + if (flags & NIX_TX_MULTI_SEG_F) { + struct rte_mbuf *m = tx_pkts[j]; + uint8_t j; + + for (j = 0; j < NIX_DESCS_PER_LOOP; j++) { + /* Get dwords based on nb_segs. */ + segdw[j] = NIX_NB_SEGS_TO_SEGDW(m->nb_segs); + /* Add dwords based on offloads. */ + segdw[j] += 1 + /* SEND HDR */ + !!(flags & NIX_TX_NEED_EXT_HDR) + + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F); + } + + /* Check if there are enough LMTLINES for this loop */ + if (lnum + 4 > 32) { + uint8_t ldwords_con = 0, lneeded = 0; + for (j = 0; j < NIX_DESCS_PER_LOOP; j++) { + ldwords_con += segdw[j]; + if (ldwords_con > 8) { + lneeded += 1; + ldwords_con = segdw[j]; + } + } + lneeded += 1; + if (lnum + lneeded > 32) { + burst = i; + break; + } + } + } /* Clear lower 32bit of SEND_HDR_W0 and SEND_SG_W0 */ senddesc01_w0 = vbicq_u64(senddesc01_w0, vdupq_n_u64(0xFFFFFFFF)); @@ -1527,7 +1819,8 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w0 = vld1q_u64(sx_w0 + 2); } - if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { + if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && + !(flags & NIX_TX_MULTI_SEG_F)) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); xmask23 = xmask01; @@ -1567,7 +1860,7 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, (void **)&mbuf3, 1, 0); senddesc01_w0 = vorrq_u64(senddesc01_w0, xmask01); senddesc23_w0 = vorrq_u64(senddesc23_w0, xmask23); - } else { + } else if (!(flags & NIX_TX_MULTI_SEG_F)) { /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1612,7 +1905,19 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1); } - if (flags & NIX_TX_NEED_EXT_HDR) { + if (flags & NIX_TX_MULTI_SEG_F) { + uint8_t j; + + segdw[4] = 8; + j = cn10k_nix_prep_lmt_mseg_vector(tx_pkts, cmd0, cmd1, + cmd2, cmd3, segdw, + (uint64_t *) + LMT_OFF(laddr, lnum, + 0), + &wd.data128, &shift, + flags); + lnum += j; + } else if (flags & NIX_TX_NEED_EXT_HDR) { /* Store the prepared send desc to LMT lines */ if (flags & NIX_TX_OFFLOAD_TSTAMP_F) { vst1q_u64(LMT_OFF(laddr, lnum, 0), cmd0[0]); @@ -1664,34 +1969,55 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } + if (flags & NIX_TX_MULTI_SEG_F) + wd.data[0] >>= 16; + /* Trigger LMTST */ if (lnum > 16) { - data = cn10k_nix_tx_steor_vec_data(flags); - pa = io_addr | (data & 0x7) << 4; - data &= ~0x7ULL; - data |= (15ULL << 12); - data |= (uint64_t)lmt_id; + if (!(flags & NIX_TX_MULTI_SEG_F)) + wd.data[0] = cn10k_nix_tx_steor_vec_data(flags); + + pa = io_addr | (wd.data[0] & 0x7) << 4; + wd.data[0] &= ~0x7ULL; + + if (flags & NIX_TX_MULTI_SEG_F) + wd.data[0] <<= 16; + + wd.data[0] |= (15ULL << 12); + wd.data[0] |= (uint64_t)lmt_id; /* STEOR0 */ - roc_lmt_submit_steorl(data, pa); + roc_lmt_submit_steorl(wd.data[0], pa); - data = cn10k_nix_tx_steor_vec_data(flags); - pa = io_addr | (data & 0x7) << 4; - data &= ~0x7ULL; - data |= ((uint64_t)(lnum - 17)) << 12; - data |= (uint64_t)(lmt_id + 16); + if (!(flags & NIX_TX_MULTI_SEG_F)) + wd.data[1] = cn10k_nix_tx_steor_vec_data(flags); + + pa = io_addr | (wd.data[1] & 0x7) << 4; + wd.data[1] &= ~0x7ULL; + + if (flags & NIX_TX_MULTI_SEG_F) + wd.data[1] <<= 16; + + wd.data[1] |= ((uint64_t)(lnum - 17)) << 12; + wd.data[1] |= (uint64_t)(lmt_id + 16); /* STEOR1 */ - roc_lmt_submit_steorl(data, pa); + roc_lmt_submit_steorl(wd.data[1], pa); } else if (lnum) { - data = cn10k_nix_tx_steor_vec_data(flags); - pa = io_addr | (data & 0x7) << 4; - data &= ~0x7ULL; - data |= ((uint64_t)(lnum - 1)) << 12; - data |= lmt_id; + if (!(flags & NIX_TX_MULTI_SEG_F)) + wd.data[0] = cn10k_nix_tx_steor_vec_data(flags); + + pa = io_addr | (wd.data[0] & 0x7) << 4; + wd.data[0] &= ~0x7ULL; + + if (flags & NIX_TX_MULTI_SEG_F) + wd.data[0] <<= 16; + + wd.data[0] |= ((uint64_t)(lnum - 1)) << 12; + wd.data[0] |= lmt_id; /* STEOR0 */ - roc_lmt_submit_steorl(data, pa); + roc_lmt_submit_steorl(wd.data[0], pa); } left -= burst; @@ -1699,9 +2025,14 @@ cn10k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, if (left) goto again; - if (unlikely(scalar)) - pkts += cn10k_nix_xmit_pkts(tx_queue, tx_pkts, scalar, cmd, - flags); + if (unlikely(scalar)) { + if (flags & NIX_TX_MULTI_SEG_F) + pkts += cn10k_nix_xmit_pkts_mseg(tx_queue, tx_pkts, + scalar, cmd, flags); + else + pkts += cn10k_nix_xmit_pkts(tx_queue, tx_pkts, scalar, + cmd, flags); + } return pkts; } @@ -1866,7 +2197,10 @@ T(ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 1, 1, 1, 1, 1, 1, 8, \ void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); \ \ uint16_t __rte_noinline __rte_hot cn10k_nix_xmit_pkts_vec_##name( \ - void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); \ + \ + uint16_t __rte_noinline __rte_hot cn10k_nix_xmit_pkts_vec_mseg_##name( \ + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); \ NIX_TX_FASTPATH_MODES #undef T diff --git a/drivers/net/cnxk/cn10k_tx_vec_mseg.c b/drivers/net/cnxk/cn10k_tx_vec_mseg.c new file mode 100644 index 000000000..1fad81dba --- /dev/null +++ b/drivers/net/cnxk/cn10k_tx_vec_mseg.c @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. + */ + +#include "cn10k_ethdev.h" +#include "cn10k_tx.h" + +#define T(name, f5, f4, f3, f2, f1, f0, sz, flags) \ + uint16_t __rte_noinline __rte_hot cn10k_nix_xmit_pkts_vec_mseg_##name( \ + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts) \ + { \ + uint64_t cmd[sz]; \ + \ + /* For TSO inner checksum is a must */ \ + if (((flags) & NIX_TX_OFFLOAD_TSO_F) && \ + !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) \ + return 0; \ + return cn10k_nix_xmit_pkts_vector( \ + tx_queue, tx_pkts, pkts, cmd, \ + (flags) | NIX_TX_MULTI_SEG_F); \ + } + +NIX_TX_FASTPATH_MODES +#undef T diff --git a/drivers/net/cnxk/cn9k_tx.c b/drivers/net/cnxk/cn9k_tx.c index 735e21cc6..763f9a14f 100644 --- a/drivers/net/cnxk/cn9k_tx.c +++ b/drivers/net/cnxk/cn9k_tx.c @@ -66,13 +66,23 @@ cn9k_eth_set_tx_function(struct rte_eth_dev *eth_dev) #undef T }; - if (dev->scalar_ena) + const eth_tx_burst_t nix_eth_tx_vec_burst_mseg[2][2][2][2][2][2] = { +#define T(name, f5, f4, f3, f2, f1, f0, sz, flags) \ + [f5][f4][f3][f2][f1][f0] = cn9k_nix_xmit_pkts_vec_mseg_##name, + + NIX_TX_FASTPATH_MODES +#undef T + }; + + if (dev->scalar_ena) { pick_tx_func(eth_dev, nix_eth_tx_burst); - else + if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) + pick_tx_func(eth_dev, nix_eth_tx_burst_mseg); + } else { pick_tx_func(eth_dev, nix_eth_tx_vec_burst); - - if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) - pick_tx_func(eth_dev, nix_eth_tx_burst_mseg); + if (dev->tx_offloads & DEV_TX_OFFLOAD_MULTI_SEGS) + pick_tx_func(eth_dev, nix_eth_tx_vec_burst_mseg); + } rte_mb(); } diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h index dca732a9f..ed65cd351 100644 --- a/drivers/net/cnxk/cn9k_tx.h +++ b/drivers/net/cnxk/cn9k_tx.h @@ -582,7 +582,238 @@ cn9k_nix_prepare_tso(struct rte_mbuf *m, union nix_send_hdr_w1_u *w1, } } +static __rte_always_inline uint8_t +cn9k_nix_prepare_mseg_vec_list(struct rte_mbuf *m, uint64_t *cmd, + union nix_send_hdr_w0_u *sh, + union nix_send_sg_s *sg, const uint32_t flags) +{ + struct rte_mbuf *m_next; + uint64_t *slist, sg_u; + uint16_t nb_segs; + uint64_t segdw; + int i = 1; + + sh->total = m->pkt_len; + /* Clear sg->u header before use */ + sg->u &= 0xFC00000000000000; + sg_u = sg->u; + slist = &cmd[0]; + + sg_u = sg_u | ((uint64_t)m->data_len); + + nb_segs = m->nb_segs - 1; + m_next = m->next; + + /* Set invert df if buffer is not to be freed by H/W */ + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) + sg_u |= (cnxk_nix_prefree_seg(m) << 55); + /* Mark mempool object as "put" since it is freed by NIX */ +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + if (!(sg_u & (1ULL << 55))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + + m = m_next; + /* Fill mbuf segments */ + do { + m_next = m->next; + sg_u = sg_u | ((uint64_t)m->data_len << (i << 4)); + *slist = rte_mbuf_data_iova(m); + /* Set invert df if buffer is not to be freed by H/W */ + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) + sg_u |= (cnxk_nix_prefree_seg(m) << (i + 55)); + /* Mark mempool object as "put" since it is freed by NIX + */ +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + if (!(sg_u & (1ULL << (i + 55)))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + slist++; + i++; + nb_segs--; + if (i > 2 && nb_segs) { + i = 0; + /* Next SG subdesc */ + *(uint64_t *)slist = sg_u & 0xFC00000000000000; + sg->u = sg_u; + sg->segs = 3; + sg = (union nix_send_sg_s *)slist; + sg_u = sg->u; + slist++; + } + m = m_next; + } while (nb_segs); + + sg->u = sg_u; + sg->segs = i; + segdw = (uint64_t *)slist - (uint64_t *)&cmd[0]; + + segdw += 2; + /* Roundup extra dwords to multiple of 2 */ + segdw = (segdw >> 1) + (segdw & 0x1); + /* Default dwords */ + segdw += 1 + !!(flags & NIX_TX_NEED_EXT_HDR) + + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F); + sh->sizem1 = segdw - 1; + + return segdw; +} + +static __rte_always_inline uint8_t +cn9k_nix_prepare_mseg_vec(struct rte_mbuf *m, uint64_t *cmd, uint64x2_t *cmd0, + uint64x2_t *cmd1, const uint32_t flags) +{ + union nix_send_hdr_w0_u sh; + union nix_send_sg_s sg; + uint8_t ret; + + if (m->nb_segs == 1) { + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { + sg.u = vgetq_lane_u64(cmd1[0], 0); + sg.u |= (cnxk_nix_prefree_seg(m) << 55); + cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0); + } + +#ifdef RTE_LIBRTE_MEMPOOL_DEBUG + sg.u = vgetq_lane_u64(cmd1[0], 0); + if (!(sg.u & (1ULL << 55))) + __mempool_check_cookies(m->pool, (void **)&m, 1, 0); + rte_io_wmb(); +#endif + return 2 + !!(flags & NIX_TX_NEED_EXT_HDR) + + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F); + } + + sh.u = vgetq_lane_u64(cmd0[0], 0); + sg.u = vgetq_lane_u64(cmd1[0], 0); + + ret = cn9k_nix_prepare_mseg_vec_list(m, cmd, &sh, &sg, flags); + + cmd0[0] = vsetq_lane_u64(sh.u, cmd0[0], 0); + cmd1[0] = vsetq_lane_u64(sg.u, cmd1[0], 0); + return ret; +} + #define NIX_DESCS_PER_LOOP 4 + +static __rte_always_inline void +cn9k_nix_xmit_pkts_mseg_vector(uint64x2_t *cmd0, uint64x2_t *cmd1, + uint64x2_t *cmd2, uint64x2_t *cmd3, + uint8_t *segdw, + uint64_t slist[][CNXK_NIX_TX_MSEG_SG_DWORDS - 2], + uint64_t *lmt_addr, rte_iova_t io_addr, + const uint32_t flags) +{ + uint64_t lmt_status; + uint8_t j, off; + + if (!(flags & NIX_TX_NEED_EXT_HDR) && + !(flags & NIX_TX_OFFLOAD_TSTAMP_F)) { + /* No segments in 4 consecutive packets. */ + if ((segdw[0] + segdw[1] + segdw[2] + segdw[3]) <= 8) { + do { + vst1q_u64(lmt_addr, cmd0[0]); + vst1q_u64(lmt_addr + 2, cmd1[0]); + vst1q_u64(lmt_addr + 4, cmd0[1]); + vst1q_u64(lmt_addr + 6, cmd1[1]); + vst1q_u64(lmt_addr + 8, cmd0[2]); + vst1q_u64(lmt_addr + 10, cmd1[2]); + vst1q_u64(lmt_addr + 12, cmd0[3]); + vst1q_u64(lmt_addr + 14, cmd1[3]); + lmt_status = roc_lmt_submit_ldeor(io_addr); + } while (lmt_status == 0); + + return; + } + } + + for (j = 0; j < NIX_DESCS_PER_LOOP;) { + /* Fit consecutive packets in same LMTLINE. */ + if ((segdw[j] + segdw[j + 1]) <= 8) { +again0: + if ((flags & NIX_TX_NEED_EXT_HDR) && + (flags & NIX_TX_OFFLOAD_TSTAMP_F)) { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 4; + roc_lmt_mov_seg(lmt_addr + 6, slist[j], off); + off <<= 1; + vst1q_u64(lmt_addr + 6 + off, cmd3[j]); + + vst1q_u64(lmt_addr + 8 + off, cmd0[j + 1]); + vst1q_u64(lmt_addr + 10 + off, cmd2[j + 1]); + vst1q_u64(lmt_addr + 12 + off, cmd1[j + 1]); + roc_lmt_mov_seg(lmt_addr + 14 + off, + slist[j + 1], segdw[j + 1] - 4); + off += ((segdw[j + 1] - 4) << 1); + vst1q_u64(lmt_addr + 14 + off, cmd3[j + 1]); + } else if (flags & NIX_TX_NEED_EXT_HDR) { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 3; + roc_lmt_mov_seg(lmt_addr + 6, slist[j], off); + off <<= 1; + vst1q_u64(lmt_addr + 6 + off, cmd0[j + 1]); + vst1q_u64(lmt_addr + 8 + off, cmd2[j + 1]); + vst1q_u64(lmt_addr + 10 + off, cmd1[j + 1]); + roc_lmt_mov_seg(lmt_addr + 12 + off, + slist[j + 1], segdw[j + 1] - 3); + } else { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 2; + roc_lmt_mov_seg(lmt_addr + 4, slist[j], off); + off <<= 1; + vst1q_u64(lmt_addr + 4 + off, cmd0[j + 1]); + vst1q_u64(lmt_addr + 6 + off, cmd1[j + 1]); + roc_lmt_mov_seg(lmt_addr + 8 + off, + slist[j + 1], segdw[j + 1] - 2); + } + lmt_status = roc_lmt_submit_ldeor(io_addr); + if (lmt_status == 0) + goto again0; + j += 2; + } else { +again1: + if ((flags & NIX_TX_NEED_EXT_HDR) && + (flags & NIX_TX_OFFLOAD_TSTAMP_F)) { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 4; + roc_lmt_mov_seg(lmt_addr + 6, slist[j], off); + off <<= 1; + vst1q_u64(lmt_addr + 6 + off, cmd3[j]); + } else if (flags & NIX_TX_NEED_EXT_HDR) { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd2[j]); + vst1q_u64(lmt_addr + 4, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 3; + roc_lmt_mov_seg(lmt_addr + 6, slist[j], off); + } else { + vst1q_u64(lmt_addr, cmd0[j]); + vst1q_u64(lmt_addr + 2, cmd1[j]); + /* Copy segs */ + off = segdw[j] - 2; + roc_lmt_mov_seg(lmt_addr + 4, slist[j], off); + } + lmt_status = roc_lmt_submit_ldeor(io_addr); + if (lmt_status == 0) + goto again1; + j += 1; + } + } +} + static __rte_always_inline uint16_t cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts, uint64_t *cmd, const uint16_t flags) @@ -1380,7 +1611,8 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, sendext23_w0 = vld1q_u64(sx_w0 + 2); } - if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) { + if ((flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) && + !(flags & NIX_TX_MULTI_SEG_F)) { /* Set don't free bit if reference count > 1 */ xmask01 = vdupq_n_u64(0); xmask23 = xmask01; @@ -1424,7 +1656,7 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, * cnxk_nix_prefree_seg are written before LMTST. */ rte_io_wmb(); - } else { + } else if (!(flags & NIX_TX_MULTI_SEG_F)) { /* Move mbufs to iova */ mbuf0 = (uint64_t *)tx_pkts[0]; mbuf1 = (uint64_t *)tx_pkts[1]; @@ -1472,7 +1704,27 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, cmd2[3] = vzip2q_u64(sendext23_w0, sendext23_w1); } - if (flags & NIX_TX_NEED_EXT_HDR) { + if (flags & NIX_TX_MULTI_SEG_F) { + uint64_t seg_list[NIX_DESCS_PER_LOOP] + [CNXK_NIX_TX_MSEG_SG_DWORDS - 2]; + uint8_t j, segdw[NIX_DESCS_PER_LOOP + 1]; + + /* Build mseg list for each packet individually. */ + for (j = 0; j < NIX_DESCS_PER_LOOP; j++) + segdw[j] = cn9k_nix_prepare_mseg_vec(tx_pkts[j], + seg_list[j], &cmd0[j], + &cmd1[j], flags); + segdw[4] = 8; + + /* Commit all changes to mbuf before LMTST. */ + if (flags & NIX_TX_OFFLOAD_MBUF_NOFF_F) + rte_io_wmb(); + + cn9k_nix_xmit_pkts_mseg_vector(cmd0, cmd1, cmd2, cmd3, + segdw, seg_list, + lmt_addr, io_addr, + flags); + } else if (flags & NIX_TX_NEED_EXT_HDR) { /* With ext header in the command we can no longer send * all 4 packets together since LMTLINE is 128bytes. * Split and Tx twice. @@ -1534,9 +1786,14 @@ cn9k_nix_xmit_pkts_vector(void *tx_queue, struct rte_mbuf **tx_pkts, tx_pkts = tx_pkts + NIX_DESCS_PER_LOOP; } - if (unlikely(pkts_left)) - pkts += cn9k_nix_xmit_pkts(tx_queue, tx_pkts, pkts_left, cmd, - flags); + if (unlikely(pkts_left)) { + if (flags & NIX_TX_MULTI_SEG_F) + pkts += cn9k_nix_xmit_pkts_mseg(tx_queue, tx_pkts, + pkts_left, cmd, flags); + else + pkts += cn9k_nix_xmit_pkts(tx_queue, tx_pkts, pkts_left, + cmd, flags); + } return pkts; } @@ -1701,6 +1958,9 @@ T(ts_tso_noff_vlan_ol3ol4csum_l3l4csum, 1, 1, 1, 1, 1, 1, 8, \ void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); \ \ uint16_t __rte_noinline __rte_hot cn9k_nix_xmit_pkts_vec_##name( \ + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); \ + \ + uint16_t __rte_noinline __rte_hot cn9k_nix_xmit_pkts_vec_mseg_##name( \ void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts); NIX_TX_FASTPATH_MODES diff --git a/drivers/net/cnxk/cn9k_tx_vec_mseg.c b/drivers/net/cnxk/cn9k_tx_vec_mseg.c new file mode 100644 index 000000000..0256efd45 --- /dev/null +++ b/drivers/net/cnxk/cn9k_tx_vec_mseg.c @@ -0,0 +1,24 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2021 Marvell. + */ + +#include "cn9k_ethdev.h" +#include "cn9k_tx.h" + +#define T(name, f5, f4, f3, f2, f1, f0, sz, flags) \ + uint16_t __rte_noinline __rte_hot cn9k_nix_xmit_pkts_vec_mseg_##name( \ + void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t pkts) \ + { \ + uint64_t cmd[sz]; \ + \ + /* For TSO inner checksum is a must */ \ + if (((flags) & NIX_TX_OFFLOAD_TSO_F) && \ + !((flags) & NIX_TX_OFFLOAD_L3_L4_CSUM_F)) \ + return 0; \ + return cn9k_nix_xmit_pkts_vector(tx_queue, tx_pkts, pkts, cmd, \ + (flags) | \ + NIX_TX_MULTI_SEG_F); \ + } + +NIX_TX_FASTPATH_MODES +#undef T diff --git a/drivers/net/cnxk/meson.build b/drivers/net/cnxk/meson.build index aa8c7253f..361f7ce84 100644 --- a/drivers/net/cnxk/meson.build +++ b/drivers/net/cnxk/meson.build @@ -26,7 +26,8 @@ sources += files('cn9k_ethdev.c', 'cn9k_rx_vec_mseg.c', 'cn9k_tx.c', 'cn9k_tx_mseg.c', - 'cn9k_tx_vec.c') + 'cn9k_tx_vec.c', + 'cn9k_tx_vec_mseg.c') # CN10K sources += files('cn10k_ethdev.c', 'cn10k_rte_flow.c', @@ -36,7 +37,8 @@ sources += files('cn10k_ethdev.c', 'cn10k_rx_vec_mseg.c', 'cn10k_tx.c', 'cn10k_tx_mseg.c', - 'cn10k_tx_vec.c') + 'cn10k_tx_vec.c', + 'cn10k_tx_vec_mseg.c') deps += ['bus_pci', 'cryptodev', 'eventdev', 'security'] deps += ['common_cnxk', 'mempool_cnxk']