From patchwork Thu Mar 18 10:25:47 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 89481
X-Patchwork-Delegate: david.marchand@redhat.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id 8E08CA0561;
 Thu, 18 Mar 2021 11:26:25 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id 73026140E92;
 Thu, 18 Mar 2021 11:26:25 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id 65EAB140E9B
 for ; Thu, 18 Mar 2021 11:26:24 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D963E31B;
 Thu, 18 Mar 2021 03:26:23 -0700 (PDT)
Received: from net-arm-n1amp-01.shanghai.arm.com
 (net-arm-n1amp-01.shanghai.arm.com [10.169.210.137])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 08FCD3F792;
 Thu, 18 Mar 2021 03:26:20 -0700 (PDT)
From: Ruifeng Wang
To: jerinj@marvell.com, hemant.agrawal@nxp.com, ferruh.yigit@intel.com,
 thomas@monjalon.net, david.marchand@redhat.com
Cc: dev@dpdk.org, nd@arm.com, honnappa.nagarahalli@arm.com, Ruifeng Wang
Date: Thu, 18 Mar 2021 10:25:47 +0000
Message-Id: <20210318102550.59265-2-ruifeng.wang@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210318102550.59265-1-ruifeng.wang@arm.com>
References: <20210318102550.59265-1-ruifeng.wang@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH 1/4] examples/l3fwd: tune prefetch for better performance
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Packet header is prefetched
before packet processing for better memory access performance.
As the L2 header will be updated by l3fwd, using a prefetch with the
store hint brings the cache line into the proper state and reduces
cache maintenance overhead.

With this change, a 12.9% performance uplift was measured on the N1SDP
platform with an MLX5 NIC.

Suggested-by: Honnappa Nagarahalli
Signed-off-by: Ruifeng Wang
Reviewed-by: Honnappa Nagarahalli
---
 examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
index d6c0ba64a..ae8840694 100644
--- a/examples/l3fwd/l3fwd_lpm_neon.h
+++ b/examples/l3fwd/l3fwd_lpm_neon.h
@@ -97,13 +97,13 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 
 	if (k) {
 		for (i = 0; i < FWDSTEP; i++) {
-			rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
+			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
 						struct rte_ether_hdr *) + 1);
 		}
 
 		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
 			for (i = 0; i < FWDSTEP; i++) {
-				rte_prefetch0(rte_pktmbuf_mtod(
+				rte_prefetch0_write(rte_pktmbuf_mtod(
 						pkts_burst[j + i + FWDSTEP],
 						struct rte_ether_hdr *) + 1);
 			}
@@ -124,17 +124,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 	/* Prefetch last up to 3 packets one by one */
 	switch (m) {
 	case 3:
-		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 		j++;
 		/* fallthrough */
 	case 2:
-		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 		j++;
 		/* fallthrough */
 	case 1:
-		rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
+		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
 						struct rte_ether_hdr *) + 1);
 		j++;
 	}

From patchwork Thu Mar 18 10:25:48 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 89482
X-Patchwork-Delegate: david.marchand@redhat.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id BF727A0561;
 Thu, 18 Mar 2021 11:26:30 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id A7B7F140EAD;
 Thu, 18 Mar 2021 11:26:30 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id D0B76140E6D
 for ; Thu, 18 Mar 2021 11:26:29 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 540F231B;
 Thu, 18 Mar 2021 03:26:29 -0700 (PDT)
Received: from net-arm-n1amp-01.shanghai.arm.com
 (net-arm-n1amp-01.shanghai.arm.com [10.169.210.137])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 795F13F792;
 Thu, 18 Mar 2021 03:26:26 -0700 (PDT)
From: Ruifeng Wang
To: jerinj@marvell.com, hemant.agrawal@nxp.com, ferruh.yigit@intel.com,
 thomas@monjalon.net, david.marchand@redhat.com
Cc: dev@dpdk.org, nd@arm.com, honnappa.nagarahalli@arm.com, Ruifeng Wang
Date: Thu, 18 Mar 2021 10:25:48 +0000
Message-Id: <20210318102550.59265-3-ruifeng.wang@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210318102550.59265-1-ruifeng.wang@arm.com>
References: <20210318102550.59265-1-ruifeng.wang@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH 2/4] examples/l3fwd: eliminate unnecessary calculations
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Both the L2 and L3 headers are used in forward processing, and the two
headers are in the same cache line. Prefetching with the L2 header
address therefore has the same effect as prefetching with the L3
header address.
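
For illustration only, not part of the patch: a minimal sketch of the
cache-line argument above. It assumes a 64-byte cache line, a 14-byte
untagged Ethernet header, and addresses chosen by the caller; the
`demo_` names are hypothetical and do not exist in DPDK.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64u    /* assumed cache line size in bytes */
#define DEMO_ETHER_HDR_LEN 14u /* untagged Ethernet header length */

/* Index of the cache line that contains addr. */
static uintptr_t
demo_cache_line_of(uintptr_t addr)
{
	return addr / DEMO_CACHE_LINE;
}

/*
 * Return 1 if the L2 header at l2_addr and the whole L3 header of
 * l3_len bytes that follows it sit in one cache line, else 0.
 */
static int
demo_headers_share_line(uintptr_t l2_addr, unsigned int l3_len)
{
	uintptr_t last;

	assert(l3_len > 0);
	/* Address of the last byte of the L3 header. */
	last = l2_addr + DEMO_ETHER_HDR_LEN + l3_len - 1;
	return demo_cache_line_of(l2_addr) == demo_cache_line_of(last);
}
```

When the packet data starts on a cache-line boundary, a 20-byte IPv4
header ends at offset 33 and a 40-byte IPv6 header at offset 53, so
both fit in the line that the L2 address already selects. If the data
started 32 bytes into a line, the IPv6 header would spill into the
next line.
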
Changed to use the L2 header address for prefetching.

The change showed no measurable performance improvement, but it does
remove unnecessary instructions for address calculation.

Signed-off-by: Ruifeng Wang
Acked-by: Jerin Jacob
---
 examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
index ae8840694..1650ae444 100644
--- a/examples/l3fwd/l3fwd_lpm_neon.h
+++ b/examples/l3fwd/l3fwd_lpm_neon.h
@@ -98,14 +98,14 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 	if (k) {
 		for (i = 0; i < FWDSTEP; i++) {
 			rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
-						struct rte_ether_hdr *) + 1);
+							void *));
 		}
 
 		for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
 			for (i = 0; i < FWDSTEP; i++) {
 				rte_prefetch0_write(rte_pktmbuf_mtod(
 						pkts_burst[j + i + FWDSTEP],
-						struct rte_ether_hdr *) + 1);
+						void *));
 			}
 
 			processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
@@ -125,17 +125,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 	switch (m) {
 	case 3:
 		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
-						struct rte_ether_hdr *) + 1);
+							void *));
 		j++;
 		/* fallthrough */
 	case 2:
 		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
-						struct rte_ether_hdr *) + 1);
+							void *));
 		j++;
 		/* fallthrough */
 	case 1:
 		rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
-						struct rte_ether_hdr *) + 1);
+							void *));
 		j++;
 	}
 

From patchwork Thu Mar 18 10:25:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 89483
X-Patchwork-Delegate: david.marchand@redhat.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id A1B83A0561;
 Thu, 18 Mar 2021 11:26:36 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id E2F38140EB9;
 Thu, 18 Mar 2021 11:26:35 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id 76C47140E9B
 for ; Thu, 18 Mar 2021 11:26:34 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E41D831B;
 Thu, 18 Mar 2021 03:26:33 -0700 (PDT)
Received: from net-arm-n1amp-01.shanghai.arm.com
 (net-arm-n1amp-01.shanghai.arm.com [10.169.210.137])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 14F0B3F792;
 Thu, 18 Mar 2021 03:26:30 -0700 (PDT)
From: Ruifeng Wang
To: jerinj@marvell.com, hemant.agrawal@nxp.com, ferruh.yigit@intel.com,
 thomas@monjalon.net, david.marchand@redhat.com
Cc: dev@dpdk.org, nd@arm.com, honnappa.nagarahalli@arm.com, Ruifeng Wang
Date: Thu, 18 Mar 2021 10:25:49 +0000
Message-Id: <20210318102550.59265-4-ruifeng.wang@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210318102550.59265-1-ruifeng.wang@arm.com>
References: <20210318102550.59265-1-ruifeng.wang@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH 3/4] examples/l3fwd: eliminate unnecessary reloads in loop
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

The number of rx queues and the number of tx ports in the lcore config
are constant while the l3fwd application is running, but the compiler
does not have this information.

Copied the values from the lcore config to local variables and used
the local variables for iteration. The compiler can see that the local
variables are not changed, so the qconf reloads at each iteration can
be eliminated.

The change showed a 1.8% performance uplift in a single core, single
port, single queue test on the N1SDP platform with an MLX5 NIC.
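
For illustration only, not part of the patch: a minimal sketch of why
the local copy helps. `struct demo_conf`, `demo_rx_fn` and the `demo_`
functions are hypothetical stand-ins, not l3fwd code. The callback is
handed the config pointer to make explicit what the compiler has to
assume about an opaque call in the loop body: it might change the
count, so a bound read through the pointer must be reloaded on every
iteration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_RX_QUEUE 16 /* assumed array size for the sketch */

/* Hypothetical stand-in for the per-lcore config. */
struct demo_conf {
	uint16_t n_rx_queue;
	uint16_t rx_port[DEMO_MAX_RX_QUEUE];
};

/* Opaque per-queue work; it can reach the config through conf. */
typedef void (*demo_rx_fn)(struct demo_conf *conf, uint16_t port);

/* Callback that leaves the config alone. */
static void
demo_rx_noop(struct demo_conf *conf, uint16_t port)
{
	(void)conf;
	(void)port;
}

/* Callback that changes the queue count while the loop is running. */
static void
demo_rx_shrink(struct demo_conf *conf, uint16_t port)
{
	(void)port;
	conf->n_rx_queue = 0;
}

/* Loop bound read through the pointer: reloaded after every rx() call. */
static unsigned int
demo_poll_reload(struct demo_conf *conf, demo_rx_fn rx)
{
	unsigned int polled = 0;
	uint16_t i;

	for (i = 0; i < conf->n_rx_queue; i++) {
		rx(conf, conf->rx_port[i]);
		polled++;
	}
	return polled;
}

/* Loop bound copied to a local once: it can stay in a register. */
static unsigned int
demo_poll_cached(struct demo_conf *conf, demo_rx_fn rx)
{
	const uint16_t n_rx_q = conf->n_rx_queue;
	unsigned int polled = 0;
	uint16_t i;

	assert(n_rx_q <= DEMO_MAX_RX_QUEUE);
	for (i = 0; i < n_rx_q; i++) {
		rx(conf, conf->rx_port[i]);
		polled++;
	}
	return polled;
}

/* Build a 3-queue config and poll it once; returns queues polled. */
static unsigned int
demo_run(int cached, demo_rx_fn rx)
{
	struct demo_conf conf = { .n_rx_queue = 3, .rx_port = { 0, 1, 2 } };

	return cached ? demo_poll_cached(&conf, rx)
		      : demo_poll_reload(&conf, rx);
}
```

Both variants visit all three queues when the callback leaves the
config alone. They differ only when the count changes in the middle of
the loop, which is the case the compiler cannot rule out on its own.
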

Signed-off-by: Ruifeng Wang
---
 examples/l3fwd/l3fwd_lpm.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index 3dcf1fef1..d338590b9 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -190,14 +190,16 @@ lpm_main_loop(__rte_unused void *dummy)
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 
-	if (qconf->n_rx_queue == 0) {
+	uint16_t n_rx_q = qconf->n_rx_queue;
+	uint16_t n_tx_p = qconf->n_tx_port;
+	if (n_rx_q == 0) {
 		RTE_LOG(INFO, L3FWD, "lcore %u has nothing to do\n", lcore_id);
 		return 0;
 	}
 
 	RTE_LOG(INFO, L3FWD, "entering main loop on lcore %u\n", lcore_id);
 
-	for (i = 0; i < qconf->n_rx_queue; i++) {
+	for (i = 0; i < n_rx_q; i++) {
 
 		portid = qconf->rx_queue_list[i].port_id;
 		queueid = qconf->rx_queue_list[i].queue_id;
@@ -216,7 +218,7 @@ lpm_main_loop(__rte_unused void *dummy)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
 
-			for (i = 0; i < qconf->n_tx_port; ++i) {
+			for (i = 0; i < n_tx_p; ++i) {
 				portid = qconf->tx_port_id[i];
 				if (qconf->tx_mbufs[portid].len == 0)
 					continue;
@@ -232,7 +234,7 @@ lpm_main_loop(__rte_unused void *dummy)
 		/*
 		 * Read packet from RX queues
 		 */
-		for (i = 0; i < qconf->n_rx_queue; ++i) {
+		for (i = 0; i < n_rx_q; ++i) {
 			portid = qconf->rx_queue_list[i].port_id;
 			queueid = qconf->rx_queue_list[i].queue_id;
 			nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,

From patchwork Thu Mar 18 10:25:50 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ruifeng Wang
X-Patchwork-Id: 89484
X-Patchwork-Delegate: david.marchand@redhat.com
Return-Path:
X-Original-To: patchwork@inbox.dpdk.org
Delivered-To: patchwork@inbox.dpdk.org
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
 by inbox.dpdk.org (Postfix) with ESMTP id D1FF6A0561;
 Thu, 18 Mar 2021 11:26:42 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
 by mails.dpdk.org (Postfix) with ESMTP id 2B4A2140EBF;
 Thu, 18 Mar 2021 11:26:40 +0100 (CET)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by mails.dpdk.org (Postfix) with ESMTP id D370F140EB4
 for ; Thu, 18 Mar 2021 11:26:38 +0100 (CET)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5FF6631B;
 Thu, 18 Mar 2021 03:26:38 -0700 (PDT)
Received: from net-arm-n1amp-01.shanghai.arm.com
 (net-arm-n1amp-01.shanghai.arm.com [10.169.210.137])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 85B7C3F792;
 Thu, 18 Mar 2021 03:26:35 -0700 (PDT)
From: Ruifeng Wang
To: jerinj@marvell.com, hemant.agrawal@nxp.com, ferruh.yigit@intel.com,
 thomas@monjalon.net, david.marchand@redhat.com
Cc: dev@dpdk.org, nd@arm.com, honnappa.nagarahalli@arm.com, Ruifeng Wang
Date: Thu, 18 Mar 2021 10:25:50 +0000
Message-Id: <20210318102550.59265-5-ruifeng.wang@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210318102550.59265-1-ruifeng.wang@arm.com>
References: <20210318102550.59265-1-ruifeng.wang@arm.com>
MIME-Version: 1.0
Subject: [dpdk-dev] [PATCH 4/4] examples/l3fwd: make data struct to be memory efficient
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

There are some holes in the data struct lcore_conf. The holes are due
to alignment requirements.

For struct lcore_rx_queue, there is no need to make every element of
this type cache line aligned, because the data is not shared between
cores.

Member len of struct mbuf_table can be moved out, so the data can be
packed and there is no need to load an extra cache line when the mbuf
table is empty.

The change showed a slight performance improvement on the N1SDP
platform.
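
For illustration only, not part of the patch: a minimal sketch of the
layout effect described above. The `demo_` types are hypothetical
stand-ins for the l3fwd structs, and the 64-byte cache line, 16 queues
per lcore and burst size of 32 are assumptions made for the sketch.
C11 `_Alignas` is used here in place of `__rte_cache_aligned`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64    /* assumed cache line size in bytes */
#define DEMO_MAX_RX_QUEUE 16  /* assumed rx queues per lcore */
#define DEMO_MAX_PKT_BURST 32 /* assumed burst size */

/* Every element padded to a full cache line. */
struct demo_rxq_aligned {
	_Alignas(DEMO_CACHE_LINE) uint16_t port_id;
	uint8_t queue_id;
};

/* Same members with natural alignment only. */
struct demo_rxq_packed {
	uint16_t port_id;
	uint8_t queue_id;
};

/* Table with the length stored next to the pointers (old layout). */
struct demo_mbuf_table_old {
	uint16_t len;
	void *m_table[DEMO_MAX_PKT_BURST];
};

/* In the old layout consecutive len fields are at least a line apart. */
static_assert(sizeof(struct demo_mbuf_table_old) >= DEMO_CACHE_LINE,
	      "old table is expected to span at least one cache line");

/* Number of cache lines needed to hold bytes of line-aligned data. */
static size_t
demo_cache_lines(size_t bytes)
{
	return (bytes + DEMO_CACHE_LINE - 1) / DEMO_CACHE_LINE;
}

/* Cache lines covered by the rx queue list with padded elements. */
static size_t
demo_rxq_lines_aligned(void)
{
	return demo_cache_lines(sizeof(struct demo_rxq_aligned) *
				DEMO_MAX_RX_QUEUE);
}

/* Cache lines covered by the rx queue list with packed elements. */
static size_t
demo_rxq_lines_packed(void)
{
	return demo_cache_lines(sizeof(struct demo_rxq_packed) *
				DEMO_MAX_RX_QUEUE);
}

/* Lines touched to check n lengths when each sits in its own table. */
static size_t
demo_len_lines_old(size_t n)
{
	return n;
}

/* Lines touched to check n lengths stored in one uint16_t array. */
static size_t
demo_len_lines_new(size_t n)
{
	return demo_cache_lines(n * sizeof(uint16_t));
}
```

Under these assumptions the padded queue list spans 16 cache lines
while the packed one fits in a single line, and checking 32 table
lengths touches one line instead of 32.
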

Suggested-by: Honnappa Nagarahalli
Signed-off-by: Ruifeng Wang
---
 examples/l3fwd/l3fwd.h        | 12 ++++++------
 examples/l3fwd/l3fwd_common.h |  4 ++--
 examples/l3fwd/l3fwd_em.c     |  6 +++---
 examples/l3fwd/l3fwd_lpm.c    |  6 +++---
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 2cf06099e..f3a301e12 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -57,22 +57,22 @@
 #define HASH_ENTRY_NUMBER_DEFAULT	4
 
 struct mbuf_table {
-	uint16_t len;
 	struct rte_mbuf *m_table[MAX_PKT_BURST];
 };
 
 struct lcore_rx_queue {
 	uint16_t port_id;
 	uint8_t queue_id;
-} __rte_cache_aligned;
+};
 
 struct lcore_conf {
-	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
-	uint16_t n_tx_port;
 	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
+	uint16_t tx_mbuf_len[RTE_MAX_ETHPORTS];
 	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
+	uint16_t n_rx_queue;
+	uint16_t n_tx_port;
 	void *ipv4_lookup_struct;
 	void *ipv6_lookup_struct;
 } __rte_cache_aligned;
@@ -122,7 +122,7 @@ send_single_packet(struct lcore_conf *qconf,
 {
 	uint16_t len;
 
-	len = qconf->tx_mbufs[port].len;
+	len = qconf->tx_mbuf_len[port];
 	qconf->tx_mbufs[port].m_table[len] = m;
 	len++;
 
@@ -132,7 +132,7 @@ send_single_packet(struct lcore_conf *qconf,
 		len = 0;
 	}
 
-	qconf->tx_mbufs[port].len = len;
+	qconf->tx_mbuf_len[port] = len;
 	return 0;
 }
 
diff --git a/examples/l3fwd/l3fwd_common.h b/examples/l3fwd/l3fwd_common.h
index 7d83ff641..05e03dbfc 100644
--- a/examples/l3fwd/l3fwd_common.h
+++ b/examples/l3fwd/l3fwd_common.h
@@ -183,7 +183,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
 {
 	uint32_t len, j, n;
 
-	len = qconf->tx_mbufs[port].len;
+	len = qconf->tx_mbuf_len[port];
 
 	/*
 	 * If TX buffer for that queue is empty, and we have enough packets,
@@ -258,7 +258,7 @@ send_packetsx4(struct lcore_conf *qconf, uint16_t port, struct rte_mbuf *m[],
 		}
 	}
 
-	qconf->tx_mbufs[port].len = len;
+	qconf->tx_mbuf_len[port] = len;
 }
 
 #endif /* _L3FWD_COMMON_H_ */
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 9996bfba3..1970e0376 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -662,12 +662,12 @@ em_main_loop(__rte_unused void *dummy)
 
 			for (i = 0; i < qconf->n_tx_port; ++i) {
 				portid = qconf->tx_port_id[i];
-				if (qconf->tx_mbufs[portid].len == 0)
+				if (qconf->tx_mbuf_len[portid] == 0)
 					continue;
 				send_burst(qconf,
-					qconf->tx_mbufs[portid].len,
+					qconf->tx_mbuf_len[portid],
 					portid);
-				qconf->tx_mbufs[portid].len = 0;
+				qconf->tx_mbuf_len[portid] = 0;
 			}
 
 			prev_tsc = cur_tsc;
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index d338590b9..e62139a0e 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -220,12 +220,12 @@ lpm_main_loop(__rte_unused void *dummy)
 
 			for (i = 0; i < n_tx_p; ++i) {
 				portid = qconf->tx_port_id[i];
-				if (qconf->tx_mbufs[portid].len == 0)
+				if (qconf->tx_mbuf_len[portid] == 0)
 					continue;
 				send_burst(qconf,
-					qconf->tx_mbufs[portid].len,
+					qconf->tx_mbuf_len[portid],
 					portid);
-				qconf->tx_mbufs[portid].len = 0;
+				qconf->tx_mbuf_len[portid] = 0;
 			}
 
 			prev_tsc = cur_tsc;