From patchwork Sat Jan 16 09:38:59 2021
X-Patchwork-Submitter: Igor Russkikh
X-Patchwork-Id: 86720
X-Patchwork-Delegate: ferruh.yigit@amd.com
From: Igor Russkikh
To: dev@dpdk.org
CC: Rasesh Mody, Devendra Singh Rawat, Ferruh Yigit, Wenzhuo Lu,
 Beilei Xing, Bernard Iremonger, Igor Russkikh
Date: Sat, 16 Jan 2021 10:38:59 +0100
Message-ID: <20210116093859.3025-1-irusskikh@marvell.com>
X-Mailer: git-send-email 2.17.1
Subject: [dpdk-dev] [PATCH] app/testpmd: tx pkt clones parameter in flowgen
List-Id: DPDK patches and discussions

When testing for high performance numbers, it is often the case that CPU
performance limits the maximum values a device can reach, both in pps and
in Gbps.

Here, instead of recreating each packet separately, we use a clones counter
to resend the same mbuf to the wire multiple times. PMDs handle that
transparently thanks to the reference counting inside the mbuf.
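For illustration only (this is not the flowgen change further below;
send_clones_sketch, its parameters and the 512-slot array are made-up names
for this sketch), the idea boils down to building one mbuf and placing the
same pointer into several Tx burst slots, taking one extra mbuf reference per
additional slot:

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Minimal sketch: transmit one packet nb_clones times via refcounting. */
static uint16_t
send_clones_sketch(struct rte_mempool *mbp, uint16_t port_id,
		   uint16_t queue_id, uint16_t nb_clones)
{
	struct rte_mbuf *burst[512];
	struct rte_mbuf *pkt;
	uint16_t i, nb_tx;

	if (nb_clones == 0)
		nb_clones = 1;			/* always send the original once */
	if (nb_clones > RTE_DIM(burst))
		nb_clones = RTE_DIM(burst);	/* stay within the local array */

	pkt = rte_mbuf_raw_alloc(mbp);
	if (pkt == NULL)
		return 0;
	/* ... fill Ethernet/IP/UDP headers and packet lengths here, once ... */

	burst[0] = pkt;
	for (i = 1; i < nb_clones; i++) {
		rte_mbuf_refcnt_update(pkt, 1);	/* one extra reference per copy */
		burst[i] = pkt;			/* same pointer in every slot */
	}

	nb_tx = rte_eth_tx_burst(port_id, queue_id, burst, nb_clones);

	/* Slots the driver did not accept still hold a reference; drop them. */
	for (i = nb_tx; i < nb_clones; i++)
		rte_pktmbuf_free(burst[i]);

	return nb_tx;
}

The driver drops one reference for each slot it actually transmits, so the
accounting balances out without ever copying packet data.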
This helps to reach the maximum PPS at small packet sizes. Some data from
our 2 port x 50G device, using 2*6 Tx queues, 64B packets, on a PowerEdge
R7525 with an AMD EPYC 7452:

./build/app/dpdk-testpmd -l 32-63 -- --forward-mode=flowgen \
  --rxq=6 --txq=6 --disable-crc-strip --burst=512 \
  --flowgen-clones=0 --txd=4096 --stats-period=1 --txpkts=64

Gives ~46 MPPS TX output:

  Tx-pps: 22926849    Tx-bps: 11738590176
  Tx-pps: 23642629    Tx-bps: 12105024112

Setting flowgen-clones to 512 pushes TX almost to our device's physical
limit (68 MPPS) using the same 2*6 queues (cores):

  Tx-pps: 34357556    Tx-bps: 17591073696
  Tx-pps: 34353211    Tx-bps: 17588802640

Doing similar measurements per core, I see one core can do 6.9 MPPS
(without clones) vs 11 MPPS (with clones).

Verified on Marvell qede and atlantic PMDs.

v1:
- fixes based on Ferruh's comments
rfc v2: http://patchwork.dpdk.org/patch/78800/
- increment ref counter for each mbuf pointer copy
rfc v1: http://patchwork.dpdk.org/patch/78674/

Signed-off-by: Igor Russkikh
---
 app/test-pmd/flowgen.c                | 105 ++++++++++++++------------
 app/test-pmd/parameters.c             |  10 +++
 app/test-pmd/testpmd.c                |   1 +
 app/test-pmd/testpmd.h                |   1 +
 doc/guides/testpmd_app_ug/run_app.rst |   7 ++
 5 files changed, 77 insertions(+), 47 deletions(-)

diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index acf3e2460..53a2e5a63 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -94,6 +94,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t nb_pkt;
+	uint16_t nb_clones = nb_pkt_flowgen_clones;
 	uint16_t i;
 	uint32_t retry;
 	uint64_t tx_offloads;
@@ -123,53 +124,63 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 		ol_flags |= PKT_TX_MACSEC;
 
 	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
-		pkt = rte_mbuf_raw_alloc(mbp);
-		if (!pkt)
-			break;
-
-		pkt->data_len = pkt_size;
-		pkt->next = NULL;
-
-		/* Initialize Ethernet header. */
-		eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
-		rte_ether_addr_copy(&cfg_ether_dst, &eth_hdr->d_addr);
-		rte_ether_addr_copy(&cfg_ether_src, &eth_hdr->s_addr);
-		eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
-
-		/* Initialize IP header. */
-		ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
-		memset(ip_hdr, 0, sizeof(*ip_hdr));
-		ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
-		ip_hdr->type_of_service = 0;
-		ip_hdr->fragment_offset = 0;
-		ip_hdr->time_to_live = IP_DEFTTL;
-		ip_hdr->next_proto_id = IPPROTO_UDP;
-		ip_hdr->packet_id = 0;
-		ip_hdr->src_addr = rte_cpu_to_be_32(cfg_ip_src);
-		ip_hdr->dst_addr = rte_cpu_to_be_32(cfg_ip_dst +
-						    next_flow);
-		ip_hdr->total_length = RTE_CPU_TO_BE_16(pkt_size -
-							sizeof(*eth_hdr));
-		ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr,
-					      sizeof(*ip_hdr));
-
-		/* Initialize UDP header. */
-		udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
-		udp_hdr->src_port = rte_cpu_to_be_16(cfg_udp_src);
-		udp_hdr->dst_port = rte_cpu_to_be_16(cfg_udp_dst);
-		udp_hdr->dgram_cksum = 0; /* No UDP checksum. */
-		udp_hdr->dgram_len = RTE_CPU_TO_BE_16(pkt_size -
-						      sizeof(*eth_hdr) -
-						      sizeof(*ip_hdr));
-		pkt->nb_segs = 1;
-		pkt->pkt_len = pkt_size;
-		pkt->ol_flags &= EXT_ATTACHED_MBUF;
-		pkt->ol_flags |= ol_flags;
-		pkt->vlan_tci = vlan_tci;
-		pkt->vlan_tci_outer = vlan_tci_outer;
-		pkt->l2_len = sizeof(struct rte_ether_hdr);
-		pkt->l3_len = sizeof(struct rte_ipv4_hdr);
-		pkts_burst[nb_pkt] = pkt;
+		if (!nb_pkt || !nb_clones) {
+			nb_clones = nb_pkt_flowgen_clones;
+			/* Logic limitation */
+			if (nb_clones > nb_pkt_per_burst)
+				nb_clones = nb_pkt_per_burst;
+
+			pkt = rte_mbuf_raw_alloc(mbp);
+			if (!pkt)
+				break;
+
+			pkt->data_len = pkt_size;
+			pkt->next = NULL;
+
+			/* Initialize Ethernet header. */
+			eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
+			rte_ether_addr_copy(&cfg_ether_dst, &eth_hdr->d_addr);
+			rte_ether_addr_copy(&cfg_ether_src, &eth_hdr->s_addr);
+			eth_hdr->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);
+
+			/* Initialize IP header. */
+			ip_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
+			memset(ip_hdr, 0, sizeof(*ip_hdr));
+			ip_hdr->version_ihl = RTE_IPV4_VHL_DEF;
+			ip_hdr->type_of_service = 0;
+			ip_hdr->fragment_offset = 0;
+			ip_hdr->time_to_live = IP_DEFTTL;
+			ip_hdr->next_proto_id = IPPROTO_UDP;
+			ip_hdr->packet_id = 0;
+			ip_hdr->src_addr = rte_cpu_to_be_32(cfg_ip_src);
+			ip_hdr->dst_addr = rte_cpu_to_be_32(cfg_ip_dst +
+							    next_flow);
+			ip_hdr->total_length = RTE_CPU_TO_BE_16(pkt_size -
+								sizeof(*eth_hdr));
+			ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr,
+						      sizeof(*ip_hdr));
+
+			/* Initialize UDP header. */
+			udp_hdr = (struct rte_udp_hdr *)(ip_hdr + 1);
+			udp_hdr->src_port = rte_cpu_to_be_16(cfg_udp_src);
+			udp_hdr->dst_port = rte_cpu_to_be_16(cfg_udp_dst);
+			udp_hdr->dgram_cksum = 0; /* No UDP checksum. */
+			udp_hdr->dgram_len = RTE_CPU_TO_BE_16(pkt_size -
+							      sizeof(*eth_hdr) -
+							      sizeof(*ip_hdr));
+			pkt->nb_segs = 1;
+			pkt->pkt_len = pkt_size;
+			pkt->ol_flags &= EXT_ATTACHED_MBUF;
+			pkt->ol_flags |= ol_flags;
+			pkt->vlan_tci = vlan_tci;
+			pkt->vlan_tci_outer = vlan_tci_outer;
+			pkt->l2_len = sizeof(struct rte_ether_hdr);
+			pkt->l3_len = sizeof(struct rte_ipv4_hdr);
+		} else {
+			nb_clones--;
+			rte_mbuf_refcnt_update(pkt, 1);
+		}
+		pkts_burst[nb_pkt] = pkt;
 		next_flow = (next_flow + 1) % cfg_n_flows;
 	}
 
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 414a0068f..a095aa8f6 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -163,6 +163,7 @@ usage(char* progname)
 	printf("  --hairpinq=N: set the number of hairpin queues per port to "
 	       "N.\n");
 	printf("  --burst=N: set the number of packets per burst to N.\n");
+	printf("  --flowgen-clones=N: set the number of single packet clones to send in flowgen mode. Should be less than burst value.\n");
 	printf("  --mbcache=N: set the cache of mbuf memory pool to N.\n");
 	printf("  --rxpt=N: set prefetch threshold register of RX rings to N.\n");
 	printf("  --rxht=N: set the host threshold register of RX rings to N.\n");
@@ -561,6 +562,7 @@ launch_args_parse(int argc, char** argv)
 		{ "hairpinq",			1, 0, 0 },
 		{ "hairpin-mode",		1, 0, 0 },
 		{ "burst",			1, 0, 0 },
+		{ "flowgen-clones",		1, 0, 0 },
 		{ "mbcache",			1, 0, 0 },
 		{ "txpt",			1, 0, 0 },
 		{ "txht",			1, 0, 0 },
@@ -1089,6 +1091,14 @@ launch_args_parse(int argc, char** argv)
 				else
 					nb_pkt_per_burst = (uint16_t) n;
 			}
+			if (!strcmp(lgopts[opt_idx].name, "flowgen-clones")) {
+				n = atoi(optarg);
+				if (n >= 0)
+					nb_pkt_flowgen_clones = (uint16_t) n;
+				else
+					rte_exit(EXIT_FAILURE,
+						 "clones must be >= 0 and <= current burst\n");
+			}
 			if (!strcmp(lgopts[opt_idx].name, "mbcache")) {
 				n = atoi(optarg);
 				if ((n >= 0) &&
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 2b60f6c5d..b0f825f6f 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -240,6 +240,7 @@ uint32_t tx_pkt_times_intra;
 /**< Timings for send scheduling in TXONLY mode, time between packets. */
 
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
+uint16_t nb_pkt_flowgen_clones; /**< Number of tx packet clones to send in flowgen mode. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
 
 /* current configuration is in DCB or not,0 means it is not in DCB mode */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 5f2316210..efd558d15 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -476,6 +476,7 @@ extern enum tx_pkt_split tx_pkt_split;
 extern uint8_t txonly_multi_flow;
 
 extern uint16_t nb_pkt_per_burst;
+extern uint16_t nb_pkt_flowgen_clones;
 extern uint16_t mb_mempool_cache;
 extern int8_t rx_pthresh;
 extern int8_t rx_hthresh;
diff --git a/doc/guides/testpmd_app_ug/run_app.rst b/doc/guides/testpmd_app_ug/run_app.rst
index ca67105b7..c4c8f3a6c 100644
--- a/doc/guides/testpmd_app_ug/run_app.rst
+++ b/doc/guides/testpmd_app_ug/run_app.rst
@@ -299,6 +299,13 @@ The command line options are:
     If set to 0, driver default is used if defined. Else, if driver
     default is not defined, default of 32 is used.
 
+*   ``--flowgen-clones=N``
+
+    Set the number of clones to be sent for each packet in `flowgen` mode.
+    Sending clones reduces the host CPU load of creating packets and may help
+    in testing extreme speeds or maxing out Tx packet performance.
+    N should be non-zero, but less than the 'burst' parameter.
+
 *   ``--mbcache=N``
 
     Set the cache of mbuf memory pools to N, where 0 <= N <= 512.
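
For reference, the new option would be exercised with an invocation similar
to the one used for the measurements above, just with a non-zero clones value
(the core list and queue counts here are the ones from the commit message and
need adapting to the local setup):

./build/app/dpdk-testpmd -l 32-63 -- --forward-mode=flowgen \
  --rxq=6 --txq=6 --disable-crc-strip --burst=512 \
  --flowgen-clones=512 --txd=4096 --stats-period=1 --txpkts=64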