From patchwork Tue Dec 13 17:41:54 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tomasz Kulasek
X-Patchwork-Id: 17925
X-Patchwork-Delegate: thomas@monjalon.net
From: Tomasz Kulasek
To: dev@dpdk.org
Date: Tue, 13 Dec 2016 18:41:54 +0100
Message-Id: <1481650914-40324-8-git-send-email-tomaszx.kulasek@intel.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1481650914-40324-1-git-send-email-tomaszx.kulasek@intel.com>
References: <1479922585-8640-1-git-send-email-tomaszx.kulasek@intel.com>
 <1481650914-40324-1-git-send-email-tomaszx.kulasek@intel.com>
Subject: [dpdk-dev] [PATCH v13 7/7] testpmd: use Tx preparation in csum engine
List-Id: DPDK patches and discussions
Sender: "dev" <dev-bounces@dpdk.org>

Added a "csum txprep (on|off)" command, which allows switching to the Tx
path that uses the Tx preparation API. By default, the unchanged
implementation is used.
When the Tx preparation path is used, pseudo-header checksum calculation for
UDP/TCP/TSO packets is removed from the application, and the Tx preparation
API is used for packet preparation and verification. Adding this extra step
to the csum engine costs about a 3-4% performance drop on my setup with the
ixgbe driver. It is caused mostly by the need to re-access and modify packet
data.

Signed-off-by: Tomasz Kulasek
Acked-by: Konstantin Ananyev
---
 app/test-pmd/cmdline.c                      | 49 +++++++++++++++++++++++++++
 app/test-pmd/csumonly.c                     | 33 ++++++++++++++----
 app/test-pmd/testpmd.c                      |  5 +++
 app/test-pmd/testpmd.h                      |  2 ++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 13 +++++++
 5 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d03a592..499a00b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -366,6 +366,10 @@ static void cmd_help_long_parsed(void *parsed_result,
 			"csum show (port_id)\n"
 			"    Display tx checksum offload configuration\n\n"
 
+			"csum txprep (on|off)\n"
+			"    Enable tx preparation path in csum forward engine"
+			"\n\n"
+
 			"tso set (segsize) (portid)\n"
 			"    Enable TCP Segmentation Offload in csum forward"
 			"    engine.\n"
@@ -3528,6 +3532,50 @@ struct cmd_csum_tunnel_result {
 	},
 };
 
+/* Enable/disable tx preparation path */
+struct cmd_csum_txprep_result {
+	cmdline_fixed_string_t csum;
+	cmdline_fixed_string_t parse;
+	cmdline_fixed_string_t onoff;
+};
+
+static void
+cmd_csum_txprep_parsed(void *parsed_result,
+		__attribute__((unused)) struct cmdline *cl,
+		__attribute__((unused)) void *data)
+{
+	struct cmd_csum_txprep_result *res = parsed_result;
+
+	if (!strcmp(res->onoff, "on"))
+		tx_prepare = 1;
+	else
+		tx_prepare = 0;
+}
+
+cmdline_parse_token_string_t cmd_csum_txprep_csum =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				csum, "csum");
+cmdline_parse_token_string_t cmd_csum_txprep_parse =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				parse, "txprep");
+cmdline_parse_token_string_t cmd_csum_txprep_onoff =
+	TOKEN_STRING_INITIALIZER(struct cmd_csum_txprep_result,
+				onoff, "on#off");
+
+cmdline_parse_inst_t cmd_csum_txprep = {
+	.f = cmd_csum_txprep_parsed,
+	.data = NULL,
+	.help_str = "csum txprep on|off: Enable/Disable tx preparation path "
+		"for csum engine",
+	.tokens = {
+		(void *)&cmd_csum_txprep_csum,
+		(void *)&cmd_csum_txprep_parse,
+		(void *)&cmd_csum_txprep_onoff,
+		NULL,
+	},
+};
+
 /* *** ENABLE HARDWARE SEGMENTATION IN TX NON-TUNNELED PACKETS *** */
 struct cmd_tso_set_result {
 	cmdline_fixed_string_t tso;
@@ -11518,6 +11566,7 @@ struct cmd_set_vf_mac_addr_result {
 	(cmdline_parse_inst_t *)&cmd_csum_set,
 	(cmdline_parse_inst_t *)&cmd_csum_show,
 	(cmdline_parse_inst_t *)&cmd_csum_tunnel,
+	(cmdline_parse_inst_t *)&cmd_csum_txprep,
 	(cmdline_parse_inst_t *)&cmd_tso_set,
 	(cmdline_parse_inst_t *)&cmd_tso_show,
 	(cmdline_parse_inst_t *)&cmd_tunnel_tso_set,
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 57e6ae2..3afa9ab 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -372,8 +372,10 @@ struct simple_gre_hdr {
 		udp_hdr->dgram_cksum = 0;
 		if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_UDP_CKSUM) {
 			ol_flags |= PKT_TX_UDP_CKSUM;
-			udp_hdr->dgram_cksum = get_psd_sum(l3_hdr,
-					info->ethertype, ol_flags);
+			if (!tx_prepare)
+				udp_hdr->dgram_cksum = get_psd_sum(
+						l3_hdr, info->ethertype,
+						ol_flags);
 		} else {
 			udp_hdr->dgram_cksum =
 				get_udptcp_checksum(l3_hdr, udp_hdr,
@@ -385,12 +387,15 @@ struct simple_gre_hdr {
 		tcp_hdr->cksum = 0;
 		if (tso_segsz) {
 			ol_flags |= PKT_TX_TCP_SEG;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-					ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
+
 		} else if (testpmd_ol_flags & TESTPMD_TX_OFFLOAD_TCP_CKSUM) {
 			ol_flags |= PKT_TX_TCP_CKSUM;
-			tcp_hdr->cksum = get_psd_sum(l3_hdr, info->ethertype,
-					ol_flags);
+			if (!tx_prepare)
+				tcp_hdr->cksum = get_psd_sum(l3_hdr,
+						info->ethertype, ol_flags);
 		} else {
 			tcp_hdr->cksum =
 				get_udptcp_checksum(l3_hdr, tcp_hdr,
@@ -648,6 +653,7 @@ struct simple_gre_hdr {
 	void *l3_hdr = NULL, *outer_l3_hdr = NULL; /* can be IPv4 or IPv6 */
 	uint16_t nb_rx;
 	uint16_t nb_tx;
+	uint16_t nb_prep;
 	uint16_t i;
 	uint64_t rx_ol_flags, tx_ol_flags;
 	uint16_t testpmd_ol_flags;
@@ -857,7 +863,20 @@ struct simple_gre_hdr {
 			printf("\n");
 		}
 	}
-	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_rx);
+
+	if (tx_prepare) {
+		nb_prep = rte_eth_tx_prepare(fs->tx_port, fs->tx_queue,
+				pkts_burst, nb_rx);
+		if (nb_prep != nb_rx)
+			printf("Preparing packet burst to transmit failed: %s\n",
+					rte_strerror(rte_errno));
+
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_prep);
+	} else
+		nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst,
+				nb_rx);
+
 	/*
 	 * Retry if necessary
 	 */
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index a0332c2..634f10b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -180,6 +180,11 @@ struct fwd_engine * fwd_engines[] = {
 enum tx_pkt_split tx_pkt_split = TX_PKT_SPLIT_OFF;
 /**< Split policy for packets to TX. */
 
+/*
+ * Enable Tx preparation path in the "csum" engine.
+ */
+uint8_t tx_prepare;
+
 uint16_t nb_pkt_per_burst = DEF_PKT_BURST; /**< Number of packets per burst. */
 uint16_t mb_mempool_cache = DEF_MBUF_CACHE; /**< Size of mbuf mempool cache. */
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 9c1e703..488a6e1 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -383,6 +383,8 @@ enum tx_pkt_split {
 
 extern enum tx_pkt_split tx_pkt_split;
 
+extern uint8_t tx_prepare;
+
 extern uint16_t nb_pkt_per_burst;
 extern uint16_t mb_mempool_cache;
 extern int8_t rx_pthresh;
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index f1c269a..d77336e 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -750,6 +750,19 @@ Display tx checksum offload configuration::
 
    testpmd> csum show (port_id)
 
+csum txprep
+~~~~~~~~~~~
+
+Select the TX preparation path for the ``csum`` forwarding engine::
+
+   testpmd> csum txprep (on|off)
+
+If enabled, the csum forward engine uses the TX preparation API for full
+packet preparation and verification before the TX burst.
+
+If disabled, the csum engine initializes all required fields at application
+level and the TX preparation stage is not executed.
+
 tso set
 ~~~~~~~