From patchwork Mon Nov 9 09:20:16 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Ananyev, Konstantin"
X-Patchwork-Id: 8788
Return-Path:
X-Original-To: patchwork@dpdk.org
Delivered-To: patchwork@dpdk.org
Received: from [92.243.14.124] (localhost [IPv6:::1])
 by dpdk.org (Postfix) with ESMTP id 9084F568A;
 Mon, 9 Nov 2015 10:20:47 +0100 (CET)
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id A6A77532D
 for ; Mon, 9 Nov 2015 10:20:46 +0100 (CET)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga103.fm.intel.com with ESMTP; 09 Nov 2015 01:20:45 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.20,265,1444719600"; d="scan'208";a="596770632"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
 by FMSMGA003.fm.intel.com with ESMTP; 09 Nov 2015 01:20:44 -0800
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
 by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id tA99Kh8r025894;
 Mon, 9 Nov 2015 09:20:43 GMT
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
 by sivswdev01.ir.intel.com with ESMTP id tA99Kh6Q010200;
 Mon, 9 Nov 2015 09:20:43 GMT
Received: (from kananye1@localhost)
 by sivswdev01.ir.intel.com with id tA99Kh5A010196;
 Mon, 9 Nov 2015 09:20:43 GMT
From: Konstantin Ananyev
To: dev@dpdk.org
Date: Mon, 9 Nov 2015 09:20:16 +0000
Message-Id: <1447060816-9923-3-git-send-email-konstantin.ananyev@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1447060816-9923-1-git-send-email-konstantin.ananyev@intel.com>
References: <1447060816-9923-1-git-send-email-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCHv5 2/2] ixgbe: fix TX hang when RS distance exceeds HW limit
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

One of the ways to reproduce the issue:

  testpmd -- -i --txqflags=0
  testpmd> set fwd txonly
  testpmd> set txpkts 64,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
  testpmd> set txsplit rand
  testpmd> start

After some time, TX on the ixgbe queue hangs and all packet
transmission on that queue stops.

This bug was first reported and investigated by Vlad Zolotarov:
"We can reproduce this issue when stressed the xmit path with a lot of
highly fragmented TCP frames (packets with up to 33 fragments with
non-headers fragments as small as 4 bytes) with all offload features
enabled."

The root cause is that ixgbe_xmit_pkts() in some cases violates the HW
rule that the distance between TDs with the RS bit set must not exceed
40 TDs.

From the latest 82599 spec update:
"When WTHRESH is set to zero, the software device driver should set the
RS bit in the Tx descriptors with the EOP bit set and at least once in
the 40 descriptors."

The fix is to make sure that the distance between TDs with the RS bit
set never exceeds the HW limit.
With this fix, a slight slowdown of the full-featured ixgbe TX path
might be observed (up to 4% in our testing).
The ixgbe simple TX path is unaffected by this patch.

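To make the root cause easier to see, below is a minimal sketch that models
only the RS bookkeeping for a single burst. It is not driver code: the names
max_rs_distance(), MAX_DESCS and with_fix are hypothetical and exist only for
this illustration, and each packet is reduced to the number of descriptors it
consumes (segments plus an optional context descriptor). The function returns
the longest run of descriptors that ends at an RS descriptor or at the end of
the burst.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DESCS 1024	/* hypothetical cap, for this model only */

/*
 * Simplified model of the RS placement in the full-featured TX path.
 * pkt_descs[i] is the descriptor count of packet i in the burst.
 * with_fix == 0 models RS only on the threshold packet's last descriptor;
 * with_fix != 0 adds the "RS on previous packet" and "RS at end of burst"
 * steps described in this patch.
 */
static unsigned
max_rs_distance(const unsigned *pkt_descs, size_t nb_pkts,
		unsigned rs_thresh, int with_fix)
{
	unsigned char rs[MAX_DESCS] = {0};	/* rs[i] != 0: RS set on desc i */
	unsigned nb_tx_used = 0;	/* mirrors txq->nb_tx_used */
	unsigned run = 0, max_run = 0;
	size_t tail = 0;		/* next free descriptor index */
	long txp = -1;			/* last desc of previous packet w/o RS */
	size_t i;

	assert(pkt_descs != NULL || nb_pkts == 0);

	for (i = 0; i < nb_pkts; i++) {
		unsigned nb_used = pkt_descs[i];

		if (nb_used == 0 || tail + nb_used > MAX_DESCS)
			break;

		/* fix: close the current RS interval before a big packet */
		if (with_fix && txp >= 0 &&
				nb_used + nb_tx_used >= rs_thresh)
			rs[txp] = 1;

		tail += nb_used;
		nb_tx_used += nb_used;

		if (nb_tx_used >= rs_thresh) {
			rs[tail - 1] = 1;	/* RS on last desc of this packet */
			nb_tx_used = 0;
			txp = -1;
		} else
			txp = (long)(tail - 1);
	}

	/* fix: set RS on the last packet in the burst */
	if (with_fix && txp >= 0)
		rs[txp] = 1;

	for (i = 0; i < tail; i++) {
		run++;
		if (run > max_run)
			max_run = run;
		if (rs[i])
			run = 0;
	}
	return max_run;
}
```

For example, with an RS threshold of 32 and a burst made of a 31-descriptor
packet followed by a 33-descriptor packet, the model reports a distance of 64
without the fix (above the 40 TD limit) and 33 with the fix.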
Reported-by: Vlad Zolotarov
Signed-off-by: Konstantin Ananyev
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 5561195..80cae5e 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -572,7 +572,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ixgbe_tx_entry *sw_ring;
 	struct ixgbe_tx_entry *txe, *txn;
 	volatile union ixgbe_adv_tx_desc *txr;
-	volatile union ixgbe_adv_tx_desc *txd;
+	volatile union ixgbe_adv_tx_desc *txd, *txp;
 	struct rte_mbuf *tx_pkt;
 	struct rte_mbuf *m_seg;
 	uint64_t buf_dma_addr;
@@ -595,6 +595,7 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	txr = txq->tx_ring;
 	tx_id = txq->tx_tail;
 	txe = &sw_ring[tx_id];
+	txp = NULL;
 
 	/* Determine if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -638,6 +639,12 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
+		if (txp != NULL &&
+				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
+			/* set RS on the previous packet in the burst */
+			txp->read.cmd_type_len |=
+				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -840,10 +847,18 @@ ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 			/* Update txq RS bit counters */
 			txq->nb_tx_used = 0;
-		}
+			txp = NULL;
+		} else
+			txp = txd;
+
 		txd->read.cmd_type_len |= rte_cpu_to_le_32(cmd_type_len);
 	}
+
 end_of_tx:
+	/* set RS on last packet in the burst */
+	if (txp != NULL)
+		txp->read.cmd_type_len |= rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 	rte_wmb();
 
 	/*
@@ -2019,9 +2034,16 @@ ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
 	if (tx_rs_thresh >= (nb_desc - 2)) {
 		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number "
-			"of TX descriptors minus 2. (tx_rs_thresh=%u "
-			"port=%d queue=%d)", (unsigned int)tx_rs_thresh,
-			(int)dev->data->port_id, (int)queue_idx);
+			     "of TX descriptors minus 2. (tx_rs_thresh=%u "
+			     "port=%d queue=%d)", (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id, (int)queue_idx);
+		return -(EINVAL);
+	}
+	if (tx_rs_thresh > DEFAULT_TX_RS_THRESH) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than %u. "
+			     "(tx_rs_thresh=%u port=%d queue=%d)",
+			     DEFAULT_TX_RS_THRESH, (unsigned int)tx_rs_thresh,
+			     (int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
 	if (tx_free_thresh >= (nb_desc - 3)) {