[dpdk-dev,v4] ixgbe_pmd: enforce RS bit on every EOP descriptor for devices newer than 82598

Message ID 1441988763-26335-1-git-send-email-konstantin.ananyev@intel.com (mailing list archive)
State Not Applicable, archived

Commit Message

Ananyev, Konstantin Sept. 11, 2015, 4:26 p.m. UTC
  Hi Vlad,

>> Unfortunately we are seeing a huge performance drop with that patch:
>> On my box bi-directional traffic (64B packet) over one port can't reach even 11 Mpps.
>Konstantin, could u clarify - u saw "only" 11 Mpps with v3 of this patch which doesn't change the rs_thresh and only sets RS on every packet? What is the performance in the same test without this patch? 

Yes, that's with your latest patch - v4.
I am seeing:
vectorRX+fullfeaturedTX over 1 port:
orig_code   14.74 Mpps
your_patch: 10.6 Mpps

Actually, while we are speaking about it,
could you try another patch for that issue in your test environment,
see below.
It seems to fix the problem in our test environment.
It is based on the observation that it is OK not to set RS on each EOP,
as long as the distance between TDs with the RS bit set doesn't exceed
the size of the on-die descriptor queue (40 descriptors).
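
To make the rule above concrete, here is a minimal standalone sketch of the
RS placement, not the driver code itself. It assumes a flat descriptor array
that never wraps and ignores context descriptors and ring cleanup. The names
txq_model, txq_model_init, txq_model_xmit, txq_model_max_rs_gap and
MODEL_RING_SIZE are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_RING_SIZE   4096 /* hypothetical: big enough that the model never wraps */
#define ON_DIE_QUEUE_SIZE 40   /* on-die descriptor queue size quoted above */

/* Hypothetical, simplified stand-in for the driver's TX queue state. */
struct txq_model {
	uint8_t  rs[MODEL_RING_SIZE]; /* 1 if this descriptor has RS set */
	uint32_t tail;                /* next free descriptor */
	uint16_t nb_tx_used;          /* descriptors since the threshold last fired */
	uint16_t tx_rs_thresh;
};

static void
txq_model_init(struct txq_model *q, uint16_t tx_rs_thresh)
{
	memset(q, 0, sizeof(*q));
	q->tx_rs_thresh = tx_rs_thresh;
}

/* Queue one burst; segs[i] is the number of descriptors packet i needs. */
static void
txq_model_xmit(struct txq_model *q, const uint16_t *segs, unsigned int n)
{
	int64_t prev = -1; /* last descriptor of the previous packet in this burst */
	unsigned int i;

	for (i = 0; i < n; i++) {
		uint16_t nb_used = segs[i];
		uint32_t last;

		assert(nb_used > 0 && q->tail + nb_used <= MODEL_RING_SIZE);

		/* This packet reaches the threshold: mark the previous packet. */
		if (prev >= 0 && nb_used + q->nb_tx_used >= q->tx_rs_thresh)
			q->rs[prev] = 1;

		last = q->tail + nb_used - 1;
		q->nb_tx_used = (uint16_t)(q->nb_tx_used + nb_used);
		if (q->nb_tx_used >= q->tx_rs_thresh) {
			q->rs[last] = 1;
			q->nb_tx_used = 0;
		}
		q->tail = last + 1;
		prev = last;
	}

	/* Always mark the last packet of the burst. */
	if (prev >= 0)
		q->rs[prev] = 1;
}

/* Largest distance between consecutive RS descriptors queued so far. */
static uint32_t
txq_model_max_rs_gap(const struct txq_model *q)
{
	uint32_t i, gap = 0, max_gap = 0;

	for (i = 0; i < q->tail; i++) {
		gap++;
		if (q->rs[i]) {
			if (gap > max_gap)
				max_gap = gap;
			gap = 0;
		}
	}
	return max_gap > gap ? max_gap : gap;
}
```

After txq_model_init(&q, 32) and any number of txq_model_xmit() calls,
comparing txq_model_max_rs_gap(&q) with ON_DIE_QUEUE_SIZE shows whether the
model kept the RS distance within the on-die queue size. In this model the
gap depends on the threshold and on the largest per-packet descriptor count,
so a single packet spanning more than 40 descriptors would still exceed it.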

With that approach I also see a slight performance drop,
but it is much less than with your approach:
under the same conditions it can do 14.2 Mpps over 1 port
(about 3.7% below orig_code, versus about 28% below with your patch).

Thanks
Konstantin


Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 91023b9..a7a32ad 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -573,7 +573,7 @@  ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	struct ixgbe_tx_entry *sw_ring;
 	struct ixgbe_tx_entry *txe, *txn;
 	volatile union ixgbe_adv_tx_desc *txr;
-	volatile union ixgbe_adv_tx_desc *txd;
+	volatile union ixgbe_adv_tx_desc *txd, *txp;
 	struct rte_mbuf     *tx_pkt;
 	struct rte_mbuf     *m_seg;
 	uint64_t buf_dma_addr;
@@ -596,6 +596,7 @@  ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 	txr     = txq->tx_ring;
 	tx_id   = txq->tx_tail;
 	txe = &sw_ring[tx_id];
+	txp = NULL;
 
 	/* Determine if the descriptor ring needs to be cleaned. */
 	if (txq->nb_tx_free < txq->tx_free_thresh)
@@ -639,6 +640,12 @@  ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 		 */
 		nb_used = (uint16_t)(tx_pkt->nb_segs + new_ctx);
 
+		if (txp != NULL &&
+				nb_used + txq->nb_tx_used >= txq->tx_rs_thresh)
+			/* set RS on the previous packet in the burst */
+			txp->read.cmd_type_len |=
+				rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 		/*
 		 * The number of descriptors that must be allocated for a
 		 * packet is the number of segments of that packet, plus 1
@@ -843,8 +850,14 @@  ixgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 			txq->nb_tx_used = 0;
 		}
 		txd->read.cmd_type_len |= rte_cpu_to_le_32(cmd_type_len);
+		txp = txd;
 	}
+
 end_of_tx:
+	/* set RS on last packet in the burst */
+	if (txp != NULL)
+		txp->read.cmd_type_len |= rte_cpu_to_le_32(IXGBE_TXD_CMD_RS);
+
 	rte_wmb();
 
 	/*
@@ -2124,11 +2137,11 @@  ixgbe_dev_tx_queue_setup(struct rte_eth_dev *dev,
 			tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
 	tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
 			tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
-	if (tx_rs_thresh >= (nb_desc - 2)) {
-		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than the number "
-			     "of TX descriptors minus 2. (tx_rs_thresh=%u "
-			     "port=%d queue=%d)", (unsigned int)tx_rs_thresh,
-			     (int)dev->data->port_id, (int)queue_idx);
+	if (tx_rs_thresh > DEFAULT_TX_RS_THRESH) {
+		PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than or equal "
+			"to %u. (tx_rs_thresh=%u port=%d queue=%d)",
+			DEFAULT_TX_RS_THRESH, (unsigned int)tx_rs_thresh,
+			(int)dev->data->port_id, (int)queue_idx);
 		return -(EINVAL);
 	}
 	if (tx_free_thresh >= (nb_desc - 3)) {