[v2] app/testpmd: add mempool bulk get for txonly mode

Message ID 20190301134700.8220-1-pbhagavatula@marvell.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series [v2] app/testpmd: add mempool bulk get for txonly mode

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Performance-Testing success Performance Testing PASS
ci/mellanox-Performance-Testing success Performance Testing PASS

Commit Message

Pavan Nikhilesh Bhagavatula March 1, 2019, 1:47 p.m. UTC
  From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use mempool bulk get ops to allocate a burst of packets and process them.
If the bulk get fails, fall back to rte_mbuf_raw_alloc.

Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---

 v2 Changes:
 - Use bulk ops for fetching segments. (Andrew Rybchenko)
 - Fallback to rte_mbuf_raw_alloc if bulk get fails. (Andrew Rybchenko)
 - Fix mbufs not being freed when there are no more mbufs available for
 segments. (Andrew Rybchenko)

 app/test-pmd/txonly.c | 159 ++++++++++++++++++++++++------------------
 1 file changed, 93 insertions(+), 66 deletions(-)
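
 For readers of this thread, the allocation strategy described above reduces
 to the following minimal sketch (illustrative only, not code taken from the
 patch; alloc_tx_burst is a made-up name): one rte_mempool_get_bulk() call
 covers the whole burst, and per-mbuf rte_mbuf_raw_alloc() is used only when
 that single call fails.

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Illustrative sketch of the v2 allocation strategy; not the patch itself. */
static inline uint16_t
alloc_tx_burst(struct rte_mempool *mbp, struct rte_mbuf **pkts, uint16_t n)
{
	uint16_t i;

	/* Fast path: one mempool dequeue for the whole burst. */
	if (rte_mempool_get_bulk(mbp, (void **)pkts, n) == 0) {
		for (i = 0; i < n; i++)
			rte_pktmbuf_reset_headroom(pkts[i]);
		return n;
	}

	/* Bulk get failed: fall back to per-mbuf raw allocation. */
	for (i = 0; i < n; i++) {
		pkts[i] = rte_mbuf_raw_alloc(mbp);
		if (pkts[i] == NULL)
			break;
		rte_pktmbuf_reset_headroom(pkts[i]);
	}
	return i;
}

 The third v2 change listed above is the cleanup of a partially prepared
 burst: when a packet in the middle of the bulk-allocated burst cannot be
 completed, the unused tail of the array is returned to the pool with
 rte_mempool_put_bulk() instead of being leaked.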

--
2.21.0
  

Comments

Ferruh Yigit March 19, 2019, 4:48 p.m. UTC | #1
On 3/1/2019 1:47 PM, Pavan Nikhilesh Bhagavatula wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Use mempool bulk get ops to alloc burst of packets and process them.
> If bulk get fails fallback to rte_mbuf_raw_alloc.

I assume the motivation is performance improvement; do you have the numbers
before and after this patch?

cc'ed maintainers.

> 
> Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

<...>
  
Pavan Nikhilesh Bhagavatula March 20, 2019, 4:53 a.m. UTC | #2
On Tue, 2019-03-19 at 16:48 +0000, Ferruh Yigit wrote:
> On 3/1/2019 1:47 PM, Pavan Nikhilesh Bhagavatula wrote:
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > 
> > Use mempool bulk get ops to alloc burst of packets and process
> > them.
> > If bulk get fails fallback to rte_mbuf_raw_alloc.
> 
> I assume the motivation is performance improvement, do you have the
> numbers
> before and after this patch?

We are seeing approximately 500% improvement with the patch on
octeontx2.

> 
> cc'ed maintainers.
> 
> > Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> <...>
>
  
Ferruh Yigit March 26, 2019, 8:43 a.m. UTC | #3
On 3/20/2019 4:53 AM, Pavan Nikhilesh Bhagavatula wrote:
> On Tue, 2019-03-19 at 16:48 +0000, Ferruh Yigit wrote:
>> On 3/1/2019 1:47 PM, Pavan Nikhilesh Bhagavatula wrote:
>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>
>>> Use mempool bulk get ops to alloc burst of packets and process
>>> them.
>>> If bulk get fails fallback to rte_mbuf_raw_alloc.
>>
>> I assume the motivation is performance improvement, do you have the
>> numbers
>> before and after this patch?
> 
> We are seeing approximately 500% improvement with the patch on
> octeontx2.

500% is interesting.

btw, Intel also confirmed the improvement:
Tested-by: Yingya Han <yingyax.han@intel.com>

> 
>>
>> cc'ed maintainers.
>>
>>> Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> <...>
>>
> 
>
  
Thomas Monjalon March 26, 2019, 11 a.m. UTC | #4
01/03/2019 14:47, Pavan Nikhilesh Bhagavatula:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Use mempool bulk get ops to alloc burst of packets and process them.
> If bulk get fails fallback to rte_mbuf_raw_alloc.
> 
> Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
> 
>  v2 Changes:
>  - Use bulk ops for fetching segments. (Andrew Rybchenko)
>  - Fallback to rte_mbuf_raw_alloc if bulk get fails. (Andrew Rybchenko)
>  - Fix mbufs not being freed when there is no more mbufs available for
>  segments. (Andrew Rybchenko)
> 
>  app/test-pmd/txonly.c | 159 ++++++++++++++++++++++++------------------
>  1 file changed, 93 insertions(+), 66 deletions(-)

This is changing a lot of lines so it is difficult to know
what is changed exactly. Please split it with a refactoring
without any real change, and introduce the real change later.
Then we'll be able to examine it and check the performance.

We need to have more tests with more hardware in order
to better understand the performance improvement.
For info, a degradation is seen in Mellanox lab.
  
Iremonger, Bernard March 26, 2019, 11:50 a.m. UTC | #5
Hi Pavan,

<snip>

> Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: add mempool bulk get for
> txonly mode
> 
> 01/03/2019 14:47, Pavan Nikhilesh Bhagavatula:
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Use mempool bulk get ops to alloc burst of packets and process them.
> > If bulk get fails fallback to rte_mbuf_raw_alloc.
> >
> > Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> >
> >  v2 Changes:
> >  - Use bulk ops for fetching segments. (Andrew Rybchenko)
> >  - Fallback to rte_mbuf_raw_alloc if bulk get fails. (Andrew
> > Rybchenko)
> >  - Fix mbufs not being freed when there is no more mbufs available for
> > segments. (Andrew Rybchenko)
> >
> >  app/test-pmd/txonly.c | 159
> > ++++++++++++++++++++++++------------------
> >  1 file changed, 93 insertions(+), 66 deletions(-)
> 
> This is changing a lot of lines so it is difficult to know what is changed exactly.
> Please split it with a refactoring without any real change, and introduce the
> real change later.
> Then we'll be able to examine it and check the performance.
> 
> We need to have more tests with more hardware in order to better
> understand the performance improvement.
> For info, a degradation is seen in Mellanox lab.
> 
> 
+1 
Not easy to review.
Btw, unnecessary change at lines 157 and 158 in txonly.c

Regards,

Bernard
  
Pavan Nikhilesh Bhagavatula March 26, 2019, 12:06 p.m. UTC | #6
> -----Original Message-----
> From: Iremonger, Bernard <bernard.iremonger@intel.com>
> Sent: Tuesday, March 26, 2019 5:20 PM
> To: Thomas Monjalon <thomas@monjalon.net>; Pavan Nikhilesh
> Bhagavatula <pbhagavatula@marvell.com>
> Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> arybchenko@solarflare.com; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH v2] app/testpmd: add mempool bulk
> get for txonly mode
> 
> Hi Pavan,
> 
> <snip>
> 
> > Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: add mempool bulk get
> > for txonly mode
> >
> > 01/03/2019 14:47, Pavan Nikhilesh Bhagavatula:
> > > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > >
> > > Use mempool bulk get ops to alloc burst of packets and process them.
> > > If bulk get fails fallback to rte_mbuf_raw_alloc.
> > >
> > > Suggested-by: Andrew Rybchenko <arybchenko@solarflare.com>
> > > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > > ---
> > >
> > >  v2 Changes:
> > >  - Use bulk ops for fetching segments. (Andrew Rybchenko)
> > >  - Fallback to rte_mbuf_raw_alloc if bulk get fails. (Andrew
> > > Rybchenko)
> > >  - Fix mbufs not being freed when there is no more mbufs available
> > > for segments. (Andrew Rybchenko)
> > >
> > >  app/test-pmd/txonly.c | 159
> > > ++++++++++++++++++++++++------------------
> > >  1 file changed, 93 insertions(+), 66 deletions(-)
> >
> > This is changing a lot of lines so it is difficult to know what is changed
> exactly.
> > Please split it with a refactoring without any real change, and
> > introduce the real change later.

Ok will split in the next version.

> > Then we'll be able to examine it and check the performance.
> >
> > We need to have more tests with more hardware in order to better
> > understand the performance improvement.
> > For info, a degradation is seen in Mellanox lab.

The only real change was that we introduced a mempool bulk get to avoid multiple calls to the mempool layer.
I don't see how that would degrade the performance.
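
For context, a rough before/after sketch of that difference for a burst of n
packets (illustrative only; the helper names below are made up, not taken from
the patch):

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Old approach: n separate mempool dequeues per burst. */
static int
alloc_burst_per_mbuf(struct rte_mempool *mbp, struct rte_mbuf **pkts,
		     uint16_t n)
{
	uint16_t i;

	for (i = 0; i < n; i++) {
		pkts[i] = rte_mbuf_raw_alloc(mbp);
		if (pkts[i] == NULL) {
			/* Return whatever was already taken from the pool. */
			rte_mempool_put_bulk(mbp, (void **)pkts, i);
			return -1;
		}
	}
	return 0;
}

/* New approach: a single dequeue of n objects per burst. */
static int
alloc_burst_bulk(struct rte_mempool *mbp, struct rte_mbuf **pkts, uint16_t n)
{
	return rte_mempool_get_bulk(mbp, (void **)pkts, n);
}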

> >
> >
> +1
> Not easy to review.
> Btw, unnecessary change at lines  157 and 158 in txonly.c

Will remove the assignments.

> 
> Regards,
> 
> Bernard
  
Jerin Jacob Kollanukkaran March 26, 2019, 1:16 p.m. UTC | #7
On Tue, 2019-03-26 at 12:06 +0000, Pavan Nikhilesh Bhagavatula wrote:
> > -----Original Message-----
> > From: Iremonger, Bernard <bernard.iremonger@intel.com>
> > Sent: Tuesday, March 26, 2019 5:20 PM
> > To: Thomas Monjalon <thomas@monjalon.net>; Pavan Nikhilesh
> > Bhagavatula <pbhagavatula@marvell.com>
> > Cc: dev@dpdk.org; Jerin Jacob Kollanukkaran <jerinj@marvell.com>;
> > arybchenko@solarflare.com; Yigit, Ferruh <ferruh.yigit@intel.com>
> > Subject: [EXT] RE: [dpdk-dev] [PATCH v2] app/testpmd: add mempool
> > bulk
> > get for txonly mode
> > 
> > 
> > > Then we'll be able to examine it and check the performance.
> > > 
> > > We need to have more tests with more hardware in order to better
> > > understand the performance improvement.
> > > For info, a degradation is seen in Mellanox lab.
> 
> The only real change was that we introduced mempool_bulk get to avoid
> multiple calls to mempool layer. 
> I don't see how that would degrade the performance.

I don't think it is a CPU-instruction-based degradation, as it gives
better performance on x86 and arm64. Adding the Mellanox maintainers to
comment: how does it impact Mellanox NIC hardware?
  

Patch

diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 1f08b6ed3..af0be89ca 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -147,6 +147,62 @@  setup_pkt_udp_ip_headers(struct ipv4_hdr *ip_hdr,
 	ip_hdr->hdr_checksum = (uint16_t) ip_cksum;
 }

+static inline bool
+pkt_burst_prepare(struct rte_mbuf *pkt, struct rte_mempool *mbp,
+		struct ether_hdr *eth_hdr, const uint16_t vlan_tci,
+		const uint16_t vlan_tci_outer, const uint64_t ol_flags)
+{
+	struct rte_mbuf *pkt_segs[RTE_MAX_SEGS_PER_PKT];
+	struct rte_mbuf *pkt_seg;
+	uint32_t nb_segs, pkt_len = 0;
+	uint8_t i;
+
+	if (unlikely(tx_pkt_split == TX_PKT_SPLIT_RND))
+		nb_segs = random() % tx_pkt_nb_segs + 1;
+	else
+		nb_segs = tx_pkt_nb_segs;
+
+	if (nb_segs > 1) {
+		if (rte_mempool_get_bulk(mbp, (void **)pkt_segs, nb_segs))
+			return false;
+	}
+
+	rte_pktmbuf_reset_headroom(pkt);
+	pkt->data_len = tx_pkt_seg_lengths[0];
+	pkt->ol_flags = ol_flags;
+	pkt->vlan_tci = vlan_tci;
+	pkt->vlan_tci_outer = vlan_tci_outer;
+	pkt->l2_len = sizeof(struct ether_hdr);
+	pkt->l3_len = sizeof(struct ipv4_hdr);
+
+	pkt_seg = pkt;
+	for (i = 1; i < nb_segs; i++) {
+		pkt_seg->next = pkt_segs[i - 1];
+		pkt_seg = pkt_seg->next;
+		pkt_seg->data_len = tx_pkt_seg_lengths[i];
+		pkt_len += pkt_seg->data_len;
+	}
+	pkt_seg->next = NULL; /* Last segment of packet. */
+	/*
+	 * Copy headers in first packet segment(s).
+	 */
+	copy_buf_to_pkt(eth_hdr, sizeof(eth_hdr), pkt, 0);
+	copy_buf_to_pkt(&pkt_ip_hdr, sizeof(pkt_ip_hdr), pkt,
+			sizeof(struct ether_hdr));
+	copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
+			sizeof(struct ether_hdr) +
+			sizeof(struct ipv4_hdr));
+
+	/*
+	 * Complete first mbuf of packet and append it to the
+	 * burst of packets to be transmitted.
+	 */
+	pkt->nb_segs = nb_segs;
+	pkt->pkt_len += pkt_len;
+
+	return true;
+}
+
 /*
  * Transmit a burst of multi-segments packets.
  */
@@ -154,9 +210,8 @@  static void
 pkt_burst_transmit(struct fwd_stream *fs)
 {
 	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
-	struct rte_port *txp;
 	struct rte_mbuf *pkt;
-	struct rte_mbuf *pkt_seg;
+	struct rte_port *txp;
 	struct rte_mempool *mbp;
 	struct ether_hdr eth_hdr;
 	uint16_t nb_tx;
@@ -164,14 +219,12 @@  pkt_burst_transmit(struct fwd_stream *fs)
 	uint16_t vlan_tci, vlan_tci_outer;
 	uint32_t retry;
 	uint64_t ol_flags = 0;
-	uint8_t  i;
 	uint64_t tx_offloads;
 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	uint64_t start_tsc;
 	uint64_t end_tsc;
 	uint64_t core_cycles;
 #endif
-	uint32_t nb_segs, pkt_len;

 #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
 	start_tsc = rte_rdtsc();
@@ -188,72 +241,46 @@  pkt_burst_transmit(struct fwd_stream *fs)
 		ol_flags |= PKT_TX_QINQ_PKT;
 	if (tx_offloads & DEV_TX_OFFLOAD_MACSEC_INSERT)
 		ol_flags |= PKT_TX_MACSEC;
-	for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
-		pkt = rte_mbuf_raw_alloc(mbp);
-		if (pkt == NULL) {
-		nomore_mbuf:
-			if (nb_pkt == 0)
-				return;
-			break;
-		}

-		/*
-		 * Using raw alloc is good to improve performance,
-		 * but some consumers may use the headroom and so
-		 * decrement data_off. We need to make sure it is
-		 * reset to default value.
-		 */
-		rte_pktmbuf_reset_headroom(pkt);
-		pkt->data_len = tx_pkt_seg_lengths[0];
-		pkt_seg = pkt;
-		if (tx_pkt_split == TX_PKT_SPLIT_RND)
-			nb_segs = random() % tx_pkt_nb_segs + 1;
-		else
-			nb_segs = tx_pkt_nb_segs;
-		pkt_len = pkt->data_len;
-		for (i = 1; i < nb_segs; i++) {
-			pkt_seg->next = rte_mbuf_raw_alloc(mbp);
-			if (pkt_seg->next == NULL) {
-				pkt->nb_segs = i;
-				rte_pktmbuf_free(pkt);
-				goto nomore_mbuf;
+	/*
+	 * Initialize Ethernet header.
+	 */
+	ether_addr_copy(&peer_eth_addrs[fs->peer_addr], &eth_hdr.d_addr);
+	ether_addr_copy(&ports[fs->tx_port].eth_addr, &eth_hdr.s_addr);
+	eth_hdr.ether_type = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
+
+	if (rte_mempool_get_bulk(mbp, (void **)pkts_burst,
+				nb_pkt_per_burst) == 0) {
+		for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
+			if (unlikely(!pkt_burst_prepare(pkts_burst[nb_pkt], mbp,
+							&eth_hdr, vlan_tci,
+							vlan_tci_outer,
+							ol_flags))) {
+				rte_mempool_put_bulk(mbp,
+						(void **)&pkts_burst[nb_pkt],
+						nb_pkt_per_burst - nb_pkt);
+				break;
+			}
+		}
+	} else {
+		for (nb_pkt = 0; nb_pkt < nb_pkt_per_burst; nb_pkt++) {
+			pkt = rte_mbuf_raw_alloc(mbp);
+			if (pkt == NULL)
+				break;
+			if (unlikely(!pkt_burst_prepare(pkt, mbp,
+							&eth_hdr, vlan_tci,
+							vlan_tci_outer,
+							ol_flags))) {
+				rte_mempool_put(mbp, pkt);
+				break;
 			}
-			pkt_seg = pkt_seg->next;
-			pkt_seg->data_len = tx_pkt_seg_lengths[i];
-			pkt_len += pkt_seg->data_len;
+			pkts_burst[nb_pkt] = pkt;
 		}
-		pkt_seg->next = NULL; /* Last segment of packet. */
-
-		/*
-		 * Initialize Ethernet header.
-		 */
-		ether_addr_copy(&peer_eth_addrs[fs->peer_addr],&eth_hdr.d_addr);
-		ether_addr_copy(&ports[fs->tx_port].eth_addr, &eth_hdr.s_addr);
-		eth_hdr.ether_type = rte_cpu_to_be_16(ETHER_TYPE_IPv4);
-
-		/*
-		 * Copy headers in first packet segment(s).
-		 */
-		copy_buf_to_pkt(&eth_hdr, sizeof(eth_hdr), pkt, 0);
-		copy_buf_to_pkt(&pkt_ip_hdr, sizeof(pkt_ip_hdr), pkt,
-				sizeof(struct ether_hdr));
-		copy_buf_to_pkt(&pkt_udp_hdr, sizeof(pkt_udp_hdr), pkt,
-				sizeof(struct ether_hdr) +
-				sizeof(struct ipv4_hdr));
-
-		/*
-		 * Complete first mbuf of packet and append it to the
-		 * burst of packets to be transmitted.
-		 */
-		pkt->nb_segs = nb_segs;
-		pkt->pkt_len = pkt_len;
-		pkt->ol_flags = ol_flags;
-		pkt->vlan_tci = vlan_tci;
-		pkt->vlan_tci_outer = vlan_tci_outer;
-		pkt->l2_len = sizeof(struct ether_hdr);
-		pkt->l3_len = sizeof(struct ipv4_hdr);
-		pkts_burst[nb_pkt] = pkt;
 	}
+
+	if (nb_pkt == 0)
+		return;
+
 	nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
 	/*
 	 * Retry if necessary