[dpdk-dev,memnic,7/7] pmd: split calling mbuf free

Message ID 7F861DC0615E0C47A872E6F3C5FCDDBD011A99C6@BPXM14GP.gisp.nec.co.jp (mailing list archive)
State Superseded, archived

Commit Message

Hiroshi Shimamoto Sept. 11, 2014, 7:52 a.m. UTC
  From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

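rte_pktmbuf_free() may incur cache misses and memory stalls. With small
packets, calling it inside the per-packet transmit loop can hurt
performance, so free the mbufs in a separate loop after all packets have
been placed.

The memnic-tester results show improved throughput for frame sizes below
1024 bytes (about 3 to 5 percent) and a slight decrease (about 1 to 2
percent) at 1024 bytes and above.

Measured on a Xeon E5-2697 v2 @ 2.70GHz with 4 vCPUs.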
 size |  before  |  after
   64 | 5.55Mpps | 5.83Mpps
  128 | 5.44Mpps | 5.71Mpps
  256 | 5.22Mpps | 5.40Mpps
  512 | 4.52Mpps | 4.64Mpps
 1024 | 3.73Mpps | 3.68Mpps
 1280 | 3.22Mpps | 3.17Mpps
 1518 | 2.93Mpps | 2.90Mpps

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: Hayato Momma <h-momma@ce.jp.nec.com>
---
 pmd/pmd_memnic.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Thomas Monjalon Sept. 24, 2014, 3:20 p.m. UTC | #1
2014-09-11 07:52, Hiroshi Shimamoto:
> @@ -408,9 +408,9 @@ retry:
>  
>  		rte_compiler_barrier();
>  		p->status = MEMNIC_PKT_ST_FILLED;
> -
> -		rte_pktmbuf_free(tx_pkts[nr]);
>  	}
> +	for (i = 0; i < nr; i++)
> +		rte_pktmbuf_free(tx_pkts[i]);
>  
>  	/* stats */
>  	st->opackets += pkts;
> 

You are bursting mbuf freeing. Why is the title about "split"?
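
The change quoted above can be illustrated outside of DPDK. The following is a minimal sketch, not the MEMNIC driver code: `xmit_slot`, `xmit_mbuf`, `xmit_mbuf_free()` and `xmit_pkts_sketch()` are hypothetical stand-ins, and `atomic_signal_fence()` is used here only as a portable substitute for `rte_compiler_barrier()`. It shows the first loop touching only the ring slots and the second loop doing all of the frees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the MEMNIC or DPDK definitions. */
enum { XMIT_SLOT_FREE = 0, XMIT_SLOT_FILLED = 1 };
enum { XMIT_DATA_MAX = 64 };

struct xmit_mbuf {
	uint16_t len;
	uint8_t data[XMIT_DATA_MAX];
	int freed;                      /* set by xmit_mbuf_free() */
};

struct xmit_slot {
	int status;
	uint16_t len;
	uint8_t data[XMIT_DATA_MAX];
};

static void xmit_mbuf_free(struct xmit_mbuf *m)
{
	m->freed = 1;
}

/* Fill ring slots in one loop, then free the sent packets in a second loop. */
static uint16_t xmit_pkts_sketch(struct xmit_slot *ring, uint16_t ring_size,
				 struct xmit_mbuf **tx_pkts, uint16_t nb_pkts)
{
	uint16_t i, nr;

	/* Loop 1: copy packet contents and publish each slot. */
	for (nr = 0; nr < nb_pkts && nr < ring_size; nr++) {
		struct xmit_slot *p = &ring[nr];

		if (p->status != XMIT_SLOT_FREE)
			break;          /* ring is full from here on */
		assert(tx_pkts[nr]->len <= XMIT_DATA_MAX);
		p->len = tx_pkts[nr]->len;
		memcpy(p->data, tx_pkts[nr]->data, p->len);

		/* Compiler-only fence: keep the copy before the status store. */
		atomic_signal_fence(memory_order_release);
		p->status = XMIT_SLOT_FILLED;
	}

	/* Loop 2: free only the nr packets that were actually placed. */
	for (i = 0; i < nr; i++)
		xmit_mbuf_free(tx_pkts[i]);

	return nr;
}
```

Calling `xmit_pkts_sketch(ring, 4, pkts, 3)` on an empty four-slot ring places and then frees all three packets. If a slot is already filled, the first loop stops early and the second loop frees only the packets before it.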
  
Wiles, Roger Keith Sept. 24, 2014, 4:01 p.m. UTC | #2
On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2014-09-11 07:52, Hiroshi Shimamoto:
>> @@ -408,9 +408,9 @@ retry:
>> 
>> 		rte_compiler_barrier();
>> 		p->status = MEMNIC_PKT_ST_FILLED;
>> -
>> -		rte_pktmbuf_free(tx_pkts[nr]);
>> 	}
>> +	for (i = 0; i < nr; i++)
>> +		rte_pktmbuf_free(tx_pkts[i]);
>> 
>> 	/* stats */
>> 	st->opackets += pkts;
>> 
> 
> You are bursting mbuf freeing. Why is the title about "split"?

Maybe this should be a new API, as in rte_pktmbuf_bulk_free(tx_pkts, nr)?
This would remove the loop in the application and I know I have done the same thing for Pktgen too.
> 
> -- 
> Thomas

Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533
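
The shape of the API suggested in this reply could look roughly like the sketch below. It is only an illustration under stated assumptions: `bulk_mbuf`, `bulk_mbuf_free()` and `bulk_mbuf_free_many()` are hypothetical names defined here so that the example compiles without DPDK, and the signature is modelled on the `rte_pktmbuf_bulk_free(tx_pkts, nr)` call proposed above, not on any function that existed when this thread was written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf; not the DPDK struct rte_mbuf. */
struct bulk_mbuf {
	int freed;
};

/* Stand-in for a single-packet free; ignores NULL entries. */
static void bulk_mbuf_free(struct bulk_mbuf *m)
{
	if (m != NULL)
		m->freed = 1;
}

/*
 * Hypothetical bulk helper: frees the first nr entries of pkts.
 * The loop moves from the caller into the helper, so a transmit
 * function would end with one call instead of an open-coded loop.
 */
static void bulk_mbuf_free_many(struct bulk_mbuf **pkts, unsigned int nr)
{
	unsigned int i;

	assert(pkts != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		bulk_mbuf_free(pkts[i]);
}
```

With such a helper, the two added lines in the patch would collapse into a single call. A plain wrapper like this only tidies the call site; any performance gain would have to come from an implementation that batches work inside the helper.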
  
Hiroshi Shimamoto Sept. 25, 2014, 1:12 a.m. UTC | #3
Hi Thomas, Keith,

> Subject: Re: [dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free
> 
> 
> On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2014-09-11 07:52, Hiroshi Shimamoto:
> >> @@ -408,9 +408,9 @@ retry:
> >>
> >> 		rte_compiler_barrier();
> >> 		p->status = MEMNIC_PKT_ST_FILLED;
> >> -
> >> -		rte_pktmbuf_free(tx_pkts[nr]);
> >> 	}
> >> +	for (i = 0; i < nr; i++)
> >> +		rte_pktmbuf_free(tx_pkts[i]);
> >>
> >> 	/* stats */
> >> 	st->opackets += pkts;
> >>
> >
> > You are bursting mbuf freeing. Why is the title about "split"?

I thought that this patch splits the main loop's operations into putting content
and freeing mbufs, so I used the word "split", but I see that "burst mbuf freeing"
is preferable.

> 
> Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
> This would remove the loop in the application and I know I have done the same thing for Pktgen too.

Good point. Yes, I think that a new API like rte_pktmbuf_(alloc|free)_bulk()
would be good for reducing TLS accesses and gaining performance.
I put that on my stack, but haven't had time yet.

Do you have any plans to do such a thing?

thanks,
Hiroshi

> >
> > --
> > Thomas
> 
> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533
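
The point about reducing TLS access in this reply can be made concrete with a toy per-thread object cache. This is a sketch under the assumption that "TLS access" refers to looking up per-thread state on every alloc or free call; `pool_cache`, `pool_get_cache()`, `pool_put_one()`, `pool_put_bulk()` and `pool_get_bulk()` are hypothetical names and do not reproduce the DPDK mempool implementation. The counter `pool_tls_lookups` exists only to make the difference observable.

```c
#include <assert.h>
#include <stddef.h>

enum { POOL_CACHE_MAX = 32 };

/* Hypothetical per-thread cache of free objects. */
struct pool_cache {
	void *objs[POOL_CACHE_MAX];
	unsigned int len;
};

static _Thread_local struct pool_cache pool_tls_cache;
static _Thread_local unsigned int pool_tls_lookups; /* demo counter only */

/* Every call models one access to thread-local state. */
static struct pool_cache *pool_get_cache(void)
{
	pool_tls_lookups++;
	return &pool_tls_cache;
}

/* Single-object free: one cache lookup per object. */
static int pool_put_one(void *obj)
{
	struct pool_cache *c = pool_get_cache();

	if (c->len == POOL_CACHE_MAX)
		return -1;
	c->objs[c->len++] = obj;
	return 0;
}

/* Bulk free: one cache lookup for the whole array, all or nothing. */
static int pool_put_bulk(void *const *objs, unsigned int n)
{
	struct pool_cache *c = pool_get_cache();
	unsigned int i;

	if (n > POOL_CACHE_MAX - c->len)
		return -1;
	for (i = 0; i < n; i++)
		c->objs[c->len++] = objs[i];
	return 0;
}

/* Bulk alloc: one cache lookup, fails without side effects if short. */
static int pool_get_bulk(void **objs, unsigned int n)
{
	struct pool_cache *c = pool_get_cache();
	unsigned int i;

	if (n > c->len)
		return -1;
	for (i = 0; i < n; i++)
		objs[i] = c->objs[--c->len];
	return 0;
}
```

Freeing four objects with `pool_put_one()` performs four lookups, while `pool_put_bulk()` performs one regardless of the count. Later DPDK releases ship `rte_pktmbuf_alloc_bulk()` and `rte_pktmbuf_free_bulk()`; consult `rte_mbuf.h` in your release for their actual signatures, which this sketch does not reproduce.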
  
Wiles, Roger Keith Sept. 25, 2014, 2:18 a.m. UTC | #4
On Sep 24, 2014, at 8:12 PM, Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> wrote:

> Hi Thomas, Keith,
> 
>> Subject: Re: [dpdk-dev] [memnic PATCH 7/7] pmd: split calling mbuf free
>> 
>> 
>> On Sep 24, 2014, at 10:20 AM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
>> 
>>> 2014-09-11 07:52, Hiroshi Shimamoto:
>>>> @@ -408,9 +408,9 @@ retry:
>>>> 
>>>> 		rte_compiler_barrier();
>>>> 		p->status = MEMNIC_PKT_ST_FILLED;
>>>> -
>>>> -		rte_pktmbuf_free(tx_pkts[nr]);
>>>> 	}
>>>> +	for (i = 0; i < nr; i++)
>>>> +		rte_pktmbuf_free(tx_pkts[i]);
>>>> 
>>>> 	/* stats */
>>>> 	st->opackets += pkts;
>>>> 
>>> 
>>> You are bursting mbuf freeing. Why is the title about "split"?
> 
> I thought that this patch splits the main loop's operations into putting content
> and freeing mbufs, so I used the word "split", but I see that "burst mbuf freeing"
> is preferable.
> 
>> 
>> Maybe this should be a new API as in rte_pktmbuf_bulk_free(tx_pkts, nr); ??
>> This would remove the loop in the application and I know I have done the same thing for Pktgen too.
> 
> Good point. Yes, I think that a new API like rte_pktmbuf_(alloc|free)_bulk()
> would be good for reducing TLS accesses and gaining performance.
> I put that on my stack, but haven't had time yet.
> 
> Do you have any plans to do such a thing?

I do not have any plans, but the alloc would be good too.
> 
> thanks,
> Hiroshi
> 
>>> 
>>> --
>>> Thomas
>> 
>> Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533

Keith Wiles, Principal Technologist with CTO office, Wind River mobile 972-213-5533
  

Patch

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index cc0ae25..1db065f 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -344,7 +344,7 @@  static uint16_t memnic_xmit_pkts(void *tx_queue,
 	struct memnic_adapter *adapter = q->adapter;
 	struct memnic_data *data = &adapter->nic->down;
 	struct memnic_packet *p;
-	uint16_t nr;
+	uint16_t i, nr;
 	int idx;
 	struct rte_eth_stats *st = &adapter->stats[rte_lcore_id()];
 	uint64_t pkts, bytes, errs;
@@ -408,9 +408,9 @@  retry:
 
 		rte_compiler_barrier();
 		p->status = MEMNIC_PKT_ST_FILLED;
-
-		rte_pktmbuf_free(tx_pkts[nr]);
 	}
+	for (i = 0; i < nr; i++)
+		rte_pktmbuf_free(tx_pkts[i]);
 
 	/* stats */
 	st->opackets += pkts;