[dpdk-dev,v2] kni: add new mbuf in alloc_q only based on its empty slots

Message ID 1494503486-20876-1-git-send-email-gowrishankar.m@linux.vnet.ibm.com (mailing list archive)
State Accepted, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Gowrishankar May 11, 2017, 11:51 a.m. UTC
  From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

In kni_allocate_mbufs(), we always attempt to add max_burst (32) mbufs
into alloc_q, which leads to excessive rte_pktmbuf_free() calls when
alloc_q is contended at a high packet rate (for eg 10Gig data). In a
situation where the alloc_q fifo can accommodate only a few (or zero)
mbufs, create only as many as needed and add them to the fifo.

With this patch, we could stop the random network stalls seen in KNI at
higher packet rates (eg 1G or 10G data between vEth0 and the PMD) that
sufficiently exhaust alloc_q under the above condition. I tested the i40e
PMD for this purpose on ppc64le.

Changes:
 v2 - alloc_q free count calculation corrected.
      line wrap fixed for commit message.

Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
---
 lib/librte_kni/rte_kni.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
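
The computation the patch adds is the standard free-slot arithmetic for a
power-of-two ring. A minimal sketch of that arithmetic follows; FIFO_SIZE,
struct demo_fifo and demo_fifo_free_count() are illustrative names, not
librte_kni API (the patch itself masks with MAX_MBUF_BURST_NUM - 1, which
also caps the result below one full burst of 32):

#include <stdint.h>

#define FIFO_SIZE 1024u                 /* must be a power of two */

struct demo_fifo {
	volatile uint32_t write;        /* next slot the producer fills */
	volatile uint32_t read;         /* next slot the consumer drains */
};

/*
 * Entries that can still be enqueued. One slot is always kept empty so
 * that "full" can be told apart from "empty", hence the "- 1".
 */
static inline uint32_t
demo_fifo_free_count(const struct demo_fifo *f)
{
	return (f->read - f->write - 1) & (FIFO_SIZE - 1);
}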
  

Comments

Ferruh Yigit May 16, 2017, 5:15 p.m. UTC | #1
On 5/11/2017 12:51 PM, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> 
> In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
> always into alloc_q, which is excessively leading too many rte_pktmbuf_
> free() when alloc_q is contending at high packet rate (for eg 10Gig data).
> In a situation when alloc_q fifo can only accommodate very few (or zero)
> mbuf, create only what needed and add in fifo.

I remember I tried something similar, and also tried allocating only the
amount of nb_packets read from the kernel; both produced worse performance.
Can you please share your before/after performance numbers?

kni_allocate_mbufs() is called within rte_kni_rx_burst() if any packets are
received from the kernel. If there is heavy traffic, the kernel will always
consume the alloc_q before this function is called, and this function will
fill it back. So there shouldn't be many cases where the alloc_q fifo is
already full.

Perhaps this can happen if the application bursts Rx from the kernel in
numbers less than 32 while the fifo is filled with a fixed 32 mbufs; is this
your case?

Can you measure the number of times rte_pktmbuf_free() is called because
alloc_q is full?

> 
> With this patch, we could stop random network stall in KNI at higher packet
> rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
> alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.

If the stall happens from the NIC to the kernel, that is the kernel receive
path, whereas alloc_q is in the kernel transmit path.

> 
> Changes:
>  v2 - alloc_q free count calculation corrected.
>       line wrap fixed for commit message.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> ---
>  lib/librte_kni/rte_kni.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
> index c3f9208..9c5d485 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -624,6 +624,7 @@ struct rte_kni *
>  	int i, ret;
>  	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
>  	void *phys[MAX_MBUF_BURST_NUM];
> +	int allocq_free;
>  
>  	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) !=
>  			 offsetof(struct rte_kni_mbuf, pool));
> @@ -646,7 +647,9 @@ struct rte_kni *
>  		return;
>  	}
>  
> -	for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
> +	allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
> +			& (MAX_MBUF_BURST_NUM - 1);
> +	for (i = 0; i < allocq_free; i++) {
>  		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>  		if (unlikely(pkts[i] == NULL)) {
>  			/* Out of memory */
>
  
Gowrishankar May 18, 2017, 5:45 p.m. UTC | #2
On Tuesday 16 May 2017 10:45 PM, Ferruh Yigit wrote:
> On 5/11/2017 12:51 PM, Gowrishankar wrote:
>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>
>> In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
>> always into alloc_q, which is excessively leading too many rte_pktmbuf_
>> free() when alloc_q is contending at high packet rate (for eg 10Gig data).
>> In a situation when alloc_q fifo can only accommodate very few (or zero)
>> mbuf, create only what needed and add in fifo.
> I remember I have tried similar, also tried allocating amount of
> nb_packets read from kernel, both produced worse performance.
> Can you please share your before/after performance numbers?
Sure Ferruh, please find below a comparison of the call counts I added at two
places, along with an additional stat counting KNI egress where more than one
packet is read from txq in a burst, as in the pseudo code below:

    @@ -589,8 +592,12 @@ rte_kni_rx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num)
            unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);

            /* If buffers removed, allocate mbufs and then put them into alloc_q */
            if (ret) {
                    ++alloc_call;
                    if (ret > 1)
                            alloc_call_mt1tx += ret;
                    kni_allocate_mbufs(kni);
            }

            return ret;
    }

    @@ -659,6 +666,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
            if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
                    int j;

                    freembuf_call += (i-ret);
                    for (j = ret; j < i; j++)
                            rte_pktmbuf_free(pkts[j]);



> kni_allocate_mbufs() called within rte_kni_rx_burst() if any packet
> received from kernel. If there is a heavy traffic, kernel will always
> consume the alloc_q before this function called and this function will
> fill it back. So there shouldn't be much cases that alloc_q fifo already
> full.
> Perhaps this can happen if application burst Rx from kernel in a number
> less than 32, but fifo filled with fixed 32mbufs, is this your case?

I think there is some resemblance to this case based on the stats below. W/o
the patch, the application would spend most of its processing time freeing
mbufs, right?

>
> Can you measure number of times rte_pktmbuf_free() called because of
> alloc_q is full?

I have sampled the data below on x86_64 for KNI on the ixgbe PMD. The iperf
server runs on a remote interface connected to the PMD and the iperf client
runs on the KNI interface, so as to create more egress from KNI into DPDK
(w/o and with this patch), for 1MB and 100MB of data. The rx and tx stats
are from the kni app (USR1).

100MB w/o patch, 1.28Gbps
rx     tx      alloc_call  alloc_call_mt1tx  freembuf_call
3933   72464   51042       42472             1560540

1MB w/o patch, 204Mbps
rx     tx      alloc_call  alloc_call_mt1tx  freembuf_call
84     734     566         330               17378

100MB w/ patch, 1.23Gbps
rx     tx      alloc_call  alloc_call_mt1tx  freembuf_call
4258   72466   72466       0                 0

1MB w/ patch, 203Mbps
rx     tx      alloc_call  alloc_call_mt1tx  freembuf_call
76     734     733         2                 0

With the patch, KNI egress on txq seems to be almost always one packet at a
time (and in the 1MB test, a rare instance of more than 2 packets is seen even
though it is a burst read). Also, as one mbuf is consumed by the module and
added back by the library at a time, rte_pktmbuf_free() is not called at all,
because the right amount (1 or 2) of mbufs is enqueued in alloc_q.
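
For reference, the application-side egress path this test exercises looks
roughly like the sketch below; it is not part of the patch, and the kni
handle, port_id and queue 0 are placeholders:

#include <rte_ethdev.h>
#include <rte_kni.h>
#include <rte_mbuf.h>

#define PKT_BURST_SZ 32

static void
kni_egress_once(struct rte_kni *kni, uint16_t port_id)
{
	struct rte_mbuf *pkts[PKT_BURST_SZ];
	unsigned int nb_rx, nb_tx, i;

	/* Dequeue packets the kernel put on the KNI tx_q; this call
	 * also refills alloc_q via kni_allocate_mbufs(). */
	nb_rx = rte_kni_rx_burst(kni, pkts, PKT_BURST_SZ);

	/* Hand them to the NIC; free whatever the PMD did not take. */
	nb_tx = rte_eth_tx_burst(port_id, 0, pkts, nb_rx);
	for (i = nb_tx; i < nb_rx; i++)
		rte_pktmbuf_free(pkts[i]);

	/* Service control requests (link state, MTU) from the kernel. */
	rte_kni_handle_request(kni);
}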

This controlled enqueue on alloc_q avoids the network stall for i40e on
ppc64le. Could you please check whether i40e is able to handle data on the
order of 10GiB in your arch? I see the network stall at some random point
w/o this patch.

Thanks,
Gowrishankar

>> With this patch, we could stop random network stall in KNI at higher packet
>> rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
>> alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.
> If stall happens from NIC to kernel, this is kernel receive path, and
> alloc_q is in kernel transmit path.
>
>> Changes:
>>   v2 - alloc_q free count calculation corrected.
>>        line wrap fixed for commit message.
>>
>> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>> ---
>>   lib/librte_kni/rte_kni.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
>> index c3f9208..9c5d485 100644
>> --- a/lib/librte_kni/rte_kni.c
>> +++ b/lib/librte_kni/rte_kni.c
>> @@ -624,6 +624,7 @@ struct rte_kni *
>>   	int i, ret;
>>   	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
>>   	void *phys[MAX_MBUF_BURST_NUM];
>> +	int allocq_free;
>>   
>>   	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) !=
>>   			 offsetof(struct rte_kni_mbuf, pool));
>> @@ -646,7 +647,9 @@ struct rte_kni *
>>   		return;
>>   	}
>>   
>> -	for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
>> +	allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
>> +			& (MAX_MBUF_BURST_NUM - 1);
>> +	for (i = 0; i < allocq_free; i++) {
>>   		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>>   		if (unlikely(pkts[i] == NULL)) {
>>   			/* Out of memory */
>>
  
Ferruh Yigit May 31, 2017, 4:21 p.m. UTC | #3
Hi Gowrishankar,

Sorry for the late response.

On 5/18/2017 6:45 PM, gowrishankar muthukrishnan wrote:
> On Tuesday 16 May 2017 10:45 PM, Ferruh Yigit wrote:
>> On 5/11/2017 12:51 PM, Gowrishankar wrote:
>>> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>>
>>> In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
>>> always into alloc_q, which is excessively leading too many rte_pktmbuf_
>>> free() when alloc_q is contending at high packet rate (for eg 10Gig data).
>>> In a situation when alloc_q fifo can only accommodate very few (or zero)
>>> mbuf, create only what needed and add in fifo.
>> I remember I have tried similar, also tried allocating amount of
>> nb_packets read from kernel, both produced worse performance.
>> Can you please share your before/after performance numbers?
> Sure Ferruh, please find below comparison of call counts I set at two places
> along with additional stat on kni egress for more than one packet in txq
> burst read,
> as in pseudo code below:
> 
>     @@ -589,8 +592,12 @@ rte_kni_rx_burst(struct rte_kni *kni, struct
>     rte_mbuf **mbufs, unsigned num)
>             unsigned ret = kni_fifo_get(kni->tx_q, (void **)mbufs, num);
>      
>             /* If buffers removed, allocate mbufs and then put them into
>     alloc_q */
>            if (ret) {
>                    ++alloc_call;
>                    if (ret > 1)
>                            alloc_call_mt1tx += ret;
>                     kni_allocate_mbufs(kni);
>            }
>      
>             return ret;
>      }
>     @@ -659,6 +666,7 @@ kni_allocate_mbufs(struct rte_kni *kni)
>             if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) {
>                     int j;
>      
>                    freembuf_call += (i-ret);
>                    for (j = ret; j < i; j++)
>                        rte_pktmbuf_free(pkts[j]);
> 
> 
> 
>> kni_allocate_mbufs() called within rte_kni_rx_burst() if any packet
>> received from kernel. If there is a heavy traffic, kernel will always
>> consume the alloc_q before this function called and this function will
>> fill it back. So there shouldn't be much cases that alloc_q fifo already
>> full.
>> Perhaps this can happen if application burst Rx from kernel in a number
>> less than 32, but fifo filled with fixed 32mbufs, is this your case?
> 
> I think some resemblance to this case based on below stats. W/o patch,
> application
> would spend its most of processing in freeing mbufs right ?.
> 
>>
>> Can you measure number of times rte_pktmbuf_free() called because of
>> alloc_q is full?
> 
> I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server
> runs on
> remote interface connecting PMD and iperf client runs on KNI interface,
> so as to
> create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
> 100MB data. rx and tx stats are from kni app (USR1).
> 
> 100MB w/o patch 1.28Gbps
> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
> 3933 72464 51042      42472              1560540

Some math:

alloc was called 51042 times, allocating 32 mbufs each time:
51042 * 32 = 1633344

freed mbufs: 1560540

used mbufs: 1633344 - 1560540 = 72804

72804 =~ 72464, so this looks correct.

Which means rte_kni_rx_burst() was called 51042 times and 72464 buffers
were received.

As you already mentioned, for each call the kernel is able to put only 1-2
packets into the fifo. This number is close to 3 in my test with the KNI PMD.

And for this case, I agree your patch looks reasonable.

But what if kni has more egress traffic, enough to put >= 32 packets into the
fifo between each rte_kni_rx_burst() call? For that case this patch introduces
the extra cost of computing the allocq_free count.

Overall I do not disagree with the patch, but I am concerned it could cause a
performance loss in some cases while improving this one. It would help a lot
if KNI users test and comment.

For me, applying the patch didn't make any difference in the final performance
numbers, but if there is no objection, I am OK to take this patch.


> 
> 1MB w/o patch 204Mbps
> rx   tx       alloc_call alloc_call_mt1tx freembuf_call
> 84  734    566        330                   17378
> 
> 100MB w/ patch 1.23Gbps
> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
> 4258 72466 72466      0                      0
> 
> 1MB w/ patch 203Mbps
> rx  tx       alloc_call alloc_call_mt1tx freembuf_call
> 76 734    733        2                       0
> 
> With patch, KNI egress on txq seems to be almost only one packet at a time
> (and in 1MB test, a rare instance of more than 2 packets seen even
> though it is
> burst read). Also, as it is one mbuf consumed by module and added by lib at
> a time, rte_pktmbuf_free is not called at all, due to right amount (1 or 2)
> of mbufs enqueued in alloc_q.
> 
> This controlled enqueue on alloc_q avoids nw stall for i40e in ppc64le.
> Could you
> please check if i40e is able to handle data at order of 10GiB in your
> arch, as I see
> that, network stalls at some random point w/o this patch.
> 
> Thanks,
> Gowrishankar
> 
>>> With this patch, we could stop random network stall in KNI at higher packet
>>> rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
>>> alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.
>> If stall happens from NIC to kernel, this is kernel receive path, and
>> alloc_q is in kernel transmit path.
>>
>>> Changes:
>>>  v2 - alloc_q free count calculation corrected.
>>>       line wrap fixed for commit message.
>>>
>>> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
>>> ---
>>>  lib/librte_kni/rte_kni.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
>>> index c3f9208..9c5d485 100644
>>> --- a/lib/librte_kni/rte_kni.c
>>> +++ b/lib/librte_kni/rte_kni.c
>>> @@ -624,6 +624,7 @@ struct rte_kni *
>>>  	int i, ret;
>>>  	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
>>>  	void *phys[MAX_MBUF_BURST_NUM];
>>> +	int allocq_free;
>>>  
>>>  	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) !=
>>>  			 offsetof(struct rte_kni_mbuf, pool));
>>> @@ -646,7 +647,9 @@ struct rte_kni *
>>>  		return;
>>>  	}
>>>  
>>> -	for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
>>> +	allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
>>> +			& (MAX_MBUF_BURST_NUM - 1);
>>> +	for (i = 0; i < allocq_free; i++) {
>>>  		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
>>>  		if (unlikely(pkts[i] == NULL)) {
>>>  			/* Out of memory */
>>>
> 
>
  
Gowrishankar June 1, 2017, 5:56 a.m. UTC | #4
Hi Ferruh,

On Wednesday 31 May 2017 09:51 PM, Ferruh Yigit wrote:
<cut>
> I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server
>> runs on
>> remote interface connecting PMD and iperf client runs on KNI interface,
>> so as to
>> create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
>> 100MB data. rx and tx stats are from kni app (USR1).
>>
>> 100MB w/o patch 1.28Gbps
>> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
>> 3933 72464 51042      42472              1560540
> Some math:
>
> alloc called 51042 times with allocating 32 mbufs each time,
> 51042 * 32 = 1633344
>
> freed mbufs: 1560540
>
> used mbufs: 1633344 - 1560540 = 72804
>
> 72804 =~ 72464, so looks correct.
>
> Which means rte_kni_rx_burst() called 51042 times and 72464 buffers
> received.
>
> As you already mentioned, for each call kernel able to put only 1-2
> packets into the fifo. This number is close to 3 for my test with KNI PMD.
>
> And for this case, agree your patch looks reasonable.
>
> But what if kni has more egress traffic, that able to put >= 32 packets
> between each rte_kni_rx_burst()?
> For that case this patch introduces extra cost to get allocq_free count.

Are there case(s) where we see the kernel thread writing to txq at a rate
higher than the kni application can dequeue it? In my understanding, KNI is
supposed to be a slow path, as it puts packets back into the network stack
(control plane?).

Regards,
Gowrishankar

> Overall I am not disagree with patch, but I have concern if this would
> cause performance loss some cases while making better for this one. That
> would help a lot if KNI users test and comment.
>
> For me, applying patch didn't give any difference in final performance
> numbers, but if there is no objection, I am OK to get this patch.
>
>

<cut>
  
Ferruh Yigit June 1, 2017, 9:18 a.m. UTC | #5
On 6/1/2017 6:56 AM, gowrishankar muthukrishnan wrote:
> Hi Ferruh,
> 
> On Wednesday 31 May 2017 09:51 PM, Ferruh Yigit wrote:
> <cut>
>> I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server
>>> runs on
>>> remote interface connecting PMD and iperf client runs on KNI interface,
>>> so as to
>>> create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
>>> 100MB data. rx and tx stats are from kni app (USR1).
>>>
>>> 100MB w/o patch 1.28Gbps
>>> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
>>> 3933 72464 51042      42472              1560540
>> Some math:
>>
>> alloc called 51042 times with allocating 32 mbufs each time,
>> 51042 * 32 = 1633344
>>
>> freed mbufs: 1560540
>>
>> used mbufs: 1633344 - 1560540 = 72804
>>
>> 72804 =~ 72464, so looks correct.
>>
>> Which means rte_kni_rx_burst() called 51042 times and 72464 buffers
>> received.
>>
>> As you already mentioned, for each call kernel able to put only 1-2
>> packets into the fifo. This number is close to 3 for my test with KNI PMD.
>>
>> And for this case, agree your patch looks reasonable.
>>
>> But what if kni has more egress traffic, that able to put >= 32 packets
>> between each rte_kni_rx_burst()?
>> For that case this patch introduces extra cost to get allocq_free count.
> 
> Are there case(s) we see kernel thread writing txq faster at a rate 
> higher than kni application
> could dequeue it ?. In my understanding, KNI is suppose to be a slow 
> path as it puts
> packets back into network stack (control plane ?).

The kernel thread doesn't need to be faster than what the app can dequeue; it
is enough if the kernel thread can put 32 or more packets for this case, but I
see this goes to the same place.

And for kernel multi-thread mode, each kernel thread has more time to enqueue
packets, although I don't have the numbers.

> 
> Regards,
> Gowrishankar
> 
>> Overall I am not disagree with patch, but I have concern if this would
>> cause performance loss some cases while making better for this one. That
>> would help a lot if KNI users test and comment.
>>
>> For me, applying patch didn't give any difference in final performance
>> numbers, but if there is no objection, I am OK to get this patch.
>>
>>
> 
> <cut>
>
  
Gowrishankar June 6, 2017, 2:43 p.m. UTC | #6
Hi Ferruh,
Just wanted to check with you on the verdict of this patch: are we waiting
for any objection/ack?

Thanks,
Gowrishankar

On Thursday 01 June 2017 02:48 PM, Ferruh Yigit wrote:
> On 6/1/2017 6:56 AM, gowrishankar muthukrishnan wrote:
>> Hi Ferruh,
>>
>> On Wednesday 31 May 2017 09:51 PM, Ferruh Yigit wrote:
>> <cut>
>>> I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server
>>>> runs on
>>>> remote interface connecting PMD and iperf client runs on KNI interface,
>>>> so as to
>>>> create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
>>>> 100MB data. rx and tx stats are from kni app (USR1).
>>>>
>>>> 100MB w/o patch 1.28Gbps
>>>> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
>>>> 3933 72464 51042      42472              1560540
>>> Some math:
>>>
>>> alloc called 51042 times with allocating 32 mbufs each time,
>>> 51042 * 32 = 1633344
>>>
>>> freed mbufs: 1560540
>>>
>>> used mbufs: 1633344 - 1560540 = 72804
>>>
>>> 72804 =~ 72464, so looks correct.
>>>
>>> Which means rte_kni_rx_burst() called 51042 times and 72464 buffers
>>> received.
>>>
>>> As you already mentioned, for each call kernel able to put only 1-2
>>> packets into the fifo. This number is close to 3 for my test with KNI PMD.
>>>
>>> And for this case, agree your patch looks reasonable.
>>>
>>> But what if kni has more egress traffic, that able to put >= 32 packets
>>> between each rte_kni_rx_burst()?
>>> For that case this patch introduces extra cost to get allocq_free count.
>> Are there case(s) we see kernel thread writing txq faster at a rate
>> higher than kni application
>> could dequeue it ?. In my understanding, KNI is suppose to be a slow
>> path as it puts
>> packets back into network stack (control plane ?).
> Kernel thread doesn't need to be faster than what app can dequeue,  it
> is enough if kernel thread can put 32 or more packets for this case, but
> I see this goes to same place.
>
> And for kernel multi-thread mode, each kernel thread has more time to
> enqueue packets, although I don't have the numbers.
>
>> Regards,
>> Gowrishankar
>>
>>> Overall I am not disagree with patch, but I have concern if this would
>>> cause performance loss some cases while making better for this one. That
>>> would help a lot if KNI users test and comment.
>>>
>>> For me, applying patch didn't give any difference in final performance
>>> numbers, but if there is no objection, I am OK to get this patch.
>>>
>>>
>> <cut>
>>
  
Ferruh Yigit June 7, 2017, 5:20 p.m. UTC | #7
On 6/6/2017 3:43 PM, gowrishankar muthukrishnan wrote:
> Hi Ferruh,
> Just wanted to check with you on the verdict of this patch, whether we 
> are waiting for
> any objection/ack ?.

I was waiting for more comments; I will ack explicitly.

> 
> Thanks,
> Gowrishankar
> 
> On Thursday 01 June 2017 02:48 PM, Ferruh Yigit wrote:
>> On 6/1/2017 6:56 AM, gowrishankar muthukrishnan wrote:
>>> Hi Ferruh,
>>>
>>> On Wednesday 31 May 2017 09:51 PM, Ferruh Yigit wrote:
>>> <cut>
>>>> I have sampled below data in x86_64 for KNI on ixgbe pmd. iperf server
>>>>> runs on
>>>>> remote interface connecting PMD and iperf client runs on KNI interface,
>>>>> so as to
>>>>> create more egress from KNI into DPDK (w/o and with this patch) for 1MB and
>>>>> 100MB data. rx and tx stats are from kni app (USR1).
>>>>>
>>>>> 100MB w/o patch 1.28Gbps
>>>>> rx      tx        alloc_call  alloc_call_mt1tx freembuf_call
>>>>> 3933 72464 51042      42472              1560540
>>>> Some math:
>>>>
>>>> alloc called 51042 times with allocating 32 mbufs each time,
>>>> 51042 * 32 = 1633344
>>>>
>>>> freed mbufs: 1560540
>>>>
>>>> used mbufs: 1633344 - 1560540 = 72804
>>>>
>>>> 72804 =~ 72464, so looks correct.
>>>>
>>>> Which means rte_kni_rx_burst() called 51042 times and 72464 buffers
>>>> received.
>>>>
>>>> As you already mentioned, for each call kernel able to put only 1-2
>>>> packets into the fifo. This number is close to 3 for my test with KNI PMD.
>>>>
>>>> And for this case, agree your patch looks reasonable.
>>>>
>>>> But what if kni has more egress traffic, that able to put >= 32 packets
>>>> between each rte_kni_rx_burst()?
>>>> For that case this patch introduces extra cost to get allocq_free count.
>>> Are there case(s) we see kernel thread writing txq faster at a rate
>>> higher than kni application
>>> could dequeue it ?. In my understanding, KNI is suppose to be a slow
>>> path as it puts
>>> packets back into network stack (control plane ?).
>> Kernel thread doesn't need to be faster than what app can dequeue,  it
>> is enough if kernel thread can put 32 or more packets for this case, but
>> I see this goes to same place.
>>
>> And for kernel multi-thread mode, each kernel thread has more time to
>> enqueue packets, although I don't have the numbers.
>>
>>> Regards,
>>> Gowrishankar
>>>
>>>> Overall I am not disagree with patch, but I have concern if this would
>>>> cause performance loss some cases while making better for this one. That
>>>> would help a lot if KNI users test and comment.
>>>>
>>>> For me, applying patch didn't give any difference in final performance
>>>> numbers, but if there is no objection, I am OK to get this patch.
>>>>
>>>>
>>> <cut>
>>>
>
  
Ferruh Yigit June 7, 2017, 5:20 p.m. UTC | #8
On 5/11/2017 12:51 PM, Gowrishankar wrote:
> From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> 
> In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
> always into alloc_q, which is excessively leading too many rte_pktmbuf_
> free() when alloc_q is contending at high packet rate (for eg 10Gig data).
> In a situation when alloc_q fifo can only accommodate very few (or zero)
> mbuf, create only what needed and add in fifo.
> 
> With this patch, we could stop random network stall in KNI at higher packet
> rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
> alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.
> 
> Changes:
>  v2 - alloc_q free count calculation corrected.
>       line wrap fixed for commit message.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
  
Thomas Monjalon July 1, 2017, 10:56 a.m. UTC | #9
07/06/2017 19:20, Ferruh Yigit:
> On 5/11/2017 12:51 PM, Gowrishankar wrote:
> > From: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> > 
> > In kni_allocate_mbufs(), we attempt to add max_burst (32) count of mbuf
> > always into alloc_q, which is excessively leading too many rte_pktmbuf_
> > free() when alloc_q is contending at high packet rate (for eg 10Gig data).
> > In a situation when alloc_q fifo can only accommodate very few (or zero)
> > mbuf, create only what needed and add in fifo.
> > 
> > With this patch, we could stop random network stall in KNI at higher packet
> > rate (eg 1G or 10G data between vEth0 and PMD) sufficiently exhausting
> > alloc_q on above condition. I tested i40e PMD for this purpose in ppc64le.
> > 
> > Changes:
> >  v2 - alloc_q free count calculation corrected.
> >       line wrap fixed for commit message.
> > 
> > Signed-off-by: Gowrishankar Muthukrishnan <gowrishankar.m@linux.vnet.ibm.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied with this title:
	"kni: allocate no more mbuf than empty slots in queue"
Thanks
  

Patch

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index c3f9208..9c5d485 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -624,6 +624,7 @@  struct rte_kni *
 	int i, ret;
 	struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM];
 	void *phys[MAX_MBUF_BURST_NUM];
+	int allocq_free;
 
 	RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) !=
 			 offsetof(struct rte_kni_mbuf, pool));
@@ -646,7 +647,9 @@  struct rte_kni *
 		return;
 	}
 
-	for (i = 0; i < MAX_MBUF_BURST_NUM; i++) {
+	allocq_free = (kni->alloc_q->read - kni->alloc_q->write - 1) \
+			& (MAX_MBUF_BURST_NUM - 1);
+	for (i = 0; i < allocq_free; i++) {
 		pkts[i] = rte_pktmbuf_alloc(kni->pktmbuf_pool);
 		if (unlikely(pkts[i] == NULL)) {
 			/* Out of memory */