net/mlx5: fix wrong segmented packet in Rx

Message ID 1613384114-17855-1-git-send-email-17826875952@163.com (mailing list archive)
State Superseded, archived
Delegated to: Raslan Darawsheh
Headers
Series net/mlx5: fix wrong segmented packet in Rx |

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-testing success Testing PASS
ci/travis-robot fail travis build: failed
ci/github-robot success github build: passed
ci/checkpatch success coding style OK

Commit Message

Jiawei Zhu Feb. 15, 2021, 10:15 a.m. UTC
  Fixed issue could occur when Mbuf starvation happens
in a middle of reception of a segmented packet.
In such a situation, after release the segments of that
packet, it does not align consumer index to the next stride.
This would cause receive a wrong segmented packet.

Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx path")
Cc: stable@dpdk.org

Signed-off-by: Jiawei Zhu <17826875952@163.com>
---
 drivers/net/mlx5/mlx5_rxtx.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Slava Ovsiienko Feb. 24, 2021, 1:20 p.m. UTC | #1
Hi, Jiawei

Thank you for the patch, but It seems I need some clarifications.
As far I understand the issue:

- we are in the midst of receiving the multi-segment packet
- we have some mbufs allocated and packet chain is partially built
- we fail on allocation replenishing mbuf for the segment
- we free all the mbuf of the built chain
- exit from the rx_burtst loop
- rq_ci is expected to be kept pointing to the beginning of the current
  stride - it is supposed on next rx_burst() invocation we'll continue
  Rx queue handling from the stride where we failed
- on loop exit we see the code:
   if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
          return 0;
   /* Update the consumer index. */
   rxq->rq_ci = rq_ci >> sges_n;
hence, rq_ci is always shifted by sges_n, all increments happened
during failed packet processing are just discarded, it seems no fix is needed.

Did I miss something? 

With best regards,
Slava

> -----Original Message-----
> From: Jiawei Zhu <17826875952@163.com>
> Sent: Monday, February 15, 2021 12:15
> To: dev@dpdk.org
> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
> Jiawei Zhu <17826875952@163.com>; stable@dpdk.org
> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> 
> Fixed issue could occur when Mbuf starvation happens in a middle of
> reception of a segmented packet.
> In such a situation, after release the segments of that packet, it does not align
> consumer index to the next stride.
> This would cause receive a wrong segmented packet.
> 
> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx path")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Jiawei Zhu <17826875952@163.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index
> 2e4b87c..e3ce9fd 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
>  				rte_mbuf_raw_free(pkt);
>  				pkt = rep;
>  			}
> +			rq_ci >>= sges_n;
> +			++rq_ci;
> +			rq_ci <<= sges_n;
>  			break;
>  		}
>  		if (!pkt) {
> --
> 1.8.3.1
>
  
Jiawei Zhu Feb. 26, 2021, 4:11 p.m. UTC | #2
Hi, Slava

Thanks for reading my patch, my issue may not be clear.
Here I give a possible error.
- we assume segs_n is 4 and we are receiving 4 segments multi-segment 
packet.
- we fail to alloc mbuf when receive the 3th segment,so it will free the 
mbufs which packet chain we have built. Here are the 1st and 2nd segment.
- Rx queue in this stride, the 1st and the 2nd segment are fill the new 
mbuf and there data will be rand, but the 3th and 4th segment are still 
fill the last data. So next if still begin on this stride, it will 
reveice wrong multi-segment packet.

- So we should discarded this packets and pass this stride. After exit 
the loop, we should align the next consumer index.

What Do you thinking?

With best regards
Jiawei

On 2021/2/24 9:20 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
> 
> Thank you for the patch, but It seems I need some clarifications.
> As far I understand the issue:
> 
> - we are in the midst of receiving the multi-segment packet
> - we have some mbufs allocated and packet chain is partially built
> - we fail on allocation replenishing mbuf for the segment
> - we free all the mbuf of the built chain
> - exit from the rx_burtst loop
> - rq_ci is expected to be kept pointing to the beginning of the current
>    stride - it is supposed on next rx_burst() invocation we'll continue
>    Rx queue handling from the stride where we failed
> - on loop exit we see the code:
>     if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
>            return 0;
>     /* Update the consumer index. */
>     rxq->rq_ci = rq_ci >> sges_n;
> hence, rq_ci is always shifted by sges_n, all increments happened
> during failed packet processing are just discarded, it seems no fix is needed.
> 
> Did I miss something?
> 
> With best regards,
> Slava
> 
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952@163.com>
>> Sent: Monday, February 15, 2021 12:15
>> To: dev@dpdk.org
>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>;
>> Jiawei Zhu <17826875952@163.com>; stable@dpdk.org
>> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>
>> Fixed issue could occur when Mbuf starvation happens in a middle of
>> reception of a segmented packet.
>> In such a situation, after release the segments of that packet, it does not align
>> consumer index to the next stride.
>> This would cause receive a wrong segmented packet.
>>
>> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx path")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Jiawei Zhu <17826875952@163.com>
>> ---
>>   drivers/net/mlx5/mlx5_rxtx.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c index
>> 2e4b87c..e3ce9fd 100644
>> --- a/drivers/net/mlx5/mlx5_rxtx.c
>> +++ b/drivers/net/mlx5/mlx5_rxtx.c
>> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
>>   				rte_mbuf_raw_free(pkt);
>>   				pkt = rep;
>>   			}
>> +			rq_ci >>= sges_n;
>> +			++rq_ci;
>> +			rq_ci <<= sges_n;
>>   			break;
>>   		}
>>   		if (!pkt) {
>> --
>> 1.8.3.1
>>
>
  
Slava Ovsiienko March 1, 2021, 9:13 a.m. UTC | #3
Hi, Jiawei

Thank you for the clarification. I missed the point that we have updated elts array 
with new allocated mbufs and are not able to retry packet building anymore.
Very good catch, thank you!  Could you, please, add this extra explanation
to the  commit message and send the v2 ?

With best regards, 
Slava

> -----Original Message-----
> From: Jiawei Zhu <17826875952@163.com>
> Sent: Friday, February 26, 2021 18:11
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; stable@dpdk.org
> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> 
> Hi, Slava
> 
> Thanks for reading my patch, my issue may not be clear.
> Here I give a possible error.
> - we assume segs_n is 4 and we are receiving 4 segments multi-segment
> packet.
> - we fail to alloc mbuf when receive the 3th segment,so it will free the
> mbufs which packet chain we have built. Here are the 1st and 2nd segment.
> - Rx queue in this stride, the 1st and the 2nd segment are fill the new mbuf
> and there data will be rand, but the 3th and 4th segment are still fill the last
> data. So next if still begin on this stride, it will reveice wrong multi-segment
> packet.
> 
> - So we should discarded this packets and pass this stride. After exit the loop,
> we should align the next consumer index.
> 
> What Do you thinking?
> 
> With best regards
> Jiawei
> 
> On 2021/2/24 9:20 PM, Slava Ovsiienko wrote:
> > Hi, Jiawei
> >
> > Thank you for the patch, but It seems I need some clarifications.
> > As far I understand the issue:
> >
> > - we are in the midst of receiving the multi-segment packet
> > - we have some mbufs allocated and packet chain is partially built
> > - we fail on allocation replenishing mbuf for the segment
> > - we free all the mbuf of the built chain
> > - exit from the rx_burtst loop
> > - rq_ci is expected to be kept pointing to the beginning of the current
> >    stride - it is supposed on next rx_burst() invocation we'll continue
> >    Rx queue handling from the stride where we failed
> > - on loop exit we see the code:
> >     if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
> >            return 0;
> >     /* Update the consumer index. */
> >     rxq->rq_ci = rq_ci >> sges_n;
> > hence, rq_ci is always shifted by sges_n, all increments happened
> > during failed packet processing are just discarded, it seems no fix is needed.
> >
> > Did I miss something?
> >
> > With best regards,
> > Slava
> >
> >> -----Original Message-----
> >> From: Jiawei Zhu <17826875952@163.com>
> >> Sent: Monday, February 15, 2021 12:15
> >> To: dev@dpdk.org
> >> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> >> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >> <viacheslavo@nvidia.com>; Jiawei Zhu <17826875952@163.com>;
> >> stable@dpdk.org
> >> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> >>
> >> Fixed issue could occur when Mbuf starvation happens in a middle of
> >> reception of a segmented packet.
> >> In such a situation, after release the segments of that packet, it
> >> does not align consumer index to the next stride.
> >> This would cause receive a wrong segmented packet.
> >>
> >> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx
> >> path")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Jiawei Zhu <17826875952@163.com>
> >> ---
> >>   drivers/net/mlx5/mlx5_rxtx.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/net/mlx5/mlx5_rxtx.c
> >> b/drivers/net/mlx5/mlx5_rxtx.c index 2e4b87c..e3ce9fd 100644
> >> --- a/drivers/net/mlx5/mlx5_rxtx.c
> >> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> >> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
> >>   				rte_mbuf_raw_free(pkt);
> >>   				pkt = rep;
> >>   			}
> >> +			rq_ci >>= sges_n;
> >> +			++rq_ci;
> >> +			rq_ci <<= sges_n;
> >>   			break;
> >>   		}
> >>   		if (!pkt) {
> >> --
> >> 1.8.3.1
> >>
> >
  
Jiawei Zhu March 1, 2021, 5:01 p.m. UTC | #4
Hi, Slava
Thank you for your agreement. Here is the v2 patch:

https://patches.dpdk.org/project/dpdk/patch/1614617885-2650-1-git-send-email-17826875952@163.com/

With best regards,
Jiawei

On 2021/3/1 5:13 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
> 
> Thank you for the clarification. I missed the point that we have updated elts array
> with new allocated mbufs and are not able to retry packet building anymore.
> Very good catch, thank you!  Could you, please, add this extra explanation
> to the  commit message and send the v2 ?
> 
> With best regards,
> Slava
> 
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952@163.com>
>> Sent: Friday, February 26, 2021 18:11
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
>> Shuler <shahafs@nvidia.com>; stable@dpdk.org
>> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>
>> Hi, Slava
>>
>> Thanks for reading my patch, my issue may not be clear.
>> Here I give a possible error.
>> - we assume segs_n is 4 and we are receiving 4 segments multi-segment
>> packet.
>> - we fail to alloc mbuf when receive the 3th segment,so it will free the
>> mbufs which packet chain we have built. Here are the 1st and 2nd segment.
>> - Rx queue in this stride, the 1st and the 2nd segment are fill the new mbuf
>> and there data will be rand, but the 3th and 4th segment are still fill the last
>> data. So next if still begin on this stride, it will reveice wrong multi-segment
>> packet.
>>
>> - So we should discarded this packets and pass this stride. After exit the loop,
>> we should align the next consumer index.
>>
>> What Do you thinking?
>>
>> With best regards
>> Jiawei
>>
>> On 2021/2/24 9:20 PM, Slava Ovsiienko wrote:
>>> Hi, Jiawei
>>>
>>> Thank you for the patch, but It seems I need some clarifications.
>>> As far I understand the issue:
>>>
>>> - we are in the midst of receiving the multi-segment packet
>>> - we have some mbufs allocated and packet chain is partially built
>>> - we fail on allocation replenishing mbuf for the segment
>>> - we free all the mbuf of the built chain
>>> - exit from the rx_burtst loop
>>> - rq_ci is expected to be kept pointing to the beginning of the current
>>>     stride - it is supposed on next rx_burst() invocation we'll continue
>>>     Rx queue handling from the stride where we failed
>>> - on loop exit we see the code:
>>>      if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
>>>             return 0;
>>>      /* Update the consumer index. */
>>>      rxq->rq_ci = rq_ci >> sges_n;
>>> hence, rq_ci is always shifted by sges_n, all increments happened
>>> during failed packet processing are just discarded, it seems no fix is needed.
>>>
>>> Did I miss something?
>>>
>>> With best regards,
>>> Slava
>>>
>>>> -----Original Message-----
>>>> From: Jiawei Zhu <17826875952@163.com>
>>>> Sent: Monday, February 15, 2021 12:15
>>>> To: dev@dpdk.org
>>>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>> <viacheslavo@nvidia.com>; Jiawei Zhu <17826875952@163.com>;
>>>> stable@dpdk.org
>>>> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>>>
>>>> Fixed issue could occur when Mbuf starvation happens in a middle of
>>>> reception of a segmented packet.
>>>> In such a situation, after release the segments of that packet, it
>>>> does not align consumer index to the next stride.
>>>> This would cause receive a wrong segmented packet.
>>>>
>>>> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx
>>>> path")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Jiawei Zhu <17826875952@163.com>
>>>> ---
>>>>    drivers/net/mlx5/mlx5_rxtx.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/net/mlx5/mlx5_rxtx.c
>>>> b/drivers/net/mlx5/mlx5_rxtx.c index 2e4b87c..e3ce9fd 100644
>>>> --- a/drivers/net/mlx5/mlx5_rxtx.c
>>>> +++ b/drivers/net/mlx5/mlx5_rxtx.c
>>>> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
>>>>    				rte_mbuf_raw_free(pkt);
>>>>    				pkt = rep;
>>>>    			}
>>>> +			rq_ci >>= sges_n;
>>>> +			++rq_ci;
>>>> +			rq_ci <<= sges_n;
>>>>    			break;
>>>>    		}
>>>>    		if (!pkt) {
>>>> --
>>>> 1.8.3.1
>>>>
>>>
>
  
Slava Ovsiienko March 2, 2021, 8:10 a.m. UTC | #5
Hi, Jiawei

Thanks a lot for the update.
There are some common points for the commit messages of fixing patches:
- the bug/error/issue should be described in PAST tense (what we HAD before the fix)
- what fix is doing should be described in PRESENT tense (what we HAVE right now, after fix apply)

Also, can we fix some typos in the message and reword it a little bit?
What do you think about something like this:

The issue occurred if mbuf starvation happened
in the middle of segmented packet reception.
In such a situation, after release the segments of
packet being received, code did not advance the
consumer index to the next stride. This caused
the receiving of the wrong segmented packet data.

The possible error scenario:
- we assume segs_n is 4 and we are receiving 4
  segments of multi-segment packet.
- we fail to allocate mbuf while receiving the 3rd segment,
  and this frees the mbufs of the packet chain we have built.
  There are the 1st and 2nd segments in the chain.
- the 1st and the 2nd segments of this stride of Rx queue
  are filled up (in elts array) with the new allocated
  mbufs and their data are random (the 3rd and 4th
  segments still contain the valid data of the packet though).
- on the next iteration of stride processing we get
  the wrong two segments of the multi-segment packet.

Hence, we should skip these mbufs in the stride and
we should advance the consumer index on loop exit.

With best regards,
Slava

> -----Original Message-----
> From: Jiawei Zhu <17826875952@163.com>
> Sent: Monday, March 1, 2021 19:02
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; stable@dpdk.org
> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> 
> Hi, Slava
> Thank you for your agreement. Here is the v2 patch:
> 
> https://patches.dpdk.org/project/dpdk/patch/1614617885-2650-1-git-send-
> email-17826875952@163.com/
> 
> With best regards,
> Jiawei
> 
> On 2021/3/1 5:13 PM, Slava Ovsiienko wrote:
> > Hi, Jiawei
> >
> > Thank you for the clarification. I missed the point that we have
> > updated elts array with new allocated mbufs and are not able to retry
> packet building anymore.
> > Very good catch, thank you!  Could you, please, add this extra
> > explanation to the  commit message and send the v2 ?
> >
> > With best regards,
> > Slava
> >
> >> -----Original Message-----
> >> From: Jiawei Zhu <17826875952@163.com>
> >> Sent: Friday, February 26, 2021 18:11
> >> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
> >> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
> >> Shuler <shahafs@nvidia.com>; stable@dpdk.org
> >> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> >>
> >> Hi, Slava
> >>
> >> Thanks for reading my patch, my issue may not be clear.
> >> Here I give a possible error.
> >> - we assume segs_n is 4 and we are receiving 4 segments multi-segment
> >> packet.
> >> - we fail to alloc mbuf when receive the 3th segment,so it will free
> >> the mbufs which packet chain we have built. Here are the 1st and 2nd
> segment.
> >> - Rx queue in this stride, the 1st and the 2nd segment are fill the
> >> new mbuf and there data will be rand, but the 3th and 4th segment are
> >> still fill the last data. So next if still begin on this stride, it
> >> will reveice wrong multi-segment packet.
> >>
> >> - So we should discarded this packets and pass this stride. After
> >> exit the loop, we should align the next consumer index.
> >>
> >> What Do you thinking?
> >>
> >> With best regards
> >> Jiawei
> >>
> >> On 2021/2/24 9:20 PM, Slava Ovsiienko wrote:
> >>> Hi, Jiawei
> >>>
> >>> Thank you for the patch, but It seems I need some clarifications.
> >>> As far I understand the issue:
> >>>
> >>> - we are in the midst of receiving the multi-segment packet
> >>> - we have some mbufs allocated and packet chain is partially built
> >>> - we fail on allocation replenishing mbuf for the segment
> >>> - we free all the mbuf of the built chain
> >>> - exit from the rx_burtst loop
> >>> - rq_ci is expected to be kept pointing to the beginning of the current
> >>>     stride - it is supposed on next rx_burst() invocation we'll continue
> >>>     Rx queue handling from the stride where we failed
> >>> - on loop exit we see the code:
> >>>      if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
> >>>             return 0;
> >>>      /* Update the consumer index. */
> >>>      rxq->rq_ci = rq_ci >> sges_n;
> >>> hence, rq_ci is always shifted by sges_n, all increments happened
> >>> during failed packet processing are just discarded, it seems no fix is
> needed.
> >>>
> >>> Did I miss something?
> >>>
> >>> With best regards,
> >>> Slava
> >>>
> >>>> -----Original Message-----
> >>>> From: Jiawei Zhu <17826875952@163.com>
> >>>> Sent: Monday, February 15, 2021 12:15
> >>>> To: dev@dpdk.org
> >>>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>;
> Shahaf
> >>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
> >>>> <viacheslavo@nvidia.com>; Jiawei Zhu <17826875952@163.com>;
> >>>> stable@dpdk.org
> >>>> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
> >>>>
> >>>> Fixed issue could occur when Mbuf starvation happens in a middle of
> >>>> reception of a segmented packet.
> >>>> In such a situation, after release the segments of that packet, it
> >>>> does not align consumer index to the next stride.
> >>>> This would cause receive a wrong segmented packet.
> >>>>
> >>>> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx
> >>>> path")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Jiawei Zhu <17826875952@163.com>
> >>>> ---
> >>>>    drivers/net/mlx5/mlx5_rxtx.c | 3 +++
> >>>>    1 file changed, 3 insertions(+)
> >>>>
> >>>> diff --git a/drivers/net/mlx5/mlx5_rxtx.c
> >>>> b/drivers/net/mlx5/mlx5_rxtx.c index 2e4b87c..e3ce9fd 100644
> >>>> --- a/drivers/net/mlx5/mlx5_rxtx.c
> >>>> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> >>>> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
> >>>>    				rte_mbuf_raw_free(pkt);
> >>>>    				pkt = rep;
> >>>>    			}
> >>>> +			rq_ci >>= sges_n;
> >>>> +			++rq_ci;
> >>>> +			rq_ci <<= sges_n;
> >>>>    			break;
> >>>>    		}
> >>>>    		if (!pkt) {
> >>>> --
> >>>> 1.8.3.1
> >>>>
> >>>
> >
  
Jiawei Zhu March 2, 2021, 4:44 p.m. UTC | #6
Hi, Slava

Thanks a lot for your correction and advice. I'm not very good at 
grammar 😁.
Here is new patch and used your words.😃

https://patches.dpdk.org/project/dpdk/patch/1614619190-3846-1-git-send-email-17826875952@163.com/

With best regards,
Jiawei

On 2021/3/2 4:10 PM, Slava Ovsiienko wrote:
> Hi, Jiawei
> 
> Thanks a lot for the update.
> There are some common points for the commit messages of fixing patches:
> - the bug/error/issue should be described in PAST tense (what we HAD before the fix)
> - what fix is doing should be described in PRESENT tense (what we HAVE right now, after fix apply)
> 
> Also, can we fix some typos in the message and reword it a little bit?
> What do you think about something like this:
> 
> The issue occurred if mbuf starvation happened
> in the middle of segmented packet reception.
> In such a situation, after release the segments of
> packet being received, code did not advance the
> consumer index to the next stride. This caused
> the receiving of the wrong segmented packet data.
> 
> The possible error scenario:
> - we assume segs_n is 4 and we are receiving 4
>    segments of multi-segment packet.
> - we fail to allocate mbuf while receiving the 3rd segment,
>    and this frees the mbufs of the packet chain we have built.
>    There are the 1st and 2nd segments in the chain.
> - the 1st and the 2nd segments of this stride of Rx queue
>    are filled up (in elts array) with the new allocated
>    mbufs and their data are random (the 3rd and 4th
>    segments still contain the valid data of the packet though).
> - on the next iteration of stride processing we get
>    the wrong two segments of the multi-segment packet.
> 
> Hence, we should skip these mbufs in the stride and
> we should advance the consumer index on loop exit.
> 
> With best regards,
> Slava
> 
>> -----Original Message-----
>> From: Jiawei Zhu <17826875952@163.com>
>> Sent: Monday, March 1, 2021 19:02
>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
>> Shuler <shahafs@nvidia.com>; stable@dpdk.org
>> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>
>> Hi, Slava
>> Thank you for your agreement. Here is the v2 patch:
>>
>> https://patches.dpdk.org/project/dpdk/patch/1614617885-2650-1-git-send-
>> email-17826875952@163.com/
>>
>> With best regards,
>> Jiawei
>>
>> On 2021/3/1 5:13 PM, Slava Ovsiienko wrote:
>>> Hi, Jiawei
>>>
>>> Thank you for the clarification. I missed the point that we have
>>> updated elts array with new allocated mbufs and are not able to retry
>> packet building anymore.
>>> Very good catch, thank you!  Could you, please, add this extra
>>> explanation to the  commit message and send the v2 ?
>>>
>>> With best regards,
>>> Slava
>>>
>>>> -----Original Message-----
>>>> From: Jiawei Zhu <17826875952@163.com>
>>>> Sent: Friday, February 26, 2021 18:11
>>>> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org
>>>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>; Shahaf
>>>> Shuler <shahafs@nvidia.com>; stable@dpdk.org
>>>> Subject: Re: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>>>
>>>> Hi, Slava
>>>>
>>>> Thanks for reading my patch, my issue may not be clear.
>>>> Here I give a possible error.
>>>> - we assume segs_n is 4 and we are receiving 4 segments multi-segment
>>>> packet.
>>>> - we fail to alloc mbuf when receive the 3th segment,so it will free
>>>> the mbufs which packet chain we have built. Here are the 1st and 2nd
>> segment.
>>>> - Rx queue in this stride, the 1st and the 2nd segment are fill the
>>>> new mbuf and there data will be rand, but the 3th and 4th segment are
>>>> still fill the last data. So next if still begin on this stride, it
>>>> will reveice wrong multi-segment packet.
>>>>
>>>> - So we should discarded this packets and pass this stride. After
>>>> exit the loop, we should align the next consumer index.
>>>>
>>>> What Do you thinking?
>>>>
>>>> With best regards
>>>> Jiawei
>>>>
>>>> On 2021/2/24 9:20 PM, Slava Ovsiienko wrote:
>>>>> Hi, Jiawei
>>>>>
>>>>> Thank you for the patch, but It seems I need some clarifications.
>>>>> As far I understand the issue:
>>>>>
>>>>> - we are in the midst of receiving the multi-segment packet
>>>>> - we have some mbufs allocated and packet chain is partially built
>>>>> - we fail on allocation replenishing mbuf for the segment
>>>>> - we free all the mbuf of the built chain
>>>>> - exit from the rx_burtst loop
>>>>> - rq_ci is expected to be kept pointing to the beginning of the current
>>>>>      stride - it is supposed on next rx_burst() invocation we'll continue
>>>>>      Rx queue handling from the stride where we failed
>>>>> - on loop exit we see the code:
>>>>>       if (unlikely((i == 0) && ((rq_ci >> sges_n) == rxq->rq_ci)))
>>>>>              return 0;
>>>>>       /* Update the consumer index. */
>>>>>       rxq->rq_ci = rq_ci >> sges_n;
>>>>> hence, rq_ci is always shifted by sges_n, all increments happened
>>>>> during failed packet processing are just discarded, it seems no fix is
>> needed.
>>>>>
>>>>> Did I miss something?
>>>>>
>>>>> With best regards,
>>>>> Slava
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jiawei Zhu <17826875952@163.com>
>>>>>> Sent: Monday, February 15, 2021 12:15
>>>>>> To: dev@dpdk.org
>>>>>> Cc: zhujiawei12@huawei.com; Matan Azrad <matan@nvidia.com>;
>> Shahaf
>>>>>> Shuler <shahafs@nvidia.com>; Slava Ovsiienko
>>>>>> <viacheslavo@nvidia.com>; Jiawei Zhu <17826875952@163.com>;
>>>>>> stable@dpdk.org
>>>>>> Subject: [PATCH] net/mlx5: fix wrong segmented packet in Rx
>>>>>>
>>>>>> Fixed issue could occur when Mbuf starvation happens in a middle of
>>>>>> reception of a segmented packet.
>>>>>> In such a situation, after release the segments of that packet, it
>>>>>> does not align consumer index to the next stride.
>>>>>> This would cause receive a wrong segmented packet.
>>>>>>
>>>>>> Fixes: 15a756b63734 ("net/mlx5: fix possible NULL dereference in Rx
>>>>>> path")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Jiawei Zhu <17826875952@163.com>
>>>>>> ---
>>>>>>     drivers/net/mlx5/mlx5_rxtx.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/net/mlx5/mlx5_rxtx.c
>>>>>> b/drivers/net/mlx5/mlx5_rxtx.c index 2e4b87c..e3ce9fd 100644
>>>>>> --- a/drivers/net/mlx5/mlx5_rxtx.c
>>>>>> +++ b/drivers/net/mlx5/mlx5_rxtx.c
>>>>>> @@ -1480,6 +1480,9 @@ enum mlx5_txcmp_code {
>>>>>>     				rte_mbuf_raw_free(pkt);
>>>>>>     				pkt = rep;
>>>>>>     			}
>>>>>> +			rq_ci >>= sges_n;
>>>>>> +			++rq_ci;
>>>>>> +			rq_ci <<= sges_n;
>>>>>>     			break;
>>>>>>     		}
>>>>>>     		if (!pkt) {
>>>>>> --
>>>>>> 1.8.3.1
>>>>>>
>>>>>
>>>
>
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
index 2e4b87c..e3ce9fd 100644
--- a/drivers/net/mlx5/mlx5_rxtx.c
+++ b/drivers/net/mlx5/mlx5_rxtx.c
@@ -1480,6 +1480,9 @@  enum mlx5_txcmp_code {
 				rte_mbuf_raw_free(pkt);
 				pkt = rep;
 			}
+			rq_ci >>= sges_n;
+			++rq_ci;
+			rq_ci <<= sges_n;
 			break;
 		}
 		if (!pkt) {