
[v1,0/1] distributor test fix

Message ID 20210119035910.8324-1-l.wojciechow@partner.samsung.com (mailing list archive)

Message

Lukasz Wojciechowski Jan. 19, 2021, 3:59 a.m. UTC
According to the discussion in this thread:
https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r

I was able to reproduce the distributor test failure in exactly the same
way as described, but on an x86_64 machine with 32 cores.
So the problem does not seem to be related to the ARM architecture.
IMO the issue occurs when many worker threads return packets at the
same time.

I was not able to observe the issue on ARM devices, but I used only
machines with 4 cores. That gives at most 3 worker cores, so a maximum
of 32*3 = 96 packets processed at the same time, which is less than 127,
so the issue cannot occur.

Can anyone verify this patch on a machine similar to the one used in the
CI lab, on which the issue occurred?

Lukasz Wojciechowski (1):
  test/distributor: prevent return buffer overload

 app/test/test_distributor.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
  

Comments

David Marchand Jan. 19, 2021, 8:44 a.m. UTC | #1
On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> According to the discussion in this thread:
> https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r
>
> I was able to reproduce the distributor test failure in exactly the same
> way as described, but on an x86_64 machine with 32 cores.
> So the problem does not seem to be related to the ARM architecture.
> IMO the issue occurs when many worker threads return packets at the
> same time.
>
> I was not able to observe the issue on ARM devices, but I used only
> machines with 4 cores. That gives at most 3 worker cores, so a maximum
> of 32*3 = 96 packets processed at the same time, which is less than 127,
> so the issue cannot occur.
>
> Can anyone verify this patch on a machine similar to the one used in the
> CI lab, on which the issue occurred?

Thanks for looking at it, Lukasz.
Unfortunately, I can't reproduce it on my x86 system (26 workers in
the test) and I don't have an ARM machine.
  
Lukasz Wojciechowski Jan. 19, 2021, 1:06 p.m. UTC | #2
Thank you David,
If you can, try an emulated virtual machine, where the cores are much
slower, so the workers don't return packets immediately.
In such an environment it reproduces in 100% of cases.

Best regards

Lukasz

On 19.01.2021 at 09:44, David Marchand wrote:
> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com> wrote:
>> According to the discussion in this thread:
>> https://inbox.dpdk.org/dev/CAOE1vsOehF4ZMOWffpEv=QF6YOc5wXtg23PV83B9CLiTMn8wQA@mail.gmail.com/#r
>>
>> I was able to reproduce the distributor test failure in exactly the same
>> way as described, but on an x86_64 machine with 32 cores.
>> So the problem does not seem to be related to the ARM architecture.
>> IMO the issue occurs when many worker threads return packets at the
>> same time.
>>
>> I was not able to observe the issue on ARM devices, but I used only
>> machines with 4 cores. That gives at most 3 worker cores, so a maximum
>> of 32*3 = 96 packets processed at the same time, which is less than 127,
>> so the issue cannot occur.
>>
>> Can anyone verify this patch on a machine similar to the one used in the
>> CI lab, on which the issue occurred?
> Thanks for looking at it, Lukasz.
> Unfortunately, I can't reproduce it on my x86 system (26 workers in
> the test) and I don't have an ARM machine.
>
  
David Marchand Jan. 28, 2021, 1:34 p.m. UTC | #3
On Tue, Jan 19, 2021 at 2:07 PM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> Thank you David,
> If you can, try an emulated virtual machine, where the cores are much
> slower, so the workers don't return packets immediately.
> In such an environment it reproduces in 100% of cases.

I reproduced the issue by starting testpmd on the same cores in this system.
I usually reproduce it after 1-2 minutes of continuously running the
distributor_autotest unit test.

I've applied your fix in my tree and I will let this loop run for a while.