[RFC] test/distributor: fix burst flush on worker quit

Message ID 20210426163310.1043438-1-kda@semihalf.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [RFC] test/distributor: fix burst flush on worker quit |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS

Commit Message

Stanislaw Kardach April 26, 2021, 4:33 p.m. UTC
  While working on RISC-V port I have encountered a situation where worker
threads get stuck in the rte_distributor_return_pkt() function in the
burst test.
After investigation some of the threads enter this function with
flag RTE_DISTRIB_GET_BUF set in the d->retptr64[0]. At the same time
main thread has already passed rte_distributor_process() so nobody will
clear this flag and hence workers can't return.

What I've noticed is that adding a flush just after the last _process(),
similarly to how quit_workers() function is written in the
test_distributor.c fixes the issue.
Additionally the issue disappears when I remove the rdtsc delay code
inside the rte_distributor_request_pkt().
However I can't get this to reproduce on x86 (even with SIMD forced
off) and with artificial delays, which is why I wonder whether I'm not
actually hiding some other issue.

Looking at the implementation of the distributor, it is based on
__atomic_* builtins and the only platform related bit in the fast-path
is the rte_rdtsc() and rte_pause(). There may be some issues in the
toolchain (I've tried so far with the Ubuntu one - 10.2.0-8ubuntu1).
I should add that all unit tests for distributor are passing so either
there's some coverage corner case or the implementation works on RISC-V.
As for RDTSC I'm using a sleep-stable time counter with 1MHz frequency
and switching to high resolution cycle counter also removes the issue
but that's the same as removing the rdtsc delay as mentioned above.

I'd love to hear from You if this fix makes any sense.

While modifying this test, I've also pulled in a fix from
test_distributor.c which ensures that each thread gets his own wakeup
packet as it's possible that when sending a burst of packets, they won't
be spread over all the workers.

Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
Fixes: 7c3287a10535 ("test/distributor: add performance test for burst mode")
Cc: david.hunt@intel.com
Cc: l.wojciechow@partner.samsung.com
Cc: David Marchand <david.marchand@redhat.com>
---
 app/test/test_distributor_perf.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

Lukasz Wojciechowski April 28, 2021, 7:46 a.m. UTC | #1
Hi Stanislaw,

W dniu 26.04.2021 o 18:33, Stanislaw Kardach pisze:
> While working on RISC-V port I have encountered a situation where worker
> threads get stuck in the rte_distributor_return_pkt() function in the
> burst test.
> After investigation some of the threads enter this function with
> flag RTE_DISTRIB_GET_BUF set in the d->retptr64[0]. At the same time
> main thread has already passed rte_distributor_process() so nobody will
> clear this flag and hence workers can't return.
>
> What I've noticed is that adding a flush just after the last _process(),
> similarly to how quit_workers() function is written in the
> test_distributor.c fixes the issue.
> Additionally the issue disappears when I remove the rdtsc delay code
> inside the rte_distributor_request_pkt().
> However I can't get this to reproduce on x86 (even with SIMD forced
> off) and with artificial delays, which is why I wonder whether I'm not
> actually hiding some other issue.
I was able to reproduce the issue on x86 arch using VM with 32 emulated 
CPU cores.
I guess it would be enough to just have more than 8 worker threads, so 
not all of them would be awaken.
>
> Looking at the implementation of the distributor, it is based on
> __atomic_* builtins and the only platform related bit in the fast-path
> is the rte_rdtsc() and rte_pause(). There may be some issues in the
> toolchain (I've tried so far with the Ubuntu one - 10.2.0-8ubuntu1).
> I should add that all unit tests for distributor are passing so either
> there's some coverage corner case or the implementation works on RISC-V.
> As for RDTSC I'm using a sleep-stable time counter with 1MHz frequency
> and switching to high resolution cycle counter also removes the issue
> but that's the same as removing the rdtsc delay as mentioned above.
>
> I'd love to hear from You if this fix makes any sense.
Yes your patch fixes the issue and is perfectly fine.
>
> While modifying this test, I've also pulled in a fix from
> test_distributor.c which ensures that each thread gets his own wakeup
> packet as it's possible that when sending a burst of packets, they won't
> be spread over all the workers.
>
> Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
> Fixes: 7c3287a10535 ("test/distributor: add performance test for burst mode")
> Cc: david.hunt@intel.com
> Cc: l.wojciechow@partner.samsung.com
> Cc: David Marchand <david.marchand@redhat.com>
> ---
>   app/test/test_distributor_perf.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
> index b25f79a34..fdbeae6d2 100644
> --- a/app/test/test_distributor_perf.c
> +++ b/app/test/test_distributor_perf.c
> @@ -188,13 +188,15 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
>   	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
>   
>   	quit = 1;
> -	for (i = 0; i < num_workers; i++)
> +	for (i = 0; i < num_workers; i++) {
>   		bufs[i]->hash.usr = i << 1;
> -	rte_distributor_process(d, bufs, num_workers);
> +		rte_distributor_process(d, &bufs[i], 1);
> +	}
>   
>   	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
>   
>   	rte_distributor_process(d, NULL, 0);
> +	rte_distributor_flush(d);
>   	rte_eal_mp_wait_lcore();
>   	quit = 0;
>   	worker_idx = 0;
     Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
     Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
  
Hunt, David April 28, 2021, 12:50 p.m. UTC | #2
On 28/4/2021 8:46 AM, Lukasz Wojciechowski wrote:
> Hi Stanislaw,
>
> W dniu 26.04.2021 o 18:33, Stanislaw Kardach pisze:
>> While working on RISC-V port I have encountered a situation where worker
>> threads get stuck in the rte_distributor_return_pkt() function in the
>> burst test.
>> After investigation some of the threads enter this function with
>> flag RTE_DISTRIB_GET_BUF set in the d->retptr64[0]. At the same time
>> main thread has already passed rte_distributor_process() so nobody will
>> clear this flag and hence workers can't return.
>>
>> What I've noticed is that adding a flush just after the last _process(),
>> similarly to how quit_workers() function is written in the
>> test_distributor.c fixes the issue.
>> Additionally the issue disappears when I remove the rdtsc delay code
>> inside the rte_distributor_request_pkt().
>> However I can't get this to reproduce on x86 (even with SIMD forced
>> off) and with artificial delays, which is why I wonder whether I'm not
>> actually hiding some other issue.
> I was able to reproduce the issue on x86 arch using VM with 32 emulated
> CPU cores.
> I guess it would be enough to just have more than 8 worker threads, so
> not all of them would be awaken.
>> Looking at the implementation of the distributor, it is based on
>> __atomic_* builtins and the only platform related bit in the fast-path
>> is the rte_rdtsc() and rte_pause(). There may be some issues in the
>> toolchain (I've tried so far with the Ubuntu one - 10.2.0-8ubuntu1).
>> I should add that all unit tests for distributor are passing so either
>> there's some coverage corner case or the implementation works on RISC-V.
>> As for RDTSC I'm using a sleep-stable time counter with 1MHz frequency
>> and switching to high resolution cycle counter also removes the issue
>> but that's the same as removing the rdtsc delay as mentioned above.
>>
>> I'd love to hear from You if this fix makes any sense.
> Yes your patch fixes the issue and is perfectly fine.
>> While modifying this test, I've also pulled in a fix from
>> test_distributor.c which ensures that each thread gets his own wakeup
>> packet as it's possible that when sending a burst of packets, they won't
>> be spread over all the workers.
>>
>> Signed-off-by: Stanislaw Kardach <kda@semihalf.com>
>> Fixes: 7c3287a10535 ("test/distributor: add performance test for burst mode")
>> Cc: david.hunt@intel.com
>> Cc: l.wojciechow@partner.samsung.com
>> Cc: David Marchand <david.marchand@redhat.com>
>> ---
>>    app/test/test_distributor_perf.c | 6 ++++--
>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
>> index b25f79a34..fdbeae6d2 100644
>> --- a/app/test/test_distributor_perf.c
>> +++ b/app/test/test_distributor_perf.c
>> @@ -188,13 +188,15 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
>>    	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
>>    
>>    	quit = 1;
>> -	for (i = 0; i < num_workers; i++)
>> +	for (i = 0; i < num_workers; i++) {
>>    		bufs[i]->hash.usr = i << 1;
>> -	rte_distributor_process(d, bufs, num_workers);
>> +		rte_distributor_process(d, &bufs[i], 1);
>> +	}
>>    
>>    	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
>>    
>>    	rte_distributor_process(d, NULL, 0);
>> +	rte_distributor_flush(d);
>>    	rte_eal_mp_wait_lcore();
>>    	quit = 0;
>>    	worker_idx = 0;
>       Tested-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
>       Reviewed-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>

Thanks, Stanislaw, Lukasz.

Acked-by: David Hunt <david.hunt@intel.com>
  
Stanislaw Kardach April 28, 2021, 12:53 p.m. UTC | #3
On Wed, Apr 28, 2021 at 2:51 PM David Hunt <david.hunt@intel.com> wrote:
<snip>

> Thanks, Stanislaw, Lukasz.
>
> Acked-by: David Hunt <david.hunt@intel.com>
>
> Just to be sure, may I send the non-RFC patch without the question part of
the description or would you prefer the current version to be merged?
  
Hunt, David April 28, 2021, 1:03 p.m. UTC | #4
On 28/4/2021 1:53 PM, Stanisław Kardach wrote:
> On Wed, Apr 28, 2021 at 2:51 PM David Hunt <david.hunt@intel.com 
> <mailto:david.hunt@intel.com>> wrote:
> <snip>
>
>     Thanks, Stanislaw, Lukasz.
>
>     Acked-by: David Hunt <david.hunt@intel.com
>     <mailto:david.hunt@intel.com>>
>
> Just to be sure, may I send the non-RFC patch without the question 
> part of the description or would you prefer the current version to be 
> merged?


Hi Stanislaw,

    It would be better to send a v2, as the concept looks good, was 
reviewd and tested by Lukasz, so remove the questions and send as a 
patch that could be merged.

Thanks,

Dave.
  
David Marchand April 28, 2021, 1:11 p.m. UTC | #5
On Mon, Apr 26, 2021 at 6:33 PM Stanislaw Kardach <kda@semihalf.com> wrote:
>
> While working on RISC-V port I have encountered a situation where worker
> threads get stuck in the rte_distributor_return_pkt() function in the
> burst test.
> After investigation some of the threads enter this function with
> flag RTE_DISTRIB_GET_BUF set in the d->retptr64[0]. At the same time
> main thread has already passed rte_distributor_process() so nobody will
> clear this flag and hence workers can't return.
>
> What I've noticed is that adding a flush just after the last _process(),
> similarly to how quit_workers() function is written in the
> test_distributor.c fixes the issue.
> Additionally the issue disappears when I remove the rdtsc delay code
> inside the rte_distributor_request_pkt().
> However I can't get this to reproduce on x86 (even with SIMD forced
> off) and with artificial delays, which is why I wonder whether I'm not
> actually hiding some other issue.
>
> Looking at the implementation of the distributor, it is based on
> __atomic_* builtins and the only platform related bit in the fast-path
> is the rte_rdtsc() and rte_pause(). There may be some issues in the
> toolchain (I've tried so far with the Ubuntu one - 10.2.0-8ubuntu1).
> I should add that all unit tests for distributor are passing so either
> there's some coverage corner case or the implementation works on RISC-V.
> As for RDTSC I'm using a sleep-stable time counter with 1MHz frequency
> and switching to high resolution cycle counter also removes the issue
> but that's the same as removing the rdtsc delay as mentioned above.
>
> I'd love to hear from You if this fix makes any sense.
>
> While modifying this test, I've also pulled in a fix from
> test_distributor.c which ensures that each thread gets his own wakeup
> packet as it's possible that when sending a burst of packets, they won't
> be spread over all the workers.

Do we have two fixes in a single patch?
This is ok if both point at the same originating commit, but else, we
need two patches with proper Fixes: line.

I see we have reviews, can you submit non-rfc patches?

Thanks.
  
Stanislaw Kardach April 28, 2021, 1:22 p.m. UTC | #6
On Wed, Apr 28, 2021 at 3:11 PM David Marchand <david.marchand@redhat.com>
wrote:
<snip>

>
> Do we have two fixes in a single patch?
> This is ok if both point at the same originating commit, but else, we
> need two patches with proper Fixes: line.
>
Those touch different commits, I'll split them and send together.

>
> I see we have reviews, can you submit non-rfc patches?
>
Will do.

>
> Thanks.
> --
> David Marchand
>
>
  

Patch

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index b25f79a34..fdbeae6d2 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -188,13 +188,15 @@  quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
-	for (i = 0; i < num_workers; i++)
+	for (i = 0; i < num_workers; i++) {
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+		rte_distributor_process(d, &bufs[i], 1);
+	}
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
 	rte_distributor_process(d, NULL, 0);
+	rte_distributor_flush(d);
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;