test/service: fix race condition on stopping lcore

Message ID 20201016090804.1242907-1-kevin.laatz@intel.com (mailing list archive)
State Accepted, archived
Delegated to: David Marchand
Series test/service: fix race condition on stopping lcore

Checks

Context                       Check    Description
ci/checkpatch                 success  coding style OK
ci/iol-broadcom-Performance   success  Performance Testing PASS
ci/iol-broadcom-Functional    success  Functional Testing PASS
ci/iol-intel-Functional       success  Functional Testing PASS
ci/Intel-compilation          success  Compilation OK
ci/travis-robot               success  Travis build: passed
ci/iol-intel-Performance      success  Performance Testing PASS
ci/iol-testing                fail     Testing issues
ci/iol-mellanox-Performance   success  Performance Testing PASS

Commit Message

Kevin Laatz Oct. 16, 2020, 9:08 a.m. UTC
There is a potential race condition in 'service_attr_get' which will cause
test failures since the service core thread is still running while the
values are being retrieved/reset.

This patch fixes the race condition by waiting for the service core thread
to stop before continuing with the unit test checks.

Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
---
 app/test/test_service_cores.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
  

Comments

Van Haaren, Harry Oct. 16, 2020, 9:18 a.m. UTC | #1
> -----Original Message-----
> From: Laatz, Kevin <kevin.laatz@intel.com>
> Sent: Friday, October 16, 2020 10:08 AM
> To: dev@dpdk.org
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>;
> david.marchand@redhat.com; l.wojciechow@partner.samsung.com;
> Honnappa.Nagarahalli@arm.com; phil.yang@arm.com; aconole@redhat.com;
> Laatz, Kevin <kevin.laatz@intel.com>
> Subject: [PATCH] test/service: fix race condition on stopping lcore
> 
> There is a potential race condition in 'service_attr_get' which will cause
> test failures since the service core thread is still running while the
> values are being retrieved/reset.
> 
> This patch fixes the race condition by waiting for the service core thread
> to stop before continuing with the unit test checks.
> 
> Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>

Thanks Kevin for handling; can't reproduce race-cond here, but by code review
this is the correct fix, thanks also for refactoring the wait into its own function.

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
  
David Marchand Oct. 16, 2020, 11:50 a.m. UTC | #2
On Fri, Oct 16, 2020 at 11:13 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
>
> There is a potential race condition in 'service_attr_get' which will cause
> test failures since the service core thread is still running while the
> values are being retrieved/reset.
>
> This patch fixes the race condition by waiting for the service core thread
> to stop before continuing with the unit test checks.

We won't backport it, since we need a new API, but I would flag it for info as:
Fixes: 4d55194d76a4 ("service: add attribute get function")

Ok for you?


--
David Marchand
  
Van Haaren, Harry Oct. 16, 2020, 11:51 a.m. UTC | #3
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, October 16, 2020 12:50 PM
> To: Laatz, Kevin <kevin.laatz@intel.com>; Van Haaren, Harry
> <harry.van.haaren@intel.com>
> Cc: dev <dev@dpdk.org>; Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Phil Yang <phil.yang@arm.com>; Aaron
> Conole <aconole@redhat.com>
> Subject: Re: [PATCH] test/service: fix race condition on stopping lcore
> 
> On Fri, Oct 16, 2020 at 11:13 AM Kevin Laatz <kevin.laatz@intel.com> wrote:
> >
> > There is a potential race condition in 'service_attr_get' which will cause
> > test failures since the service core thread is still running while the
> > values are being retrieved/reset.
> >
> > This patch fixes the race condition by waiting for the service core thread
> > to stop before continuing with the unit test checks.
> 
> We won't backport it, since we need a new API, but I would flag it for info as:
> Fixes: 4d55194d76a4 ("service: add attribute get function")
> 
> Ok for you?

Yes - thanks.

> David Marchand
  
David Marchand Oct. 16, 2020, 11:54 a.m. UTC | #4
On Fri, Oct 16, 2020 at 11:18 AM Van Haaren, Harry
<harry.van.haaren@intel.com> wrote:
> > Subject: [PATCH] test/service: fix race condition on stopping lcore
> >
> > There is a potential race condition in 'service_attr_get' which will cause
> > test failures since the service core thread is still running while the
> > values are being retrieved/reset.
> >
> > This patch fixes the race condition by waiting for the service core thread
> > to stop before continuing with the unit test checks.

Fixes: 4d55194d76a4 ("service: add attribute get function")

> > Signed-off-by: Kevin Laatz <kevin.laatz@intel.com>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>

Thanks Kevin, Harry, applied.
  

Patch

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 5d92bea8af..44b6fc3624 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -119,6 +119,17 @@  unregister_all(void)
 	return TEST_SUCCESS;
 }
 
+/* Wait until service lcore not active, or for 100x SERVICE_DELAY */
+static void
+wait_slcore_inactive(uint32_t slcore_id)
+{
+	int i;
+
+	for (i = 0; rte_service_lcore_may_be_active(slcore_id) == 1 &&
+			i < 100; i++)
+		rte_delay_ms(SERVICE_DELAY);
+}
+
 /* register a single dummy service */
 static int
 dummy_register(void)
@@ -305,6 +316,8 @@  service_attr_get(void)
 
 	rte_service_lcore_stop(slcore_id);
 
+	wait_slcore_inactive(slcore_id);
+
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(1, (attr_value > 0),
@@ -394,11 +407,7 @@  service_lcore_attr_get(void)
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
 			"Failed to stop service lcore");
 
-	/* Wait until service lcore not active, or for 100x SERVICE_DELAY */
-	int i;
-	for (i = 0; rte_service_lcore_may_be_active(slcore_id) == 1 &&
-			i < 100; i++)
-		rte_delay_ms(SERVICE_DELAY);
+	wait_slcore_inactive(slcore_id);
 
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_may_be_active(slcore_id),
 			  "Service lcore not stopped after waiting.");
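
The bounded wait that the patch factors into wait_slcore_inactive() can be tried outside DPDK with a small stand-in. The sketch below is only an illustration, not the code from the patch: struct mock_lcore, mock_may_be_active(), mock_delay() and wait_inactive() are hypothetical names that take the place of the service lcore, rte_service_lcore_may_be_active() and rte_delay_ms(). Unlike the helper in the patch, which returns void, this variant returns whether the core was seen idle, so the caller can tell a clean stop from a timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a service lcore that has been asked to stop. */
struct mock_lcore {
	int polls_until_idle;	/* polls for which it still reports "active" */
	int delays;		/* how many times the waiter slept */
};

/* Stand-in for rte_service_lcore_may_be_active(): 1 while active, else 0. */
static int
mock_may_be_active(struct mock_lcore *lc)
{
	if (lc->polls_until_idle > 0) {
		lc->polls_until_idle--;
		return 1;
	}
	return 0;
}

/* Stand-in for rte_delay_ms(SERVICE_DELAY): only counts the sleeps. */
static void
mock_delay(struct mock_lcore *lc)
{
	lc->delays++;
}

/*
 * Poll until the core reports inactive, sleeping between polls and
 * giving up after max_polls sleeps.
 * Returns true if the core was observed idle, false on timeout.
 */
static bool
wait_inactive(struct mock_lcore *lc, int max_polls)
{
	int i;

	assert(max_polls >= 0);
	for (i = 0; i < max_polls; i++) {
		if (mock_may_be_active(lc) == 0)
			return true;
		mock_delay(lc);
	}
	/* One last check after the final sleep. */
	return mock_may_be_active(lc) == 0;
}
```

With a core that stays active for 3 polls and a limit of 100, wait_inactive() sleeps 3 times and returns true. With a core that never goes idle within the limit, it sleeps 100 times and returns false. Values such as call counts should only be read or reset after a true return, which is the ordering the patch enforces in service_attr_get().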