[v2,2/2] test/service: fix race condition on stopping lcore

Message ID: 20200720143829.46280-2-harry.van.haaren@intel.com
State: Superseded, archived
Delegated to: David Marchand
Series: [v2,1/2] service: add API to retrieve service core active

Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/Intel-compilation  success  Compilation OK
ci/travis-robot       success  Travis build: passed

Commit Message

Van Haaren, Harry July 20, 2020, 2:38 p.m. UTC
This commit fixes a potential race condition in the tests,
where the lcore running a service could increment a counter
that the test-suite thread had already reset. The resulting
stale increment could cause CI failures, as indicated by
DPDK's CI.

This patch fixes the race-condition by making use of the
added rte_service_lcore_active() API, which indicates when
a service-core is no longer in the service-core polling loop.

The unit test makes use of the above function to detect when
all statistics increments are done in the service-core thread,
and then the unit test continues finalizing and checking state.

Fixes: f28f3594ded2 ("service: add attribute API")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

---

Thanks for the discussion on v1; this v2 is a fixup for the CI
failure and includes the previous feedback from the ML.
---
 app/test/test_service_cores.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
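
The ordering problem described in the commit message can be illustrated with a small single-threaded model. This is only a sketch of one bad interleaving, not DPDK code: `struct model_core`, `model_core_step()`, and both scenario functions are hypothetical names invented for this example, and the "core" is stepped by hand instead of running on a real lcore. The comments note which real call each step loosely stands in for.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a service core; not a DPDK structure. */
struct model_core {
	int run_requested;	/* cleared by the test thread to request a stop */
	int active;		/* 1 while the core is still in its polling loop */
	uint64_t calls;		/* statistic incremented by the core */
};

/*
 * One scheduling step of the modelled core. The core always finishes the
 * iteration it has already begun, so one increment can land after a stop
 * was requested.
 */
static void
model_core_step(struct model_core *c)
{
	if (!c->active)
		return;
	c->calls++;			/* in-flight iteration completes */
	if (!c->run_requested)
		c->active = 0;		/* leave the polling loop */
}

/* Reset right after requesting the stop: the late increment survives. */
static uint64_t
reset_without_waiting(void)
{
	struct model_core c = { .run_requested = 1, .active = 1, .calls = 0 };

	model_core_step(&c);		/* core is running normally */
	c.run_requested = 0;		/* stands in for rte_service_lcore_stop() */
	c.calls = 0;			/* stands in for the attribute reset */
	model_core_step(&c);		/* core finishes its last iteration */
	return c.calls;
}

/* Wait until the core reports inactive, then reset. */
static uint64_t
reset_after_waiting(void)
{
	struct model_core c = { .run_requested = 1, .active = 1, .calls = 0 };
	int i;

	model_core_step(&c);		/* core is running normally */
	c.run_requested = 0;		/* stands in for rte_service_lcore_stop() */
	/* Bounded wait, standing in for polling rte_service_lcore_active(). */
	for (i = 0; i < 100 && c.active; i++)
		model_core_step(&c);
	c.calls = 0;			/* reset only once the core is inactive */
	model_core_step(&c);		/* no effect: the core has left its loop */
	return c.calls;
}
```

In this model, `reset_without_waiting()` returns 1 (the stale increment the test would trip over), while `reset_after_waiting()` returns 0.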
  

Comments

Lukasz Wojciechowski July 20, 2020, 5:45 p.m. UTC | #1
On 20.07.2020 at 16:38, Harry van Haaren wrote:
> This commit fixes a potential race condition in the tests
> where the lcore running a service would increment a counter
> that was already reset by the test-suite thread. The resulting
> race-condition incremented value could cause CI failures, as
> indicated by DPDK's CI.
>
> This patch fixes the race-condition by making use of the
> added rte_service_lcore_active() API, which indicates when
> a service-core is no longer in the service-core polling loop.
>
> The unit test makes use of the above function to detect when
> all statistics increments are done in the service-core thread,
> and then the unit test continues finalizing and checking state.
>
> Fixes: f28f3594ded2 ("service: add attribute API")
>
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
>
> ---
>
> Thanks for discussion on v1, this v2 fixup for the CI
> including previous feedback on ML.
> ---
>   app/test/test_service_cores.c | 22 +++++++++++++++++++++-
>   1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> index ef1d8fcb9..a45762915 100644
> --- a/app/test/test_service_cores.c
> +++ b/app/test/test_service_cores.c
> @@ -362,6 +362,9 @@ service_lcore_attr_get(void)
>   			"Service core add did not return zero");
>   	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 1),
>   			"Enabling valid service and core failed");
> +	/* Ensure service is not active before starting */
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
> +			"Not-active service core reported as active");
>   	TEST_ASSERT_EQUAL(0, rte_service_lcore_start(slcore_id),
>   			"Starting service core failed");
>   
> @@ -382,7 +385,24 @@ service_lcore_attr_get(void)
>   			lcore_attr_id, &lcore_attr_value),
>   			"Invalid lcore attr didn't return -EINVAL");
>   
> -	rte_service_lcore_stop(slcore_id);
> +	/* Ensure service is active */
> +	TEST_ASSERT_EQUAL(1, rte_service_lcore_active(slcore_id),
> +			"Active service core reported as not-active");
> +
> +	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 0),
> +			"Disabling valid service and core failed");
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
> +			"Failed to stop service lcore");
> +
> +	int i = 0;
> +	while (rte_service_lcore_active(slcore_id) == 1) {
> +		rte_delay_ms(1);
> +		i++;
> +		if (i > 100)
> +			break;
> +	}
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
> +			  "Service lcore not stopped after waiting.");
>   
>   	TEST_ASSERT_EQUAL(0, rte_service_lcore_attr_reset_all(slcore_id),
>   			  "Valid lcore_attr_reset_all() didn't return success");
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
  
Phil Yang July 21, 2020, 8:38 a.m. UTC | #2
<...>

> Subject: [dpdk-dev] [PATCH v2 2/2] test/service: fix race condition on
> stopping lcore
> 
> This commit fixes a potential race condition in the tests
> where the lcore running a service would increment a counter
> that was already reset by the test-suite thread. The resulting
> race-condition incremented value could cause CI failures, as
> indicated by DPDK's CI.
> 
> This patch fixes the race-condition by making use of the
> added rte_service_lcore_active() API, which indicates when
> a service-core is no longer in the service-core polling loop.
> 
> The unit test makes use of the above function to detect when
> all statistics increments are done in the service-core thread,
> and then the unit test continues finalizing and checking state.
> 
> Fixes: f28f3594ded2 ("service: add attribute API")
> 
> Reported-by: David Marchand <david.marchand@redhat.com>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>

Minor nit, otherwise it looks good to me.

Reviewed-by: Phil Yang <phil.yang@arm.com>

> 
> ---
> 
> Thanks for discussion on v1, this v2 fixup for the CI
> including previous feedback on ML.
> ---
>  app/test/test_service_cores.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
> index ef1d8fcb9..a45762915 100644
> --- a/app/test/test_service_cores.c
> +++ b/app/test/test_service_cores.c
> @@ -362,6 +362,9 @@ service_lcore_attr_get(void)
>  			"Service core add did not return zero");
>  	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 1),
>  			"Enabling valid service and core failed");
> +	/* Ensure service is not active before starting */
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
> +			"Not-active service core reported as active");
>  	TEST_ASSERT_EQUAL(0, rte_service_lcore_start(slcore_id),
>  			"Starting service core failed");
> 
> @@ -382,7 +385,24 @@ service_lcore_attr_get(void)
>  			lcore_attr_id, &lcore_attr_value),
>  			"Invalid lcore attr didn't return -EINVAL");
> 
> -	rte_service_lcore_stop(slcore_id);
> +	/* Ensure service is active */
> +	TEST_ASSERT_EQUAL(1, rte_service_lcore_active(slcore_id),
> +			"Active service core reported as not-active");
> +
> +	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 0),
> +			"Disabling valid service and core failed");
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
> +			"Failed to stop service lcore");
> +
> +	int i = 0;
> +	while (rte_service_lcore_active(slcore_id) == 1) {
> +		rte_delay_ms(1);

As in the other functions, it would be better to use the macro instead of the magic number:
rte_delay_ms(SERVICE_DELAY);

> +		i++;
> +		if (i > 100)
> +			break;
> +	}
> +	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
> +			  "Service lcore not stopped after waiting.");
> 
>  	TEST_ASSERT_EQUAL(0, rte_service_lcore_attr_reset_all(slcore_id),
>  			  "Valid lcore_attr_reset_all() didn't return success");
> --
> 2.17.1
  
Van Haaren, Harry July 22, 2020, 10:26 a.m. UTC | #3
> -----Original Message-----
> From: Phil Yang <Phil.Yang@arm.com>
> Sent: Tuesday, July 21, 2020 9:39 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Cc: david.marchand@redhat.com; igor.romanov@oktetlabs.ru; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Yigit, Ferruh
> <ferruh.yigit@intel.com>; nd <nd@arm.com>; aconole@redhat.com;
> l.wojciechow@partner.samsung.com; nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2 2/2] test/service: fix race condition on
> stopping lcore
> 
> <...>
> 
> > Subject: [dpdk-dev] [PATCH v2 2/2] test/service: fix race condition on
> > stopping lcore
> >
> > This commit fixes a potential race condition in the tests
> > where the lcore running a service would increment a counter
> > that was already reset by the test-suite thread. The resulting
> > race-condition incremented value could cause CI failures, as
> > indicated by DPDK's CI.
> >
> > This patch fixes the race-condition by making use of the
> > added rte_service_lcore_active() API, which indicates when
> > a service-core is no longer in the service-core polling loop.
> >
> > The unit test makes use of the above function to detect when
> > all statistics increments are done in the service-core thread,
> > and then the unit test continues finalizing and checking state.
> >
> > Fixes: f28f3594ded2 ("service: add attribute API")
> >
> > Reported-by: David Marchand <david.marchand@redhat.com>
> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> 
> Minor nit, otherwise it looks good to me.
> 
> Reviewed-by: Phil Yang <phil.yang@arm.com>

Thanks, will add in v3.

<snip>

> > +	int i = 0;
> > +	while (rte_service_lcore_active(slcore_id) == 1) {
> > +		rte_delay_ms(1);
> 
> As in the other functions, it would be better to use the macro instead of
> the magic number:
> rte_delay_ms(SERVICE_DELAY);

Sure, will change. I've also refactored the while() into a for(), which I think cleans it up a little.
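
For illustration only, a for()-based bounded wait of the kind mentioned above might look roughly like the sketch below. It is not the v3 code. To keep it compilable without DPDK, the calls to rte_service_lcore_active() and rte_delay_ms() are replaced by hypothetical callbacks (`lcore_active_fn`, `delay_ms_fn`), and `wait_until_inactive()`, `MAX_POLLS`, the `fake_*` helpers, and the value given to `SERVICE_DELAY` here are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the real macro lives in test_service_cores.c. */
#define SERVICE_DELAY 1
/* Poll budget, roughly matching the 100 iterations of the v2 loop. */
#define MAX_POLLS 100

/* Hypothetical callback standing in for rte_service_lcore_active(). */
typedef int (*lcore_active_fn)(uint32_t lcore_id);
/* Hypothetical callback standing in for rte_delay_ms(). */
typedef void (*delay_ms_fn)(unsigned int ms);

/*
 * Poll until the lcore reports inactive or the poll budget runs out.
 * Returns 0 if the lcore went inactive, -1 on timeout.
 */
static int
wait_until_inactive(uint32_t lcore_id, lcore_active_fn is_active,
		delay_ms_fn delay_ms)
{
	int i;

	for (i = 0; i < MAX_POLLS; i++) {
		if (is_active(lcore_id) != 1)
			return 0;
		delay_ms(SERVICE_DELAY);
	}
	/* One last check after the final delay. */
	return is_active(lcore_id) == 1 ? -1 : 0;
}

/* Test double: reports "active" for the next fake_polls_left calls. */
static int fake_polls_left;

static int
fake_active(uint32_t lcore_id)
{
	(void)lcore_id;
	if (fake_polls_left > 0) {
		fake_polls_left--;
		return 1;
	}
	return 0;
}

/* Test double: no real sleeping is needed in the model. */
static void
fake_delay(unsigned int ms)
{
	(void)ms;
}
```

With `fake_polls_left` set to 3, `wait_until_inactive(0, fake_active, fake_delay)` returns 0; with it set to 1000, the poll budget is exhausted and the call returns -1. In the unit test the return value would be fed to TEST_ASSERT_EQUAL() in place of the separate check after the loop.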

<snip>
  

Patch

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index ef1d8fcb9..a45762915 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -362,6 +362,9 @@  service_lcore_attr_get(void)
 			"Service core add did not return zero");
 	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 1),
 			"Enabling valid service and core failed");
+	/* Ensure service is not active before starting */
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
+			"Not-active service core reported as active");
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_start(slcore_id),
 			"Starting service core failed");
 
@@ -382,7 +385,24 @@  service_lcore_attr_get(void)
 			lcore_attr_id, &lcore_attr_value),
 			"Invalid lcore attr didn't return -EINVAL");
 
-	rte_service_lcore_stop(slcore_id);
+	/* Ensure service is active */
+	TEST_ASSERT_EQUAL(1, rte_service_lcore_active(slcore_id),
+			"Active service core reported as not-active");
+
+	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 0),
+			"Disabling valid service and core failed");
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
+			"Failed to stop service lcore");
+
+	int i = 0;
+	while (rte_service_lcore_active(slcore_id) == 1) {
+		rte_delay_ms(1);
+		i++;
+		if (i > 100)
+			break;
+	}
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_active(slcore_id),
+			  "Service lcore not stopped after waiting.");
 
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_attr_reset_all(slcore_id),
 			  "Valid lcore_attr_reset_all() didn't return success");