diff mbox series

[v7] eal: fix race in ctrl thread creation

Message ID 20210407152903.130730-1-lucp.at.work@gmail.com (mailing list archive)
State Superseded
Delegated to: David Marchand
Headers show
Series [v7] eal: fix race in ctrl thread creation | expand

Checks

Context Check Description
ci/intel-Testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/github-robot success github build: passed
ci/travis-robot success travis build: passed
ci/checkpatch warning coding style issues

Commit Message

Luc Pelletier April 7, 2021, 3:29 p.m. UTC
The creation of control threads uses a pthread barrier for
synchronization. This patch fixes a race condition where the pthread
barrier could get destroyed while one of the threads has not yet
returned from the pthread_barrier_wait function, which could result in
undefined behaviour.

Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
Cc: jianfeng.tan@intel.com
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
---

Hi Olivier,

Olivier, thanks for pointing out that pthread_barrier_wait acts as a
full memory barrier; I didn't know that was explicitly specified in the
documentation. I've changed it to use a regular read/write.

Hi Honnappa,

I have not changed the code to use pthread_attr_setaffinity_np because
the attr parameter is const. I could make a copy of the provided
attributes and use that instead, but I think this would still violate
the spirit of the API and, AFAICT, there's no official mechanism for
copying a pthread_attr_t.

Please let me know what you think.

 lib/librte_eal/common/eal_common_thread.c | 63 +++++++++++++----------
 1 file changed, 35 insertions(+), 28 deletions(-)

Comments

Honnappa Nagarahalli April 7, 2021, 5:15 p.m. UTC | #1
<snip>

> Subject: [PATCH v7] eal: fix race in ctrl thread creation
> 
> The creation of control threads uses a pthread barrier for synchronization.
> This patch fixes a race condition where the pthread barrier could get
> destroyed while one of the threads has not yet returned from the
> pthread_barrier_wait function, which could result in undefined behaviour.
> 
> Fixes: 3a0d465d4c53 ("eal: fix use-after-free on control thread creation")
> Cc: jianfeng.tan@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
Overall, this LGTM.

Just wondering if this should be split into 2 commits.
1) Fix the race condition with refcnt.
2) Remove the use of pthread_cancel

I have some observations below.
> ---
> 
> Hi Olivier,
> 
> Olivier, thanks for pointing out that pthread_barrier_wait acts as a full
> memory barrier; I didn't know that was explicitly specified in the
"full memory barrier" just says that the memory operations prior and after the barrier will not be mixed. It does not guarantee that the modification done by one core are 'visible' to other cores. Additional code is required to ensure that the modifications are visible. For ex: in the following code setting start_routine=NULL is not guaranteed to be visible to other cores if pthread_barrier_wait is just a full barrier.
The particular wording used in the reference is "synchronize memory with respect to other threads". I am not exactly sure what is means, but may be safe to assume it means the updates are 'visible' to other cores (as the locking APIs are part of the list and they guarantee the visibility aspects).

May be we can simplify this during 21.11 release as we will have the ability to make more invasive changes.

> documentation. I've changed it to use a regular read/write.
> 
> Hi Honnappa,
> 
> I have not changed the code to use pthread_attr_setaffinity_np because the
> attr parameter is const. I could make a copy of the provided attributes and
> use that instead, but I think this would still violate the spirit of the API and,
> AFAICT, there's no official mechanism for copying a pthread_attr_t.
Ok, thank you.

> 
> Please let me know what you think.
> 
>  lib/librte_eal/common/eal_common_thread.c | 63 +++++++++++++----------
>  1 file changed, 35 insertions(+), 28 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_thread.c
> b/lib/librte_eal/common/eal_common_thread.c
> index 73a055902..d4e09f84b 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -170,25 +170,34 @@ struct rte_thread_ctrl_params {
>  	void *(*start_routine)(void *);
>  	void *arg;
>  	pthread_barrier_t configured;
> +	unsigned int refcnt;
>  };
> 
> +static void ctrl_params_free(struct rte_thread_ctrl_params *params) {
> +	if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) ==
> 0) {
> +		pthread_barrier_destroy(&params->configured);
> +		free(params);
> +	}
> +}
> +
>  static void *ctrl_thread_init(void *arg)  {
> -	int ret;
>  	struct internal_config *internal_conf =
>  		eal_get_internal_configuration();
>  	rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
>  	struct rte_thread_ctrl_params *params = arg;
> -	void *(*start_routine)(void *) = params->start_routine;
> +	void *(*start_routine)(void *);
>  	void *routine_arg = params->arg;
> 
>  	__rte_thread_init(rte_lcore_id(), cpuset);
> 
> -	ret = pthread_barrier_wait(&params->configured);
> -	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> +	pthread_barrier_wait(&params->configured);
> +	start_routine = params->start_routine;
> +	ctrl_params_free(params);
> +
> +	if (start_routine == NULL)
> +		return NULL;
> 
>  	return start_routine(routine_arg);
>  }
> @@ -210,14 +219,15 @@ rte_ctrl_thread_create(pthread_t *thread, const
> char *name,
> 
>  	params->start_routine = start_routine;
>  	params->arg = arg;
> +	params->refcnt = 2;
> 
> -	pthread_barrier_init(&params->configured, NULL, 2);
> +	ret = pthread_barrier_init(&params->configured, NULL, 2);
> +	if (ret != 0)
> +		goto fail_no_barrier;
> 
>  	ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
> -	if (ret != 0) {
> -		free(params);
> -		return -ret;
> -	}
> +	if (ret != 0)
> +		goto fail_with_barrier;
> 
>  	if (name != NULL) {
>  		ret = rte_thread_setname(*thread, name); @@ -227,25
> +237,22 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  	}
> 
>  	ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
> -	if (ret)
> -		goto fail;
> +	if (ret != 0)
> +		params->start_routine = NULL;
> +	pthread_barrier_wait(&params->configured);
> +	ctrl_params_free(params);
> 
> -	ret = pthread_barrier_wait(&params->configured);
> -	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> +	if (ret != 0)
A comment here on why the pthread_join will not wait indefinitely will help. Splitting the patch will also help understand the changes better.

> +		pthread_join(*thread, NULL);
> 
> -	return 0;
> +	return -ret;
> +
> +fail_with_barrier:
> +	pthread_barrier_destroy(&params->configured);
> +
> +fail_no_barrier:
> +	free(params);
> 
> -fail:
> -	if (PTHREAD_BARRIER_SERIAL_THREAD ==
> -	    pthread_barrier_wait(&params->configured)) {
> -		pthread_barrier_destroy(&params->configured);
> -		free(params);
> -	}
> -	pthread_cancel(*thread);
> -	pthread_join(*thread, NULL);
>  	return -ret;
>  }
> 
> --
> 2.25.1
diff mbox series

Patch

diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
index 73a055902..d4e09f84b 100644
--- a/lib/librte_eal/common/eal_common_thread.c
+++ b/lib/librte_eal/common/eal_common_thread.c
@@ -170,25 +170,34 @@  struct rte_thread_ctrl_params {
 	void *(*start_routine)(void *);
 	void *arg;
 	pthread_barrier_t configured;
+	unsigned int refcnt;
 };
 
+static void ctrl_params_free(struct rte_thread_ctrl_params *params)
+{
+	if (__atomic_sub_fetch(&params->refcnt, 1, __ATOMIC_ACQ_REL) == 0) {
+		pthread_barrier_destroy(&params->configured);
+		free(params);
+	}
+}
+
 static void *ctrl_thread_init(void *arg)
 {
-	int ret;
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 	rte_cpuset_t *cpuset = &internal_conf->ctrl_cpuset;
 	struct rte_thread_ctrl_params *params = arg;
-	void *(*start_routine)(void *) = params->start_routine;
+	void *(*start_routine)(void *);
 	void *routine_arg = params->arg;
 
 	__rte_thread_init(rte_lcore_id(), cpuset);
 
-	ret = pthread_barrier_wait(&params->configured);
-	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
-		pthread_barrier_destroy(&params->configured);
-		free(params);
-	}
+	pthread_barrier_wait(&params->configured);
+	start_routine = params->start_routine;
+	ctrl_params_free(params);
+
+	if (start_routine == NULL)
+		return NULL;
 
 	return start_routine(routine_arg);
 }
@@ -210,14 +219,15 @@  rte_ctrl_thread_create(pthread_t *thread, const char *name,
 
 	params->start_routine = start_routine;
 	params->arg = arg;
+	params->refcnt = 2;
 
-	pthread_barrier_init(&params->configured, NULL, 2);
+	ret = pthread_barrier_init(&params->configured, NULL, 2);
+	if (ret != 0)
+		goto fail_no_barrier;
 
 	ret = pthread_create(thread, attr, ctrl_thread_init, (void *)params);
-	if (ret != 0) {
-		free(params);
-		return -ret;
-	}
+	if (ret != 0)
+		goto fail_with_barrier;
 
 	if (name != NULL) {
 		ret = rte_thread_setname(*thread, name);
@@ -227,25 +237,22 @@  rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	}
 
 	ret = pthread_setaffinity_np(*thread, sizeof(*cpuset), cpuset);
-	if (ret)
-		goto fail;
+	if (ret != 0)
+		params->start_routine = NULL;
+	pthread_barrier_wait(&params->configured);
+	ctrl_params_free(params);
 
-	ret = pthread_barrier_wait(&params->configured);
-	if (ret == PTHREAD_BARRIER_SERIAL_THREAD) {
-		pthread_barrier_destroy(&params->configured);
-		free(params);
-	}
+	if (ret != 0)
+		pthread_join(*thread, NULL);
 
-	return 0;
+	return -ret;
+
+fail_with_barrier:
+	pthread_barrier_destroy(&params->configured);
+
+fail_no_barrier:
+	free(params);
 
-fail:
-	if (PTHREAD_BARRIER_SERIAL_THREAD ==
-	    pthread_barrier_wait(&params->configured)) {
-		pthread_barrier_destroy(&params->configured);
-		free(params);
-	}
-	pthread_cancel(*thread);
-	pthread_join(*thread, NULL);
 	return -ret;
 }