[v3] lib/eal: fix segfaults due to thread exit order
Checks
Commit Message
From: Zhichao Zeng <zhichaox.zeng@intel.com>
The eal-intr-thread is not closed before memory cleanup in the
process of exiting. There is a small probability that when the
eal-intr-thread is about to use some pointers, the memory were
just cleaned, which cause the segment fault error caught by ASan.
This patch close the eal-intr-thread before memory cleanup when
exiting to avoid segment fault.
---
v2: add the same API for FreeBSD
---
v3: fix rte_eal_cleanup crash in debug_autotest
Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
---
lib/eal/common/eal_private.h | 7 +++++++
lib/eal/freebsd/eal.c | 22 +++++++++++++++++++++-
lib/eal/freebsd/eal_interrupts.c | 12 ++++++++++++
lib/eal/linux/eal.c | 21 ++++++++++++++++++++-
lib/eal/linux/eal_interrupts.c | 12 ++++++++++++
5 files changed, 72 insertions(+), 2 deletions(-)
Comments
On Mon, 30 May 2022 13:47:38 +0000
zhichaox.zeng@intel.com wrote:
> +static void warn_parent(void)
> +{
> + RTE_LOG(WARNING, EAL, "fork() was called, DPDK won't work in the child "
> + "process unless it calls rte_eal_init()\n");
> +}
> +
Too wordy. Don't break messages across lines.
On Mon, 30 May 2022 13:47:38 +0000
zhichaox.zeng@intel.com wrote:
> @@ -883,6 +896,8 @@ rte_eal_init(int argc, char **argv)
>
> eal_mcfg_complete();
>
> + pthread_atfork(NULL, warn_parent, scratch_child);
> +
> return fctret;
> }
There are lots of other cases where DPDK will die if you fork()
in a DPDK process then call DPDK functions in child.
Not sure what the problem you are trying to solve is?
Hi Stephen:
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, May 31, 2022 12:29 AM
> To: Zeng, ZhichaoX <zhichaox.zeng@intel.com>
> Cc: dev@dpdk.org; Yang, Qiming <qiming.yang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; hkalra@marvell.com; david.marchand@redhat.com; aconole@redhat.com; Matz, Olivier > > <olivier.matz@6wind.com>; thomas@monjalon.net; stable@dpdk.org
> Subject: Re: [PATCH v3] lib/eal: fix segfaults due to thread exit order
>
> On Mon, 30 May 2022 13:47:38 +0000
> zhichaox.zeng@intel.com wrote:
> > @@ -883,6 +896,8 @@ rte_eal_init(int argc, char **argv)
> >
> > eal_mcfg_complete();
> >
> > + pthread_atfork(NULL, warn_parent, scratch_child);
> > +
> > return fctret;
> > }
> There are lots of other cases where DPDK will die if you fork() in a DPDK process then call DPDK functions in child.
> Not sure what the problem you are trying to solve is?
The original goal of this patch was to cancel eal-intr-thread before memory cleanup to avoid a small probability of segfaults.
But in the debug_autotest test of dpdk-test, fork() is called and the exit process is tested in the child process which will call the rte_eal_cleanup function and mistakenly clean up non-existing threads. This will cause segmentation fault.
Based on patch v2, atomic operation is added to determine whether it is a child process, so that the cleaning process will not execute in the child process to avoid the above problem.
Regards
Hi David, Harman
Please review this patch at your convenience. Thanks!
In addition, I tried to figure out the reason for the failure of the CI test. There is a saying that
this is a problem with Asan when cancelling a thread that is waiting on epoll. After adding the
'-fstack-protector-all' parameter, there will be no exception. But I don't know the software
environment for automated testing, so I can't verify this statement.
Regards
> Subject: [PATCH v3] lib/eal: fix segfaults due to thread exit order
> From: Zhichao Zeng <zhichaox.zeng@intel.com>
> The eal-intr-thread is not closed before memory cleanup in the process of exiting. There is a small probability that when the eal-intr-thread is about to use some pointers, the memory were just cleaned, which cause the segment fault error caught by ASan.
> This patch close the eal-intr-thread before memory cleanup when exiting to avoid segment fault.
> ---
> v2: add the same API for FreeBSD
> ---
> v3: fix rte_eal_cleanup crash in debug_autotest
> Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
@@ -152,6 +152,13 @@ int rte_eal_tailqs_init(void);
*/
int rte_eal_intr_init(void);
+/**
+ * Destroy interrupt handling thread.
+ *
+ * This function is private to EAL.
+ */
+void rte_eal_intr_destroy(void);
+
/**
* Close the default log stream
*
@@ -72,6 +72,8 @@ struct lcore_config lcore_config[RTE_MAX_LCORE];
/* used by rte_rdtsc() */
int rte_cycles_vmware_tsc_map;
+/* used to judge the running status of the eal */
+static uint32_t run_once;
int
eal_clean_runtime_dir(void)
@@ -574,12 +576,23 @@ static void rte_eal_init_alert(const char *msg)
RTE_LOG(ERR, EAL, "%s\n", msg);
}
+static void warn_parent(void)
+{
+ RTE_LOG(WARNING, EAL, "fork() was called, DPDK won't work in the child "
+ "process unless it calls rte_eal_init()\n");
+}
+
+static void scratch_child(void)
+{
+ /* Scratch run_once so that a call to rte_eal_cleanup won't crash... */
+ __atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+}
+
/* Launch threads, called at application init(). */
int
rte_eal_init(int argc, char **argv)
{
int i, fctret, ret;
- static uint32_t run_once;
uint32_t has_run = 0;
char cpuset[RTE_CPU_AFFINITY_STR_LEN];
char thread_name[RTE_MAX_THREAD_NAME_LEN];
@@ -883,6 +896,8 @@ rte_eal_init(int argc, char **argv)
eal_mcfg_complete();
+ pthread_atfork(NULL, warn_parent, scratch_child);
+
return fctret;
}
@@ -891,8 +906,13 @@ rte_eal_cleanup(void)
{
struct internal_config *internal_conf =
eal_get_internal_configuration();
+
+ if (__atomic_load_n(&run_once, __ATOMIC_RELAXED) == 0)
+ return 0;
+
rte_service_finalize();
rte_mp_channel_cleanup();
+ rte_eal_intr_destroy();
/* after this point, any DPDK pointers will become dangling */
rte_eal_memory_detach();
rte_eal_alarm_cleanup();
@@ -648,6 +648,18 @@ rte_eal_intr_init(void)
return ret;
}
+void
+rte_eal_intr_destroy(void)
+{
+ /* cancel the host thread to wait/handle the interrupt */
+ pthread_cancel(intr_thread);
+ pthread_join(intr_thread, NULL);
+
+ /* close kqueue */
+ close(kq);
+ kq = -1;
+}
+
int
rte_intr_rx_ctl(struct rte_intr_handle *intr_handle,
int epfd, int op, unsigned int vec, void *data)
@@ -76,6 +76,8 @@ struct lcore_config lcore_config[RTE_MAX_LCORE];
/* used by rte_rdtsc() */
int rte_cycles_vmware_tsc_map;
+/* used to judge the running status of the eal */
+static uint32_t run_once;
int
eal_clean_runtime_dir(void)
@@ -857,12 +859,23 @@ is_iommu_enabled(void)
return n > 2;
}
+static void warn_parent(void)
+{
+ RTE_LOG(WARNING, EAL, "fork() was called, DPDK won't work in the child "
+ "process unless it calls rte_eal_init()\n");
+}
+
+static void scratch_child(void)
+{
+ /* Scratch run_once so that a call to rte_eal_cleanup won't crash... */
+ __atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+}
+
/* Launch threads, called at application init(). */
int
rte_eal_init(int argc, char **argv)
{
int i, fctret, ret;
- static uint32_t run_once;
uint32_t has_run = 0;
const char *p;
static char logid[PATH_MAX];
@@ -1228,6 +1241,8 @@ rte_eal_init(int argc, char **argv)
eal_mcfg_complete();
+ pthread_atfork(NULL, warn_parent, scratch_child);
+
return fctret;
}
@@ -1257,6 +1272,9 @@ rte_eal_cleanup(void)
struct internal_config *internal_conf =
eal_get_internal_configuration();
+ if (__atomic_load_n(&run_once, __ATOMIC_RELAXED) == 0)
+ return 0;
+
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->hugepage_file.unlink_existing)
rte_memseg_walk(mark_freeable, NULL);
@@ -1266,6 +1284,7 @@ rte_eal_cleanup(void)
vfio_mp_sync_cleanup();
#endif
rte_mp_channel_cleanup();
+ rte_eal_intr_destroy();
/* after this point, any DPDK pointers will become dangling */
rte_eal_memory_detach();
eal_mp_dev_hotplug_cleanup();
@@ -1199,6 +1199,18 @@ rte_eal_intr_init(void)
return ret;
}
+void
+rte_eal_intr_destroy(void)
+{
+ /* cancel the host thread to wait/handle the interrupt */
+ pthread_cancel(intr_thread);
+ pthread_join(intr_thread, NULL);
+
+ /* close the pipe used by epoll */
+ close(intr_pipe.writefd);
+ close(intr_pipe.readfd);
+}
+
static void
eal_intr_proc_rxtx_intr(int fd, const struct rte_intr_handle *intr_handle)
{