[v2] eal: fix memory initialization deadlock
Commit Message
The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks while a write lock is queued. However, it
also means that a recursive read lock, i.e. a lock acquired twice by
the same execution thread, can initiate a sequence of events resulting
in a deadlock:
1. Process 1 takes the first read lock.
2. Process 2 attempts to take a write lock, triggering RTE_RWLOCK_WAIT due
   to the presence of a read lock. This makes process 2 enter a wait loop
   until the read lock is released.
3. Process 1 tries to take a second read lock. However, since a write lock
   is waiting (due to RTE_RWLOCK_WAIT), it also enters a wait loop until
   the write lock is acquired and then released.
Both processes end up blocked, unable to proceed: a deadlock.
Following these changes, the RW-lock no longer supports
recursion, implying that a single thread shouldn't obtain a read lock if
it already possesses one. The problem arises during initialization:
rte_eal_init() acquires memory_hotplug_lock, and later the call chain
rte_eal_memory_init() -> eal_memalloc_init() ->
rte_memseg_list_walk() acquires it again without releasing it. This
introduces the risk of deadlock when concurrent write locks are applied
to the same memory_hotplug_lock. We resolved the issue by replacing
rte_memseg_list_walk() with rte_memseg_list_walk_thread_unsafe().
Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org
Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
v2:
changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
---
lib/eal/common/eal_common_dynmem.c | 5 ++++-
lib/eal/include/generic/rte_rwlock.h | 4 ++++
lib/eal/linux/eal_memalloc.c | 7 +++++--
3 files changed, 13 insertions(+), 3 deletions(-)
Comments
2023-09-04 11:24 (UTC+0300), Artemy Kovalyov:
> diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
> index bdbbe233a0..0d5da40096 100644
> --- a/lib/eal/common/eal_common_dynmem.c
> +++ b/lib/eal/common/eal_common_dynmem.c
> @@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
> */
> memset(&dummy, 0, sizeof(dummy));
> dummy.hugepage_sz = hpi->hugepage_sz;
> - if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
> + /* memory_hotplug_lock is taken in rte_eal_init(), so it's
> + * safe to call thread-unsafe version.
> + */
Nit: the lock is really taken in rte_eal_memory_init().
Probably "The lock is held during initialization, so..."
would be more robust against code changes and differences between platforms.
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > + /* memory_hotplug_lock is taken in rte_eal_init(), so it's
> > + * safe to call thread-unsafe version.
> > + */
>
> Nit: the lock is really taken in rte_eal_memory_init().
> Probably "The lock is held during initialization, so..."
> would more robust against code changes and differences between platforms.
It was previously located differently, but in the current version, it has been shifted to rte_eal_init(). It might be worth noting this to ensure that if there are further code changes in the future, the locking problem becomes more apparent. We had discussed this in the bug report.
On Tue, Sep 5, 2023 at 11:05 AM Artemy Kovalyov <artemyko@nvidia.com> wrote:
>
> It was previously located differently, but in the current version, it has been shifted to rte_eal_init(). It might be worth noting this to ensure that if there are further code changes in the future, the locking problem becomes more apparent. We had discussed this in the bug report.
One option to explore is lock annotations.
One note thought: those annotations do not get inherited in called code.
So some special care is needed to maintain/annotate all code leading
to the locations where the locks do matter.
Quick example with rte_memseg_list_walk:
https://github.com/david-marchand/dpdk/commit/mem_annotations
And clang catches a deadlock:
https://github.com/david-marchand/dpdk/actions/runs/6082842080/job/16501450978#step:19:816
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
*/
memset(&dummy, 0, sizeof(dummy));
dummy.hugepage_sz = hpi->hugepage_sz;
- if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+ /* memory_hotplug_lock is taken in rte_eal_init(), so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
return -1;
for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@ rte_rwlock_init(rte_rwlock_t *rwl)
/**
* Take a read lock. Loop until the lock is held.
*
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
* @param rwl
* A pointer to a rwlock structure.
*/
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@ eal_memalloc_init(void)
eal_get_internal_configuration();
if (rte_eal_process_type() == RTE_PROC_SECONDARY)
- if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+ /* memory_hotplug_lock is taken in rte_eal_init(), so it's
+ * safe to call thread-unsafe version.
+ */
+ if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
return -1;
if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@ eal_memalloc_init(void)
}
/* initialize all of the fd lists */
- if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+ if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
return -1;
return 0;
}