[v2] eal: fix memory initialization deadlock

Message ID 20230904082455.3864024-1-artemyko@nvidia.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series: [v2] eal: fix memory initialization deadlock

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/github-robot: build success github build: passed
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-sample-apps-testing success Testing PASS
ci/iol-unit-amd64-testing fail Testing issues
ci/iol-compile-amd64-testing success Testing PASS
ci/iol-compile-arm64-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS

Commit Message

Artemy Kovalyov Sept. 4, 2023, 8:24 a.m. UTC
  The issue arose from a change in the DPDK read-write lock
implementation. That change added a new flag, RTE_RWLOCK_WAIT, designed
to prevent new read locks from being taken while a write lock is
queued. However, it also means that a recursive read lock, i.e. a lock
acquired twice by the same execution thread, can trigger a sequence of
events resulting in a deadlock:

1. Process 1 takes the first read lock.
2. Process 2 attempts to take a write lock. Because a read lock is
   already held, this sets RTE_RWLOCK_WAIT and process 2 enters a wait
   loop until the read lock is released.
3. Process 1 tries to take a second read lock. Since a write lock is
   waiting (RTE_RWLOCK_WAIT is set), it also enters a wait loop until
   the write lock is acquired and then released.

Both processes end up in a blocked state, unable to proceed, resulting
in a deadlock scenario.
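
For illustration, below is a minimal standalone sketch of this pattern
(hypothetical, not part of the patch): with the RTE_RWLOCK_WAIT-based
rwlock, the second read lock in the main thread never succeeds.

/* Sketch only: reproduces the recursive read-lock deadlock.
 * Build against DPDK headers, link with -lpthread.
 */
#include <pthread.h>
#include <unistd.h>
#include <rte_rwlock.h>

static rte_rwlock_t lock = RTE_RWLOCK_INITIALIZER;

static void *writer(void *arg)
{
	(void)arg;
	/* A read lock is held, so this sets RTE_RWLOCK_WAIT and spins. */
	rte_rwlock_write_lock(&lock);
	rte_rwlock_write_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	rte_rwlock_read_lock(&lock);	/* first read lock */
	pthread_create(&t, NULL, writer, NULL);
	sleep(1);			/* let the writer set RTE_RWLOCK_WAIT */
	rte_rwlock_read_lock(&lock);	/* recursive read lock: spins behind
					 * RTE_RWLOCK_WAIT -> deadlock */
	/* never reached */
	rte_rwlock_read_unlock(&lock);
	rte_rwlock_read_unlock(&lock);
	pthread_join(t, NULL);
	return 0;
}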

With this change, the RW-lock no longer supports recursion: a single
thread must not take a read lock if it already holds one. The problem
arises during initialization: rte_eal_init() acquires
memory_hotplug_lock, and later the call sequence rte_eal_memory_init()
-> eal_memalloc_init() -> rte_memseg_list_walk() acquires it again
without it having been released in between. This risks a deadlock as
soon as a concurrent write lock is requested on the same
memory_hotplug_lock. Fix this by replacing rte_memseg_list_walk() with
rte_memseg_list_walk_thread_unsafe().
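
The two walkers differ only in the locking wrapper; a simplified sketch
of the locked variant (after lib/eal/common/eal_common_memory.c) shows
why calling it with the lock already held recurses on the read lock:

/* Simplified sketch: the thread-safe walker read-locks
 * memory_hotplug_lock around the unsafe one, which is exactly the
 * recursive acquisition the fix avoids.
 */
int
rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
{
	int ret;

	rte_mcfg_mem_read_lock();	/* read-locks memory_hotplug_lock */
	ret = rte_memseg_list_walk_thread_unsafe(func, arg);
	rte_mcfg_mem_read_unlock();
	return ret;
}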

Bugzilla ID: 1277
Fixes: 832cecc03d77 ("rwlock: prevent readers from starving writers")
Cc: stable@dpdk.org

Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
---
v2:
changed walk to thread-unsafe version in eal_dynmem_hugepage_init() 32-bit flow
---
 lib/eal/common/eal_common_dynmem.c   | 5 ++++-
 lib/eal/include/generic/rte_rwlock.h | 4 ++++
 lib/eal/linux/eal_memalloc.c         | 7 +++++--
 3 files changed, 13 insertions(+), 3 deletions(-)
  

Comments

Dmitry Kozlyuk Sept. 5, 2023, 7:05 a.m. UTC | #1
2023-09-04 11:24 (UTC+0300), Artemy Kovalyov:
> diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
> index bdbbe233a0..0d5da40096 100644
> --- a/lib/eal/common/eal_common_dynmem.c
> +++ b/lib/eal/common/eal_common_dynmem.c
> @@ -251,7 +251,10 @@ eal_dynmem_hugepage_init(void)
>  		 */
>  		memset(&dummy, 0, sizeof(dummy));
>  		dummy.hugepage_sz = hpi->hugepage_sz;
> -		if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
> +		/*  memory_hotplug_lock is taken in rte_eal_init(), so it's
> +		 *  safe to call thread-unsafe version.
> +		 */

Nit: the lock is really taken in rte_eal_memory_init().
Probably "The lock is held during initialization, so..."
would be more robust against code changes and differences between platforms.

Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
  
Artemy Kovalyov Sept. 5, 2023, 9:05 a.m. UTC | #2
> > +		/*  memory_hotplug_lock is taken in rte_eal_init(), so it's
> > +		 *  safe to call thread-unsafe version.
> > +		 */
>
> Nit: the lock is really taken in rte_eal_memory_init().
> Probably "The lock is held during initialization, so..."
> would be more robust against code changes and differences between platforms.

The lock used to be taken elsewhere, but in the current version it has moved to rte_eal_init(). It might be worth pointing at rte_eal_init() explicitly so that, if the code changes again, the locking problem becomes more apparent. We discussed this in the bug report.
  
David Marchand Sept. 5, 2023, 10:15 a.m. UTC | #3
On Tue, Sep 5, 2023 at 11:05 AM Artemy Kovalyov <artemyko@nvidia.com> wrote:
>
> > > +           /*  memory_hotplug_lock is taken in rte_eal_init(), so it's
> > > +            *  safe to call thread-unsafe version.
> > > +            */
> >
> > Nit: the lock is really taken in rte_eal_memory_init().
> > Probably "The lock is held during initialization, so..."
> > would be more robust against code changes and differences between platforms.
>
> The lock used to be taken elsewhere, but in the current version it has moved to rte_eal_init(). It might be worth pointing at rte_eal_init() explicitly so that, if the code changes again, the locking problem becomes more apparent. We discussed this in the bug report.

One option to explore is lock annotations.

One note though: those annotations are not inherited by called code.
So some special care is needed to maintain/annotate all code leading
to the locations where the locks do matter.

Quick example with rte_memseg_list_walk:
https://github.com/david-marchand/dpdk/commit/mem_annotations

And clang catches a deadlock:
https://github.com/david-marchand/dpdk/actions/runs/6082842080/job/16501450978#step:19:816
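
For reference, an illustrative sketch (not the merged code; the exact
prototypes in David's branch may differ) of what such annotations could
look like with the macros from rte_lock_annotations.h:

/* Illustrative sketch only: with these annotations,
 * clang -Wthread-safety can flag a nested read-lock acquisition.
 */
#include <rte_lock_annotations.h>

/* Caller must already hold memory_hotplug_lock for reading. */
int rte_memseg_list_walk_thread_unsafe(rte_memseg_list_walk_t func, void *arg)
	__rte_shared_locks_required(rte_mcfg_mem_get_lock());

/* Takes and releases memory_hotplug_lock itself, so it must not be
 * called with that lock already held.
 */
int rte_memseg_list_walk(rte_memseg_list_walk_t func, void *arg)
	__rte_locks_excluded(rte_mcfg_mem_get_lock());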
  

Patch

diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index bdbbe233a0..0d5da40096 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -251,7 +251,10 @@  eal_dynmem_hugepage_init(void)
 		 */
 		memset(&dummy, 0, sizeof(dummy));
 		dummy.hugepage_sz = hpi->hugepage_sz;
-		if (rte_memseg_list_walk(hugepage_count_walk, &dummy) < 0)
+		/*  memory_hotplug_lock is taken in rte_eal_init(), so it's
+		 *  safe to call thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(hugepage_count_walk, &dummy) < 0)
 			return -1;
 
 		for (i = 0; i < RTE_DIM(dummy.num_pages); i++) {
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 9e083bbc61..c98fc7d083 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -80,6 +80,10 @@  rte_rwlock_init(rte_rwlock_t *rwl)
 /**
  * Take a read lock. Loop until the lock is held.
  *
+ * @note The RW lock isn't recursive, so calling this function on the same
+ * lock twice without releasing it could potentially result in a deadlock
+ * scenario when a write lock is involved.
+ *
  * @param rwl
  *   A pointer to a rwlock structure.
  */
diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c
index f8b1588cae..3705b41f5f 100644
--- a/lib/eal/linux/eal_memalloc.c
+++ b/lib/eal/linux/eal_memalloc.c
@@ -1740,7 +1740,10 @@  eal_memalloc_init(void)
 		eal_get_internal_configuration();
 
 	if (rte_eal_process_type() == RTE_PROC_SECONDARY)
-		if (rte_memseg_list_walk(secondary_msl_create_walk, NULL) < 0)
+		/*  memory_hotplug_lock is taken in rte_eal_init(), so it's
+		 *  safe to call thread-unsafe version.
+		 */
+		if (rte_memseg_list_walk_thread_unsafe(secondary_msl_create_walk, NULL) < 0)
 			return -1;
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY &&
 			internal_conf->in_memory) {
@@ -1778,7 +1781,7 @@  eal_memalloc_init(void)
 	}
 
 	/* initialize all of the fd lists */
-	if (rte_memseg_list_walk(fd_list_create_walk, NULL))
+	if (rte_memseg_list_walk_thread_unsafe(fd_list_create_walk, NULL))
 		return -1;
 	return 0;
 }