[v7] enhance NUMA affinity heuristic

Message ID 20230523025004.192071-1-kaisenx.you@intel.com
State Superseded, archived
Delegated to: Thomas Monjalon
Series [v7] enhance NUMA affinity heuristic

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation success Compilation OK
ci/github-robot: build success github build: passed
ci/intel-Functional success Functional PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/intel-Testing success Testing PASS

Commit Message

Kaisen You May 23, 2023, 2:50 a.m. UTC
When a DPDK application is started on only one NUMA node, memory is
allocated for only one socket. When an interrupt thread later allocates
memory, no memory may be available on the socket where that thread is
currently running, so memory has to be allocated from hugepages on that
socket; this operation leads to performance degradation.

Fixes: 705356f0811f ("eal: simplify control thread creation")
Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
Cc: stable@dpdk.org

Signed-off-by: Kaisen You <kaisenx.you@intel.com>
---
Changes since v6:
- Reword the explanation for easier understanding.

Changes since v5:
- Add comments to the code.

Changes since v4:
- Modify the patch title.

Changes since v3:
- Add the assignment of socket_id in thread initialization.

Changes since v2:
- Add an uncommitted local change and fix compilation.

Changes since v1:
- Accommodate configurations where the main lcore runs on multiple
  physical cores belonging to different NUMA nodes.
---
 lib/eal/common/eal_common_thread.c | 6 ++++++
 lib/eal/common/malloc_heap.c       | 9 +++++++++
 2 files changed, 15 insertions(+)
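A minimal sketch of what the `eal_common_thread.c` hunk changes, assuming a C11 compiler. A `_Thread_local` variable stands in for `RTE_PER_LCORE(_socket_id)`, and the `sim_` functions are hypothetical simplifications of `__rte_thread_init()`, `ctrl_thread_init()` and `rte_socket_id()`, not DPDK code. A control thread first gets a socket derived from its cpuset and then has it overwritten with `SOCKET_ID_ANY`, so its later allocations go through the malloc heuristic instead of trusting that socket.

```c
/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SIM_SOCKET_ID_ANY (-1)

/* Hypothetical per-thread socket ID, playing the role of
 * RTE_PER_LCORE(_socket_id): each thread sees its own copy. */
static _Thread_local int sim_socket_id = SIM_SOCKET_ID_ANY;

/* Simplified __rte_thread_init(): record the socket derived from the
 * thread's cpuset (passed in directly to keep the sketch small). */
static void sim_thread_init(int cpuset_socket)
{
	sim_socket_id = cpuset_socket;
}

/* Simplified ctrl_thread_init() with the v7 change: after the generic
 * init, force the socket ID to "any", because a control thread may be
 * scheduled on any NUMA node. */
static void sim_ctrl_thread_init(int cpuset_socket)
{
	sim_thread_init(cpuset_socket);
	sim_socket_id = SIM_SOCKET_ID_ANY;
}

/* Simplified rte_socket_id(): read the calling thread's value. */
static int sim_rte_socket_id(void)
{
	return sim_socket_id;
}
```

For example, after `sim_thread_init(1)` a thread reports socket 1, whereas after `sim_ctrl_thread_init(1)` it reports -1 even though its cpuset sits entirely on node 1.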
  

Comments

Burakov, Anatoly May 23, 2023, 10:44 a.m. UTC | #1
On 5/23/2023 3:50 AM, Kaisen You wrote:
> When a DPDK application is started on only one numa node, memory is
> allocated for only one socket. When interrupt threads use memory,
> memory may not be found on the socket where the interrupt thread
> is currently located, and memory has to be reallocated on the hugepage,
> this operation will lead to performance degradation.
> 
> Fixes: 705356f0811f ("eal: simplify control thread creation")
> Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaisen You <kaisenx.you@intel.com>

Hi You,

I've suggested comment rewordings based on my understanding of the issue.

> ---
> Changes since v6:
> - New explanation for easy understanding,
> 
> Changes since v5:
> - Add comments to the code,
> 
> Changes since v4:
> - mod the patch title,
> 
> Changes since v3:
> - add the assignment of socket_id in thread initialization,
> 
> Changes since v2:
> - add uncommitted local change and fix compilation,
> 
> Changes since v1:
> - accomodate for configurations with main lcore running on multiples
>    physical cores belonging to different numa,
> ---
>   lib/eal/common/eal_common_thread.c | 6 ++++++
>   lib/eal/common/malloc_heap.c       | 9 +++++++++
>   2 files changed, 15 insertions(+)
> 
> diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
> index 079a385630..6479b66da1 100644
> --- a/lib/eal/common/eal_common_thread.c
> +++ b/lib/eal/common/eal_common_thread.c
> @@ -252,6 +252,12 @@ static int ctrl_thread_init(void *arg)
>   	struct rte_thread_ctrl_params *params = arg;
>   
>   	__rte_thread_init(rte_lcore_id(), cpuset);
> +	/* set the value of the per-core variable _socket_id to SOCKET_ID_ANY.
> +	 * Satisfy the judgment condition when threads find memory.
> +	 * If SOCKET_ID_ANY is not specified, the thread may go to a node with
> +	 * unallocated memory in a subsequent memory search.

I suggest a different comment wording:

Set control thread socket ID to SOCKET_ID_ANY as control threads may be 
scheduled on any NUMA node.

> +	 */
> +	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
>   	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
>   	if (params->ret != 0) {
>   		__atomic_store_n(&params->ctrl_thread_status,
> diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
> index d25bdc98f9..6d37f8afee 100644
> --- a/lib/eal/common/malloc_heap.c
> +++ b/lib/eal/common/malloc_heap.c
> @@ -716,6 +716,15 @@ malloc_get_numa_socket(void)
>   		if (conf->socket_mem[socket_id] != 0)
>   			return socket_id;
>   	}
> +	/* Trying to allocate memory on the main lcore numa node.
> +	 * especially when the DPDK application is started only on one numa node.
> +	 */

I suggest the following comment wording:

We couldn't quickly find a NUMA node where memory was available, so 
fall back to using main lcore socket ID.

> +	socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
> +	/* When the socket_id obtained in the main lcore numa is SOCKET_ID_ANY,
> +	 * The probability of finding memory on rte_socket_id_by_idx(0) is higher.
> +	 */

I suggest the following comment wording:

Main lcore socket ID may be SOCKET_ID_ANY in cases when main lcore 
thread is affinitized to multiple NUMA nodes.

> +	if (socket_id != (unsigned int)SOCKET_ID_ANY)
> +		return socket_id;
>   

I suggest adding comment here:

Failed to find meaningful socket ID, so just use the first one available.

>   	return rte_socket_id_by_idx(0);
>   }

I believe these comments offer better explanation as to why we are doing 
the things we do here.

Whether or not you decide to take these corrections on board,

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Burakov, Anatoly May 23, 2023, 12:45 p.m. UTC | #2
On 5/23/2023 3:50 AM, Kaisen You wrote:
> When a DPDK application is started on only one numa node, memory is
> allocated for only one socket. When interrupt threads use memory,
> memory may not be found on the socket where the interrupt thread
> is currently located, and memory has to be reallocated on the hugepage,
> this operation will lead to performance degradation.
> 
> Fixes: 705356f0811f ("eal: simplify control thread creation")
> Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaisen You <kaisenx.you@intel.com>
> ---

For the record, I still think that this is a solution for a problem that 
should be fixed elsewhere, because a DPDK lcore (even main lcore!) 
having a specific NUMA node affinity is one of the most fundamental 
assumptions about DPDK, and I feel like we're inviting problems if we 
allow lcores to have multiple NUMA node affinities.

For example, if I run DPDK test app with the following command-line:

--lcores "1@(1,29),2@(30)"

The malloc autotest will fail because `rte_socket_id()` now returns -1 
when called from the main lcore. Correspondingly, any APIs that use 
`rte_socket_id()` internally for various purposes (especially indexing 
arrays!) will now have to account for the fact that `rte_socket_id()` 
can return -1 and that this is not an exceptional situation.
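To make the `--lcores "1@(1,29),2@(30)"` example concrete, here is a standalone sketch of how an lcore's cpuset can map to a socket ID, modelled on our reading of EAL's internal `eal_cpuset_socket_id()` helper. The topology (CPUs 0-27 on node 0, CPUs 28-55 on node 1) and all `sim_` names are hypothetical.

```c
#include <stddef.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SIM_SOCKET_ID_ANY (-1)

/* Hypothetical two-node topology: CPUs 0-27 on node 0, 28-55 on node 1. */
static int sim_cpu_to_socket(unsigned int cpu)
{
	return cpu < 28 ? 0 : 1;
}

/* If every CPU in the set is on one node, return that node;
 * if the CPUs span several nodes (or the set is empty), return ANY. */
static int sim_cpuset_socket_id(const unsigned int *cpus, size_t n)
{
	int socket = SIM_SOCKET_ID_ANY;

	for (size_t i = 0; i < n; i++) {
		int s = sim_cpu_to_socket(cpus[i]);

		if (socket == SIM_SOCKET_ID_ANY)
			socket = s;               /* first CPU fixes the candidate */
		else if (s != socket)
			return SIM_SOCKET_ID_ANY; /* CPUs on different nodes */
	}
	return socket;
}
```

On this assumed topology lcore 1 (CPUs 1 and 29) gets -1 while lcore 2 (CPU 30) gets node 1, so any per-socket array indexed with lcore 1's socket ID would be accessed out of bounds.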

IMO, if we want to keep this behavior, EAL should at least warn the user 
that a DPDK lcore was assigned SOCKET_ID_ANY on account of multiple NUMA 
nodes being in its cpuset. As an unrelated change (I'm not suggesting 
doing it in this specific patchset), I would suggest that 
`thread_update_affinity()` warn when a DPDK lcore is assigned such a 
socket ID.
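The suggested warning could look roughly like the sketch below. It is hypothetical: `sim_warn_if_socket_any()` is not an EAL function, inside EAL such a message would presumably go through `RTE_LOG(WARNING, EAL, ...)` rather than `fprintf()`, and where exactly `thread_update_affinity()` would call it is left open.

```c
#include <stdio.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SIM_SOCKET_ID_ANY (-1)

/* Report an lcore whose cpuset spans several NUMA nodes.
 * Returns 1 if a warning was emitted, 0 otherwise. */
static int sim_warn_if_socket_any(unsigned int lcore_id, int socket_id)
{
	if (socket_id != SIM_SOCKET_ID_ANY)
		return 0;
	fprintf(stderr,
		"WARNING: lcore %u is affinitized to CPUs on multiple NUMA "
		"nodes; its socket ID is SOCKET_ID_ANY (-1)\n", lcore_id);
	return 1;
}
```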
  
Kaisen You May 26, 2023, 6:44 a.m. UTC | #3
> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: May 23, 2023 18:45
> To: You, KaisenX <kaisenx.you@intel.com>; dev@dpdk.org
> Cc: Zhou, YidingX <yidingx.zhou@intel.com>; thomas@monjalon.net;
> david.marchand@redhat.com; Matz, Olivier <olivier.matz@6wind.com>;
> ferruh.yigit@amd.com; zhoumin@loongson.cn; stable@dpdk.org
> Subject: Re: [PATCH v7] enhance NUMA affinity heuristic
> 

Thank you for your ack and suggestions; I will adopt them in v8.
> 
> --
> Thanks,
> Anatoly
  

Patch

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 079a385630..6479b66da1 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -252,6 +252,12 @@  static int ctrl_thread_init(void *arg)
 	struct rte_thread_ctrl_params *params = arg;
 
 	__rte_thread_init(rte_lcore_id(), cpuset);
+	/* set the value of the per-core variable _socket_id to SOCKET_ID_ANY.
+	 * Satisfy the judgment condition when threads find memory.
+	 * If SOCKET_ID_ANY is not specified, the thread may go to a node with
+	 * unallocated memory in a subsequent memory search.
+	 */
+	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
 	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
 	if (params->ret != 0) {
 		__atomic_store_n(&params->ctrl_thread_status,
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d25bdc98f9..6d37f8afee 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -716,6 +716,15 @@  malloc_get_numa_socket(void)
 		if (conf->socket_mem[socket_id] != 0)
 			return socket_id;
 	}
+	/* Trying to allocate memory on the main lcore numa node.
+	 * especially when the DPDK application is started only on one numa node.
+	 */
+	socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
+	/* When the socket_id obtained in the main lcore numa is SOCKET_ID_ANY,
+	 * The probability of finding memory on rte_socket_id_by_idx(0) is higher.
+	 */
+	if (socket_id != (unsigned int)SOCKET_ID_ANY)
+		return socket_id;
 
 	return rte_socket_id_by_idx(0);
 }
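For reference, the selection order in `malloc_get_numa_socket()` after this patch can be modelled standalone as below. This is a sketch based on the hunk and on our reading of the surrounding function (its first step, returning `rte_socket_id()` when that is not `SOCKET_ID_ANY`, lies outside the diff context). The `sim_` struct and function are hypothetical stand-ins for EAL state and calls. We also assume `conf->socket_mem[]` is only non-zero when `--socket-mem` was given, which would explain why the new main-lcore fallback matters in the default configuration.

```c
/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SIM_SOCKET_ID_ANY (-1)
#define SIM_MAX_NODES 8

/* Hypothetical snapshot of the EAL state the function reads.
 * Socket IDs stored in sockets[] must be < SIM_MAX_NODES. */
struct sim_eal_state {
	int thread_socket;                         /* rte_socket_id() of caller */
	unsigned int sockets[SIM_MAX_NODES];       /* rte_socket_id_by_idx(idx) */
	unsigned int nb_sockets;                   /* rte_socket_count() */
	unsigned long long socket_mem[SIM_MAX_NODES]; /* conf->socket_mem[] */
	int main_lcore_socket;                     /* socket of main lcore */
};

static unsigned int sim_get_numa_socket(const struct sim_eal_state *st)
{
	unsigned int idx;

	/* 1. A thread bound to a single socket uses that socket. */
	if (st->thread_socket != SIM_SOCKET_ID_ANY)
		return (unsigned int)st->thread_socket;

	/* 2. Otherwise, the first socket with memory requested up front. */
	for (idx = 0; idx < st->nb_sockets; idx++) {
		unsigned int socket_id = st->sockets[idx];

		if (st->socket_mem[socket_id] != 0)
			return socket_id;
	}

	/* 3. Added by this patch: fall back to the main lcore's socket,
	 *    unless the main lcore itself spans several NUMA nodes. */
	if (st->main_lcore_socket != SIM_SOCKET_ID_ANY)
		return (unsigned int)st->main_lcore_socket;

	/* 4. No meaningful socket found: use the first one detected. */
	return st->sockets[0];
}
```

In a hypothetical setup with a control thread, no `--socket-mem`, and the main lcore on node 1, this returns 1; without step 3 the same inputs would reach step 4 and pick node 0, even though the application's memory lives on node 1.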