[v8] enhance NUMA affinity heuristic

Message ID 20230526084535.374803-1-kaisenx.you@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series [v8] enhance NUMA affinity heuristic

Checks

Context                          Check    Description
ci/checkpatch                    warning  coding style issues
ci/loongarch-compilation         success  Compilation OK
ci/loongarch-unit-testing        success  Unit Testing PASS
ci/github-robot: build           success  github build: passed
ci/iol-mellanox-Performance      success  Performance Testing PASS
ci/iol-aarch64-unit-testing      success  Testing PASS
ci/Intel-compilation             success  Compilation OK
ci/iol-abi-testing               success  Testing PASS
ci/intel-Testing                 success  Testing PASS
ci/iol-unit-testing              success  Testing PASS
ci/iol-x86_64-compile-testing    success  Testing PASS
ci/iol-aarch64-compile-testing   success  Testing PASS
ci/iol-testing                   success  Testing PASS
ci/iol-x86_64-unit-testing       success  Testing PASS
ci/iol-intel-Performance         success  Performance Testing PASS
ci/iol-broadcom-Functional       success  Functional Testing PASS
ci/iol-intel-Functional          success  Functional Testing PASS
ci/iol-broadcom-Performance      success  Performance Testing PASS
ci/intel-Functional              success  Functional PASS

Commit Message

Kaisen You May 26, 2023, 8:45 a.m. UTC
When a DPDK application is started on only one NUMA node, memory is
allocated for only one socket. When interrupt threads allocate memory,
no memory may be available on the socket where the interrupt thread
is currently running, so new hugepage memory has to be allocated.
This operation leads to performance degradation.

Fixes: 705356f0811f ("eal: simplify control thread creation")
Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
Cc: stable@dpdk.org

Signed-off-by: Kaisen You <kaisenx.you@intel.com>
---
Changes since v7:
- Update comment,

Changes since v6:
- New explanation for easy understanding,

Changes since v5:
- Add comments to the code,

Changes since v4:
- modify the patch title,

Changes since v3:
- add the assignment of socket_id in thread initialization,

Changes since v2:
- add uncommitted local change and fix compilation,

Changes since v1:
- accommodate configurations with the main lcore running on multiple
  physical cores belonging to different NUMA nodes,
---
 lib/eal/common/eal_common_thread.c |  4 ++++
 lib/eal/common/malloc_heap.c       | 11 ++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)
  

Comments

Burakov, Anatoly May 26, 2023, 2:44 p.m. UTC | #1
On 5/26/2023 9:45 AM, Kaisen You wrote:
> When a DPDK application is started on only one numa node, memory is
> allocated for only one socket. When interrupt threads use memory,
> memory may not be found on the socket where the interrupt thread
> is currently located, and memory has to be reallocated on the hugepage,
> this operation will lead to performance degradation.
> 
> Fixes: 705356f0811f ("eal: simplify control thread creation")
> Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Kaisen You <kaisenx.you@intel.com>
> ---
> Changes since v7:
> - Update comment,
> 
> Changes since v6:
> - New explanation for easy understanding,
> 
> Changes since v5:
> - Add comments to the code,
> 
> Changes since v4:
> - modify the patch title,
> 
> Changes since v3:
> - add the assignment of socket_id in thread initialization,
> 
> Changes since v2:
> - add uncommitted local change and fix compilation,
> 
> Changes since v1:
> - accommodate configurations with the main lcore running on multiple
>    physical cores belonging to different NUMA nodes,
> ---
>   lib/eal/common/eal_common_thread.c |  4 ++++
>   lib/eal/common/malloc_heap.c       | 11 ++++++++++-
>   2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
> index 079a385630..22480aa61f 100644
> --- a/lib/eal/common/eal_common_thread.c
> +++ b/lib/eal/common/eal_common_thread.c
> @@ -252,6 +252,10 @@ static int ctrl_thread_init(void *arg)
>   	struct rte_thread_ctrl_params *params = arg;
>   
>   	__rte_thread_init(rte_lcore_id(), cpuset);
> +	/* Set control thread socket ID to SOCKET_ID_ANY as control
> +	 * threads may be scheduled on any NUMA node.
> +	 */
> +	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
>   	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
>   	if (params->ret != 0) {
>   		__atomic_store_n(&params->ctrl_thread_status,
> diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
> index d25bdc98f9..d833a71e7a 100644
> --- a/lib/eal/common/malloc_heap.c
> +++ b/lib/eal/common/malloc_heap.c
> @@ -716,7 +716,16 @@ malloc_get_numa_socket(void)
>   		if (conf->socket_mem[socket_id] != 0)
>   			return socket_id;
>   	}
> -
> +	/* We couldn't find quickly find a NUMA node where memory was available,

typo: `find quickly find`, should probably be `quickly find`

Can be fixed on apply.

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

> +	 * so fall back to using main lcore socket ID.
> +	 */
> +	socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
> +	/* Main lcore socket ID may be SOCKET_ID_ANY in cases when main lcore
> +	 * thread is affinitized to multiple NUMA nodes.
> +	 */
> +	if (socket_id != (unsigned int)SOCKET_ID_ANY)
> +		return socket_id;
> +	/* Failed to find meaningful socket ID, so just use the first one available */
>   	return rte_socket_id_by_idx(0);
>   }
>
  
Stephen Hemminger May 26, 2023, 5:50 p.m. UTC | #2
On Fri, 26 May 2023 15:44:15 +0100
"Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:

> > +	/* Set control thread socket ID to SOCKET_ID_ANY as control
> > +	 * threads may be scheduled on any NUMA node.
> > +	 */
> > +	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;

This is not always true. Read the control thread documentation.
If the DPDK application is run in a cgroup with a cpuset, it may be limited differently.
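To see the affinity a control thread actually inherits (for example, when the process runs inside a cgroup with a restricted cpuset), the calling thread's CPU mask can be read back at run time. The sketch below assumes Linux and glibc; `allowed_cpu_count()` is a hypothetical helper, not a DPDK API. Mapping the returned CPUs to NUMA nodes would additionally need topology information, e.g. libnuma's `numa_node_of_cpu()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Hypothetical helper (not a DPDK API): return how many CPUs the calling
 * thread may currently run on, or -1 if the mask cannot be read.
 * On Linux, pid 0 means "the calling thread", and the returned mask already
 * reflects restrictions such as a cgroup cpuset or an inherited affinity.
 * Note: a plain cpu_set_t covers CPU_SETSIZE (1024) CPUs; larger systems
 * would need a dynamically sized mask (CPU_ALLOC()).
 */
static int allowed_cpu_count(void)
{
	cpu_set_t set;
	int n;

	CPU_ZERO(&set);
	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;

	n = CPU_COUNT(&set);	/* glibc extension: number of CPUs in the mask */
	assert(n > 0);		/* a runnable thread always has at least one CPU */
	return n;
}
```

For example, running a program that calls this helper under `taskset -c 0` should make it return 1, regardless of how many CPUs the machine has.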
  
Burakov, Anatoly May 29, 2023, 10:37 a.m. UTC | #3
On 5/26/2023 6:50 PM, Stephen Hemminger wrote:
> On Fri, 26 May 2023 15:44:15 +0100
> "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:
> 
>>> +	/* Set control thread socket ID to SOCKET_ID_ANY as control
>>> +	 * threads may be scheduled on any NUMA node.
>>> +	 */
>>> +	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
> 
> This is not always true. Read the control thread documentation.
> If the DPDK application is run in a cgroup with a cpuset, it may be limited differently.

The point was more to highlight that control thread NUMA affinity is 
"undefined" (and depends on a lot of factors) rather than necessarily 
"uses all NUMA nodes". IMO the message is OK, even if technically it's 
not 100% accurate.

I mean, we could do some magic and figure out the effective NUMA node of 
a control thread, but do you think this would be worth the effort?
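One possible form of that "magic", sketched under assumptions: treat a thread's effective NUMA node as the node shared by every CPU in its affinity mask, and fall back to "any" as soon as the mask spans two nodes. This matches the behavior the patch's own comment describes for the main lcore (its socket ID is SOCKET_ID_ANY when it is affinitized to multiple NUMA nodes). The function name, the `SKETCH_SOCKET_ID_ANY` stand-in for DPDK's SOCKET_ID_ANY (-1), and the CPU-to-node table are all hypothetical; a real implementation would read the topology from the system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Stand-in for DPDK's SOCKET_ID_ANY, which is defined as -1. */
#define SKETCH_SOCKET_ID_ANY (-1)

/*
 * Hypothetical helper: derive the "effective" NUMA node of a CPU set.
 * cpu_to_node[i] is the NUMA node of CPU i, for the first ncpus CPUs.
 * Returns the node if every CPU in the set (among those ncpus) sits on the
 * same node; returns SKETCH_SOCKET_ID_ANY if the set spans several nodes
 * or contains none of those CPUs.
 */
static int cpuset_effective_node(const cpu_set_t *set,
				 const int *cpu_to_node, int ncpus)
{
	int node = SKETCH_SOCKET_ID_ANY;

	assert(set != NULL && cpu_to_node != NULL);
	assert(ncpus >= 0 && ncpus <= CPU_SETSIZE);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (!CPU_ISSET(cpu, set))
			continue;
		if (node == SKETCH_SOCKET_ID_ANY)
			node = cpu_to_node[cpu];	/* first CPU in the set */
		else if (node != cpu_to_node[cpu])
			return SKETCH_SOCKET_ID_ANY;	/* mask spans nodes */
	}
	return node;
}
```

With a hypothetical two-node topology `{0, 0, 1, 1}`, a mask of CPUs 0 and 1 resolves to node 0, while adding CPU 2 makes the result SKETCH_SOCKET_ID_ANY.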
  
David Marchand June 1, 2023, 2:42 p.m. UTC | #4
On Fri, May 26, 2023 at 11:03 AM Kaisen You <kaisenx.you@intel.com> wrote:
>
> When a DPDK application is started on only one numa node, memory is
> allocated for only one socket. When interrupt threads use memory,
> memory may not be found on the socket where the interrupt thread
> is currently located, and memory has to be reallocated on the hugepage,
> this operation will lead to performance degradation.
>
> Fixes: 705356f0811f ("eal: simplify control thread creation")
> Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> Cc: stable@dpdk.org

Backporting this kind of change seems risky for a LTS.
Heads up for LTS maintainers.

Anatoly, are we sure we want it backported?
  
Thomas Monjalon June 6, 2023, 2:04 p.m. UTC | #5
01/06/2023 16:42, David Marchand:
> On Fri, May 26, 2023 at 11:03 AM Kaisen You <kaisenx.you@intel.com> wrote:
> >
> > When a DPDK application is started on only one numa node, memory is
> > allocated for only one socket. When interrupt threads use memory,
> > memory may not be found on the socket where the interrupt thread
> > is currently located, and memory has to be reallocated on the hugepage,
> > this operation will lead to performance degradation.
> >
> > Fixes: 705356f0811f ("eal: simplify control thread creation")
> > Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
> > Cc: stable@dpdk.org
> 
> Backporting this kind of change seems risky for a LTS.
> Heads up for LTS maintainers.
> 
> Anatoly, are we sure we want it backported?

No answer, so I am taking it upon myself to remove the backport request.

Applied with the minor fix suggested by Anatoly.
Hopefully this patch will have no side effects.
  
Burakov, Anatoly June 12, 2023, 9:36 a.m. UTC | #6
On 6/1/2023 3:42 PM, David Marchand wrote:
> On Fri, May 26, 2023 at 11:03 AM Kaisen You <kaisenx.you@intel.com> wrote:
>>
>> When a DPDK application is started on only one numa node, memory is
>> allocated for only one socket. When interrupt threads use memory,
>> memory may not be found on the socket where the interrupt thread
>> is currently located, and memory has to be reallocated on the hugepage,
>> this operation will lead to performance degradation.
>>
>> Fixes: 705356f0811f ("eal: simplify control thread creation")
>> Fixes: 770d41bf3309 ("malloc: fix allocation with unknown socket ID")
>> Cc: stable@dpdk.org
> 
> Backporting this kind of change seems risky for a LTS.
> Heads up for LTS maintainers.
> 
> Anatoly, are we sure we want it backported?
> 
> 

Yeah, apologies, I was away on leave. Yeah I'd err on the side of not 
backporting anything unless this comes up independently.
  

Patch

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 079a385630..22480aa61f 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -252,6 +252,10 @@  static int ctrl_thread_init(void *arg)
 	struct rte_thread_ctrl_params *params = arg;
 
 	__rte_thread_init(rte_lcore_id(), cpuset);
+	/* Set control thread socket ID to SOCKET_ID_ANY as control
+	 * threads may be scheduled on any NUMA node.
+	 */
+	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
 	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
 	if (params->ret != 0) {
 		__atomic_store_n(&params->ctrl_thread_status,
diff --git a/lib/eal/common/malloc_heap.c b/lib/eal/common/malloc_heap.c
index d25bdc98f9..d833a71e7a 100644
--- a/lib/eal/common/malloc_heap.c
+++ b/lib/eal/common/malloc_heap.c
@@ -716,7 +716,16 @@  malloc_get_numa_socket(void)
 		if (conf->socket_mem[socket_id] != 0)
 			return socket_id;
 	}
-
+	/* We couldn't quickly find a NUMA node where memory was available,
+	 * so fall back to using main lcore socket ID.
+	 */
+	socket_id = rte_lcore_to_socket_id(rte_get_main_lcore());
+	/* Main lcore socket ID may be SOCKET_ID_ANY in cases when main lcore
+	 * thread is affinitized to multiple NUMA nodes.
+	 */
+	if (socket_id != (unsigned int)SOCKET_ID_ANY)
+		return socket_id;
+	/* Failed to find meaningful socket ID, so just use the first one available */
 	return rte_socket_id_by_idx(0);
 }