eal: fix init failure with too many contiguous memsegs under legacy mode

Message ID 20230516122108.38617-1-changfengnan@bytedance.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series eal: fix init failure with too many contiguous memsegs under legacy mode

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/loongarch-compilation success Compilation OK
ci/loongarch-unit-testing success Unit Testing PASS
ci/Intel-compilation fail Compilation issues
ci/github-robot: build success github build: passed
ci/intel-Testing fail Testing issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/intel-Functional success Functional PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/iol-unit-testing success Testing PASS
ci/iol-abi-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS

Commit Message

Fengnan Chang May 16, 2023, 12:21 p.m. UTC
  Under legacy mode, if the number of contiguous memsegs is greater
than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though another
memseg list is empty, because only one memseg list is checked in
remap_needed_hugepages.

For example:
hugepage configuration:
20480 (total)
13370 (socket 0)
7110 (socket 1)

startup log:
EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
EAL: Requesting 13370 pages of size 2MB from socket 0
EAL: Requesting 7110 pages of size 2MB from socket 1
EAL: Attempting to map 14220M on socket 1
EAL: Allocated 14220M on socket 1
EAL: Attempting to map 26740M on socket 0
EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
configuration.
EAL: Couldn't remap hugepage files into memseg lists
EAL: FATAL: Cannot init memory
EAL: Cannot init memory

Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Lin Li <lilintjpu@bytedance.com>
---
 lib/eal/linux/eal_memory.c | 2 ++
 1 file changed, 2 insertions(+)
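
For context, the grouping logic in remap_needed_hugepages() works
roughly as follows (a condensed sketch, not verbatim DPDK source): each
physically contiguous run of pages is handed to remap_segment() as a
single all-or-nothing request, which is what fails when no single
memseg list has enough free entries.

	seg_start_page = 0;
	for (cur_page = 0; cur_page < n_pages; cur_page++) {
		struct hugepage_file *cur = &hugepages[cur_page];
		struct hugepage_file *prev =
			cur_page == 0 ? NULL : &hugepages[cur_page - 1];
		int new_memseg = 0;

		/* size of zero means no more pages */
		if (cur->size == 0)
			break;

		if (cur_page == 0)
			new_memseg = 1;
		else if (cur->socket_id != prev->socket_id)
			new_memseg = 1;
		else if (cur->physaddr - prev->physaddr != cur->size)
			new_memseg = 1; /* physical discontinuity ends the run */

		if (new_memseg && cur_page != 0) {
			/* remap the whole run [seg_start_page, cur_page)
			 * in one shot */
			if (remap_segment(hugepages, seg_start_page,
					cur_page) != 0)
				return -1;
			seg_start_page = cur_page;
		}
	}
	/* (a final remap_segment() call after the loop handles the
	 * last run) */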
  

Comments

Burakov, Anatoly May 20, 2023, 3:03 p.m. UTC | #1
Hi,

On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> Under legacy mode, if the number of contiguous memsegs is greater
> than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though another
> memseg list is empty, because only one memseg list is checked in
> remap_needed_hugepages.
> 
> For example:
> hugepage configuration:
> 20480 (total)
> 13370 (socket 0)
> 7110 (socket 1)
> 
> startup log:
> EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> EAL: Requesting 13370 pages of size 2MB from socket 0
> EAL: Requesting 7110 pages of size 2MB from socket 1
> EAL: Attempting to map 14220M on socket 1
> EAL: Allocated 14220M on socket 1
> EAL: Attempting to map 26740M on socket 0
> EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
> configuration.

Unrelated, but this is probably the wrong message; it should've called 
out the config options to change, not their values. Sounds like a log 
message needs fixing somewhere...

> EAL: Couldn't remap hugepage files into memseg lists
> EAL: FATAL: Cannot init memory
> EAL: Cannot init memory
> 
> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> Signed-off-by: Lin Li <lilintjpu@bytedance.com>
> ---
>   lib/eal/linux/eal_memory.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> index 60fc8cc6ca..36b9e78f5f 100644
> --- a/lib/eal/linux/eal_memory.c
> +++ b/lib/eal/linux/eal_memory.c
> @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
>   		if (cur->size == 0)
>   			break;
>   
> +		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> +			new_memseg = 1;

I don't think this is quite right, because technically, 
`RTE_MAX_MEMSEG_PER_LIST` only applies to segment lists for smaller 
page sizes - segment lists for larger page sizes hit their limits 
earlier. So, while this will work for 2MB pages, it won't work for 
page sizes whose segment list length is smaller than the maximum 
(such as 1GB pages).
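
For reference, the number of entries in each memseg list is derived
roughly as follows (a simplified sketch of the sizing logic, assuming
the default config values RTE_MAX_MEMSEG_PER_LIST = 8192 and
RTE_MAX_MEM_MB_PER_LIST = 32768):

	/* each list is capped by both a segment count and a total
	 * memory budget, so the effective length shrinks as the page
	 * size grows */
	static unsigned int
	segs_per_list(uint64_t page_sz)
	{
		uint64_t max_mem = (uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20;

		return (unsigned int)RTE_MIN(
				(uint64_t)RTE_MAX_MEMSEG_PER_LIST,
				max_mem / page_sz);
	}
	/* 2MB pages: min(8192, 16384) = 8192 entries (count-limited)
	 * 1GB pages: min(8192, 32)    = 32 entries   (memory-limited) */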

I think this solution could be improved upon by trying to break up the 
contiguous area instead. I suspect the core of the issue is not even the 
fact that we're exceeding limits of one memseg list, but that we're 
always attempting to map exactly N pages in `remap_hugepages`, which 
results in us leaving large contiguous zones inside memseg lists unused 
because we couldn't satisfy the current allocation request and skipped to a 
new memseg list.

For example, let's suppose we found a large contiguous area that 
would've exceeded limits of current memseg list. Sooner or later, this 
contiguous area will end, and we'll attempt to remap this virtual area 
into a memseg list. Whenever that happens, we call into the remap code, 
which will start with the first segment list, attempt to find exactly N 
free spots, fail to do so, and skip to the next segment list.

Thus, sooner or later, if we get contiguous areas that are large enough, 
we will not populate our memseg lists but instead skip through them, and 
start with a new memseg list every time we need a large contiguous area. 
We prioritize having a large contiguous area over using up all of our 
memory map.

If, instead, we could break up the allocation - that is, use 
`rte_fbarray_find_biggest_free()` instead of 
`rte_fbarray_find_next_n_free()`, and keep doing it until we run out of 
segment lists, we will achieve the same result your patch does, but have 
it work for all page sizes, because now we would be targeting the actual 
issue (under-utilization of memseg lists), not its symptoms (exceeding 
segment list limits for large allocations).

This logic could either be inside `remap_hugepages`, or we could just 
return the number of pages mapped from `remap_hugepages`, and have the 
calling code (`remap_needed_hugepages`) try again, this time with a 
different start segment, reflecting how many pages we actually mapped. 
IMO this would be easier to implement, as `remap_hugepages` is overly 
complex as it is!
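
A hedged sketch of that second option (assuming remap_segment() is
changed to return the number of pages it actually placed - e.g. by
using rte_fbarray_find_biggest_free() plus rte_fbarray_find_contig_free()
per list - rather than demanding all pages from a single list):

	static int
	remap_run(struct hugepage_file *hugepages, int seg_start,
			int seg_end)
	{
		while (seg_start < seg_end) {
			int mapped = remap_segment(hugepages, seg_start,
					seg_end);

			if (mapped <= 0)
				return -1; /* all memseg lists exhausted */
			/* retry, starting from the unmapped tail */
			seg_start += mapped;
		}
		return 0;
	}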

>   		if (cur_page == 0)
>   			new_memseg = 1;
>   		else if (cur->socket_id != prev->socket_id)
  
Fengnan Chang May 22, 2023, 12:09 p.m. UTC | #2
Burakov, Anatoly <anatoly.burakov@intel.com> wrote on Sat, May 20, 2023 at 23:03:
>
> Hi,
>
> On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> > Under legacy mode, if the number of contiguous memsegs is greater
> > than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though another
> > memseg list is empty, because only one memseg list is checked in
> > remap_needed_hugepages.
> >
> > For example:
> > hugepage configuration:
> > 20480 (total)
> > 13370 (socket 0)
> > 7110 (socket 1)
> >
> > startup log:
> > EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> > EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> > EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> > EAL: Requesting 13370 pages of size 2MB from socket 0
> > EAL: Requesting 7110 pages of size 2MB from socket 1
> > EAL: Attempting to map 14220M on socket 1
> > EAL: Allocated 14220M on socket 1
> > EAL: Attempting to map 26740M on socket 0
> > EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
> > configuration.
>
> Unrelated, but this is probably the wrong message; it should've called
> out the config options to change, not their values. Sounds like a log
> message needs fixing somewhere...

In the older version, the log was:
EAL: Could not find space for memseg. Please increase
CONFIG_RTE_MAX_MEMSEG_PER_TYPE and/or CONFIG_RTE_MAX_MEM_PER_TYPE in
configuration.
Maybe that's better?

>
> > EAL: Couldn't remap hugepage files into memseg lists
> > EAL: FATAL: Cannot init memory
> > EAL: Cannot init memory
> >
> > Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> > Signed-off-by: Lin Li <lilintjpu@bytedance.com>
> > ---
> >   lib/eal/linux/eal_memory.c | 2 ++
> >   1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> > index 60fc8cc6ca..36b9e78f5f 100644
> > --- a/lib/eal/linux/eal_memory.c
> > +++ b/lib/eal/linux/eal_memory.c
> > @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
> >               if (cur->size == 0)
> >                       break;
> >
> > +             if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> > +                     new_memseg = 1;
>
> I don't think this is quite right, because technically,
> `RTE_MAX_MEMSEG_PER_LIST` only applies to segment lists for smaller
> page sizes - segment lists for larger page sizes hit their limits
> earlier. So, while this will work for 2MB pages, it won't work for
> page sizes whose segment list length is smaller than the maximum
> (such as 1GB pages).
>
> I think this solution could be improved upon by trying to break up the
> contiguous area instead. I suspect the core of the issue is not even the
> fact that we're exceeding limits of one memseg list, but that we're
> always attempting to map exactly N pages in `remap_hugepages`, which
> results in us leaving large contiguous zones inside memseg lists unused
> because we couldn't satisfy the current allocation request and skipped to a
> new memseg list.

Correct, I didn't consider the 1GB page case; I get your point.
Thanks.
>
> For example, let's suppose we found a large contiguous area that
> would've exceeded limits of current memseg list. Sooner or later, this
> contiguous area will end, and we'll attempt to remap this virtual area
> into a memseg list. Whenever that happens, we call into the remap code,
> which will start with the first segment list, attempt to find exactly N
> free spots, fail to do so, and skip to the next segment list.
>
> Thus, sooner or later, if we get contiguous areas that are large enough,
> we will not populate our memseg lists but instead skip through them, and
> start with a new memseg list every time we need a large contiguous area.
> We prioritize having a large contiguous area over using up all of our
> memory map.
>
> If, instead, we could break up the allocation - that is, use
> `rte_fbarray_find_biggest_free()` instead of
> `rte_fbarray_find_next_n_free()`, and keep doing it until we run out of
> segment lists, we will achieve the same result your patch does, but have
> it work for all page sizes, because now we would be targeting the actual
> issue (under-utilization of memseg lists), not its symptoms (exceeding
> segment list limits for large allocations).
>
> This logic could either be inside `remap_hugepages`, or we could just
> return the number of pages mapped from `remap_hugepages`, and have the
> calling code (`remap_needed_hugepages`) try again, this time with a
> different start segment, reflecting how many pages we actually mapped.
> IMO this would be easier to implement, as `remap_hugepages` is overly
> complex as it is!
>
> >               if (cur_page == 0)
> >                       new_memseg = 1;
> >               else if (cur->socket_id != prev->socket_id)
>
> --
> Thanks,
> Anatoly
>
  
Burakov, Anatoly May 22, 2023, 12:43 p.m. UTC | #3
On 5/22/2023 1:09 PM, Fengnan Chang wrote:
> Burakov, Anatoly <anatoly.burakov@intel.com> wrote on Sat, May 20, 2023 at 23:03:
>>
>> Hi,
>>
>> On 5/16/2023 1:21 PM, Fengnan Chang wrote:
>>> Under legacy mode, if the number of contiguous memsegs is greater
>>> than RTE_MAX_MEMSEG_PER_LIST, EAL init will fail even though another
>>> memseg list is empty, because only one memseg list is checked in
>>> remap_needed_hugepages.
>>>
>>> For example:
>>> hugepage configuration:
>>> 20480 (total)
>>> 13370 (socket 0)
>>> 7110 (socket 1)
>>>
>>> startup log:
>>> EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
>>> EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
>>> EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
>>> EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
>>> EAL: Requesting 13370 pages of size 2MB from socket 0
>>> EAL: Requesting 7110 pages of size 2MB from socket 1
>>> EAL: Attempting to map 14220M on socket 1
>>> EAL: Allocated 14220M on socket 1
>>> EAL: Attempting to map 26740M on socket 0
>>> EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
>>> configuration.
>>
>> Unrelated, but this is probably the wrong message; it should've called
>> out the config options to change, not their values. Sounds like a log
>> message needs fixing somewhere...
> 
> In the older version, the log was:
> EAL: Could not find space for memseg. Please increase
> CONFIG_RTE_MAX_MEMSEG_PER_TYPE and/or CONFIG_RTE_MAX_MEM_PER_TYPE in
> configuration.
> Maybe that's better?
> 

It was better indeed, but this seems to have to do with the fact that 
RTE_STR doesn't treat valid macros as strings; it expands them.

So, e.g. `RTE_STR(INVALID_MACRO)` gets translated to the string 
`INVALID_MACRO`, while `RTE_STR(RTE_MAX_LCORE)` gets translated to 
whatever actual value `RTE_MAX_LCORE` has (128 in my case).

Using `_RTE_STR` in place of `RTE_STR` fixes the issue, but I'm not sure 
if it's the right solution, as `_RTE_STR` is prefixed with an 
underscore, which implies it's not meant to be used directly.
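
To illustrate (standard two-level C stringification; the RTE_MAX_LCORE
value below is just an example):

	#include <stdio.h>

	/* two-level stringification, as defined in rte_common.h */
	#define _RTE_STR(x) #x          /* stringify the token as-is */
	#define RTE_STR(x) _RTE_STR(x)  /* expand macros first, then stringify */

	#define RTE_MAX_LCORE 128       /* example value for illustration */

	int main(void)
	{
		puts(RTE_STR(RTE_MAX_LCORE));  /* prints: 128 */
		puts(_RTE_STR(RTE_MAX_LCORE)); /* prints: RTE_MAX_LCORE */
		return 0;
	}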

+Thomas thoughts?
  

Patch

diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
index 60fc8cc6ca..36b9e78f5f 100644
--- a/lib/eal/linux/eal_memory.c
+++ b/lib/eal/linux/eal_memory.c
@@ -1001,6 +1001,8 @@  remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
 		if (cur->size == 0)
 			break;
 
+		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
+			new_memseg = 1;
 		if (cur_page == 0)
 			new_memseg = 1;
 		else if (cur->socket_id != prev->socket_id)