[v3,2/6] mem: use address hint for mapping hugepages

Message ID 1538743527-8285-3-git-send-email-alejandro.lucero@netronome.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series: use IOVAs check based on DMA mask

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  success  Compilation OK

Commit Message

Alejandro Lucero Oct. 5, 2018, 12:45 p.m. UTC
The Linux kernel uses a really high address as the starting address for
serving mmap calls. If a device has addressing limitations and the
IOVA mode is VA, this starting address is likely too high for
that device. However, it is possible to use a lower address in
the process virtual address space, as with 64 bits there is a lot
of available space.

This patch adds an address hint as the starting address on 64-bit
systems and increments the hint for subsequent invocations. If the mmap
call does not use the hint address, the mmap call is repeated using
the hint address incremented by the page size.

Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
 lib/librte_eal/common/eal_common_memory.c | 34 ++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)
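
The hint-and-retry behaviour described in the commit message can be sketched
in isolation as follows. This is a minimal illustration, not the DPDK code:
map_with_hint() and its max_tries parameter are hypothetical names, the
mapping is anonymous and PROT_NONE, and a 64-bit Linux system is assumed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of DPDK): reserve `len` bytes of address
 * space at `hint`. If the kernel maps somewhere else, drop that mapping,
 * advance the hint by `step` bytes and try again, up to `max_tries` times.
 * Returns the mapped address, or NULL if no attempt landed on its hint.
 */
static void *
map_with_hint(uintptr_t hint, size_t len, size_t step, int max_tries)
{
	for (int i = 0; i < max_tries; i++, hint += step) {
		void *want = (void *)hint;
		void *got = mmap(want, len, PROT_NONE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (got == MAP_FAILED)
			return NULL;
		if (got == want)
			return got;	/* the kernel honoured the hint */

		/* hint was not used: unmap and try the next offset */
		munmap(got, len);
	}
	return NULL;
}
```

A caller could start at the 4GB default from the patch, for example
map_with_hint(0x100000000ULL, 16 * page_sz, page_sz, 1024), where page_sz
comes from sysconf(_SC_PAGESIZE). Bounding the number of attempts is a choice
made for this sketch only; the loop in the patch has no such limit.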
  

Comments

Dariusz Stojaczyk Oct. 29, 2018, 4:08 p.m. UTC | #1
On Fri, Oct 5, 2018 at 2:47 PM Alejandro Lucero
<alejandro.lucero@netronome.com> wrote:
>
> Linux kernel uses a really high address as starting address for
> serving mmaps calls. If there exist addressing limitations and
> IOVA mode is VA, this starting address is likely too high for
> those devices. However, it is possible to use a lower address in
> the process virtual address space as with 64 bits there is a lot
> of available space.
>
> This patch adds an address hint as starting address for 64 bits
> systems and increments the hint for next invocations. If the mmap
> call does not use the hint address, repeat the mmap call using
> the hint address incremented by page size.
>
> Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> ---
>  lib/librte_eal/common/eal_common_memory.c | 34 ++++++++++++++++++++++++++++++-
>  1 file changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index c482f0d..853c44c 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -37,6 +37,23 @@
>  static void *next_baseaddr;
>  static uint64_t system_page_sz;
>
> +#ifdef RTE_ARCH_64
> +/*
> + * Linux kernel uses a really high address as starting address for serving
> + * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
> + * this starting address is likely too high for those devices. However, it
> + * is possible to use a lower address in the process virtual address space
> + * as with 64 bits there is a lot of available space.
> + *
> + * Current known limitations are 39 or 40 bits. Setting the starting address
> + * at 4GB implies there are 508GB or 1020GB for mapping the available
> + * hugepages. This is likely enough for most systems, although a device with
> + * addressing limitations should call rte_eal_check_dma_mask for ensuring all
> + * memory is within supported range.
> + */
> +static uint64_t baseaddr = 0x100000000;
> +#endif

This breaks running with ASAN unless a custom --base-virtaddr option
is specified. The default base-virtaddr introduced by this patch falls
into an area that's already reserved by ASAN.

See here: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm
The only available address space starts at 0x10007fff8000, which
unfortunately doesn't fit in 39 bits.

Right now the very first eal_get_virtual_area() in EAL initialization
is used with 4KB pagesize, meaning that DPDK will try to mmap at each
4KB-aligned offset all the way from 0x100000000 to 0x10007fff8000,
which takes quite a long, long time.

I'm not sure about the solution to this problem, but I verified that
starting DPDK 18.11-rc1 with `--base-virtaddr 0x200000000000` works
just fine under ASAN.
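
To put a number on the walk described above, here is a small sketch of the
arithmetic. It assumes the hint advances by exactly one 4KB page per failed
attempt and uses the two addresses quoted in this message; hint_steps() and
fits_in_bits() are illustrative names, not DPDK or ASAN functions.

```c
#include <stdint.h>

#define HINT_START   0x100000000ULL	/* default hint from the patch (4GB) */
#define ASAN_HIGHMEM 0x10007fff8000ULL	/* first free address quoted above */

/* Number of page-sized steps needed to move the hint from start to end. */
static uint64_t
hint_steps(uint64_t start, uint64_t end, uint64_t page_sz)
{
	return (end - start) / page_sz;
}

/* Returns 1 if addr can be expressed in `bits` address bits (bits < 64). */
static int
fits_in_bits(uint64_t addr, unsigned int bits)
{
	return (addr >> bits) == 0;
}
```

Under those assumptions hint_steps(HINT_START, ASAN_HIGHMEM, 4096) comes to
4294443000, so roughly 4.29 billion mmap/munmap pairs, and
fits_in_bits(ASAN_HIGHMEM, 39) returns 0.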

D.

>
> <snip>
  
Alejandro Lucero Oct. 29, 2018, 4:40 p.m. UTC | #2
Hi Dariusz,

On Mon, Oct 29, 2018 at 4:08 PM Dariusz Stojaczyk <darek.stojaczyk@gmail.com>
wrote:

> On Fri, Oct 5, 2018 at 2:47 PM Alejandro Lucero
> <alejandro.lucero@netronome.com> wrote:
> >
> > Linux kernel uses a really high address as starting address for
> > serving mmaps calls. If there exist addressing limitations and
> > IOVA mode is VA, this starting address is likely too high for
> > those devices. However, it is possible to use a lower address in
> > the process virtual address space as with 64 bits there is a lot
> > of available space.
> >
> > This patch adds an address hint as starting address for 64 bits
> > systems and increments the hint for next invocations. If the mmap
> > call does not use the hint address, repeat the mmap call using
> > the hint address incremented by page size.
> >
> > Signed-off-by: Alejandro Lucero <alejandro.lucero@netronome.com>
> > Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > ---
> >  lib/librte_eal/common/eal_common_memory.c | 34 ++++++++++++++++++++++++++++++-
> >  1 file changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> > index c482f0d..853c44c 100644
> > --- a/lib/librte_eal/common/eal_common_memory.c
> > +++ b/lib/librte_eal/common/eal_common_memory.c
> > @@ -37,6 +37,23 @@
> >  static void *next_baseaddr;
> >  static uint64_t system_page_sz;
> >
> > +#ifdef RTE_ARCH_64
> > +/*
> > + * Linux kernel uses a really high address as starting address for serving
> > + * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
> > + * this starting address is likely too high for those devices. However, it
> > + * is possible to use a lower address in the process virtual address space
> > + * as with 64 bits there is a lot of available space.
> > + *
> > + * Current known limitations are 39 or 40 bits. Setting the starting address
> > + * at 4GB implies there are 508GB or 1020GB for mapping the available
> > + * hugepages. This is likely enough for most systems, although a device with
> > + * addressing limitations should call rte_eal_check_dma_mask for ensuring all
> > + * memory is within supported range.
> > + */
> > +static uint64_t baseaddr = 0x100000000;
> > +#endif
>
> This breaks running with ASAN unless a custom --base-virtaddr option
> is specified. The default base-virtaddr introduced by this patch falls
> into an area that's already reserved by ASAN.
>
> See here: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm
> The only available address space starts at 0x10007fff8000, which
> unfortunately doesn't fit in 39 bits.
>
> Right now the very first eal_get_virtual_area() in EAL initialization
> is used with 4KB pagesize, meaning that DPDK will try to mmap at each
> 4KB-aligned offset all the way from 0x100000000 to 0x10007fff8000,
> which takes quite a long, long time.
>
> I'm not sure about the solution to this problem, but I verify that
> starting DPDK 18.11-rc1 with `--base-virtaddr 0x200000000000` works
> just fine under ASAN.
>
>
Do we have documentation about using Address Sanitizer?
I understand the goal, but what is the cost? Do you have numbers on the
performance impact?

Solving this is not trivial. I would say someone interested in this but
using hardware with addressing limitations needs to choose one or the other.
Would it be possible to modify the virtual addresses used by default? I
guess the shadow regions can be higher than the default ones.




> D.
>
> >
> > <snip>
>
  

Patch

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index c482f0d..853c44c 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -37,6 +37,23 @@ 
 static void *next_baseaddr;
 static uint64_t system_page_sz;
 
+#ifdef RTE_ARCH_64
+/*
+ * Linux kernel uses a really high address as starting address for serving
+ * mmaps calls. If there exists addressing limitations and IOVA mode is VA,
+ * this starting address is likely too high for those devices. However, it
+ * is possible to use a lower address in the process virtual address space
+ * as with 64 bits there is a lot of available space.
+ *
+ * Current known limitations are 39 or 40 bits. Setting the starting address
+ * at 4GB implies there are 508GB or 1020GB for mapping the available
+ * hugepages. This is likely enough for most systems, although a device with
+ * addressing limitations should call rte_eal_check_dma_mask for ensuring all
+ * memory is within supported range.
+ */
+static uint64_t baseaddr = 0x100000000;
+#endif
+
 void *
 eal_get_virtual_area(void *requested_addr, size_t *size,
 		size_t page_sz, int flags, int mmap_flags)
@@ -60,6 +77,11 @@ 
 			rte_eal_process_type() == RTE_PROC_PRIMARY)
 		next_baseaddr = (void *) internal_config.base_virtaddr;
 
+#ifdef RTE_ARCH_64
+	if (next_baseaddr == NULL && internal_config.base_virtaddr == 0 &&
+			rte_eal_process_type() == RTE_PROC_PRIMARY)
+		next_baseaddr = (void *) baseaddr;
+#endif
 	if (requested_addr == NULL && next_baseaddr != NULL) {
 		requested_addr = next_baseaddr;
 		requested_addr = RTE_PTR_ALIGN(requested_addr, page_sz);
@@ -91,7 +113,17 @@ 
 				mmap_flags, -1, 0);
 		if (mapped_addr == MAP_FAILED && allow_shrink)
 			*size -= page_sz;
-	} while (allow_shrink && mapped_addr == MAP_FAILED && *size > 0);
+
+		if (mapped_addr != MAP_FAILED && addr_is_hint &&
+		    mapped_addr != requested_addr) {
+			/* hint was not used. Try with another offset */
+			munmap(mapped_addr, map_sz);
+			mapped_addr = MAP_FAILED;
+			next_baseaddr = RTE_PTR_ADD(next_baseaddr, page_sz);
+			requested_addr = next_baseaddr;
+		}
+	} while ((allow_shrink || addr_is_hint) &&
+		 mapped_addr == MAP_FAILED && *size > 0);
 
 	/* align resulting address - if map failed, we will ignore the value
 	 * anyway, so no need to add additional checks.
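
The comment added by the patch states that a 4GB starting address leaves
508GB or 1020GB for devices limited to 39 or 40 address bits. The sketch
below reproduces that arithmetic and shows one way to test whether a mapped
range stays under such a limit. headroom_gb() and range_fits_mask() are
hypothetical names for illustration and do not reproduce
rte_eal_check_dma_mask.

```c
#include <stdint.h>

#define GB (1ULL << 30)

/*
 * Address space, in GB, between `start` and the top of a `bits`-bit range.
 * Assumes bits < 64 and start < 2^bits.
 */
static uint64_t
headroom_gb(unsigned int bits, uint64_t start)
{
	return ((1ULL << bits) - start) / GB;
}

/* Returns 1 if [addr, addr + len) is addressable with `bits` bits. */
static int
range_fits_mask(uint64_t addr, uint64_t len, unsigned int bits)
{
	uint64_t limit = 1ULL << bits;

	return addr < limit && len <= limit - addr;
}
```

With start = 0x100000000, headroom_gb(39, start) is 508 and
headroom_gb(40, start) is 1020, which matches the figures in the comment.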