memory: do not use base-virtaddr in secondary processes

Message ID 1529351589-173939-1-git-send-email-dariuszx.stojaczyk@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series memory: do not use base-virtaddr in secondary processes |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply issues

Commit Message

Stojaczyk, DariuszX June 18, 2018, 7:53 p.m. UTC
  Since secondary process' address space is highly dictated
by the primary process' mappings, it doesn't make much
sense to use base-virtaddr for secondary processes.

This patch is intended to fix PCI resource mapping
in secondary processes using the same base-virtaddr
as their primary processes. PCI uses the end of the hugepage
memory area to map all resources. [pci_find_max_end_va()]
It works for primary processes, but can't be mapped 1:1
by secondary ones, as the same addresses are currently always
occupied by shadow memseg lists, which were created with
eal_get_virtual_area(NULL, ...).

```
PRIMARY PROCESS
0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r----   [ anon ]
0x7201000000     16K rw-s- resource0

SECONDARY PROCESS
0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
0x6e01000000 16777216K r----   [ anon ]
0x7201000000      4K rw-s- fbarray_memseg-1048576k-0-0_203213
```

Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
Cc: anatoly.burakov@intel.com
Cc: stable@dpdk.org

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
---
 lib/librte_eal/common/eal_common_memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Alejandro Lucero June 18, 2018, 5:21 p.m. UTC | #1
On Mon, Jun 18, 2018 at 8:53 PM, Dariusz Stojaczyk <
dariuszx.stojaczyk@intel.com> wrote:

> Since secondary process' address space is highly dictated
> by the primary process' mappings, it doesn't make much
> sense to use base-virtaddr for secondary processes.
>
> This patch is intended to fix PCI resource mapping
> in secondary processes using the same base-virtaddr
> as their primary processes. PCI uses the end of the hugepage
> memory area to map all resources. [pci_find_max_end_va()]
> It works for primary processes, but can't be mapped 1:1
> by secondary ones, as the same addresses are currently always
> occupied by shadow memseg lists, which were created with
> eal_get_virtual_area(NULL, ...).
>
>
Should not be better to handle these allocations being aware about the
problem for secondary processes?

I do not know exactly what are the (other) reasons behind base-virtaddr,
but it turns out NFP requires this to be used when DPDK apps executed by
non-root users.

I'm working on a RFC for handling our specific case, that could also be
required for other devices, and this change would make the NFP unusable for
the secondary processes.



> ```
> PRIMARY PROCESS
> 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> 0x6e01000000 16777216K r----   [ anon ]
> 0x7201000000     16K rw-s- resource0
>
> SECONDARY PROCESS
> 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> 0x6e01000000 16777216K r----   [ anon ]
> 0x7201000000      4K rw-s- fbarray_memseg-1048576k-0-0_203213
> ```
>
> Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
> Cc: anatoly.burakov@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
> ---
>  lib/librte_eal/common/eal_common_memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_eal/common/eal_common_memory.c
> b/lib/librte_eal/common/eal_common_memory.c
> index 53d9b51..6c47e11 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -56,7 +56,8 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
>         allow_shrink = (flags & EAL_VIRTUAL_AREA_ALLOW_SHRINK) > 0;
>         unmap = (flags & EAL_VIRTUAL_AREA_UNMAP) > 0;
>
> -       if (next_baseaddr == NULL && internal_config.base_virtaddr != 0)
> +       if (next_baseaddr == NULL && internal_config.base_virtaddr != 0 &&
> +                       rte_eal_process_type() == RTE_PROC_PRIMARY)
>                 next_baseaddr = (void *) internal_config.base_virtaddr;
>
>         if (requested_addr == NULL && next_baseaddr) {
> --
> 2.7.4
>
>
  
Stojaczyk, DariuszX June 18, 2018, 7:03 p.m. UTC | #2
> -----Original Message-----
> From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> Sent: Monday, June 18, 2018 7:22 PM
> 
> Should not be better to handle these allocations being aware about the
> problem for secondary processes?
> 
> I do not know exactly what are the (other) reasons behind base-virtaddr,
> but it turns out NFP requires this to be used when DPDK apps executed
> by non-root users.
> 
> I'm working on a RFC for handling our specific case, that could also be
> required for other devices, and this change would make the NFP unusable
> for the secondary processes.
> 

The only place base-virtaddr is used in secondary processes in DPDK 18.05 is this shadow memseg mapping, which shouldn't really need to be accessed by anyone else than DPDK EAL. Can you point me out to an NFP guide or some code that describes this in more detail?

If we're talking about base-virtaddr for hugepages, then that's always inherited from the primary process, regardless of what base-virtaddr is supplied to the secondary.

Regards!
D.
  
Alejandro Lucero June 18, 2018, 7:33 p.m. UTC | #3
On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX <
dariuszx.stojaczyk@intel.com> wrote:

>
> > -----Original Message-----
> > From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> > Sent: Monday, June 18, 2018 7:22 PM
> >
> > Should not be better to handle these allocations being aware about the
> > problem for secondary processes?
> >
> > I do not know exactly what are the (other) reasons behind base-virtaddr,
> > but it turns out NFP requires this to be used when DPDK apps executed
> > by non-root users.
> >
> > I'm working on a RFC for handling our specific case, that could also be
> > required for other devices, and this change would make the NFP unusable
> > for the secondary processes.
> >
>
> The only place base-virtaddr is used in secondary processes in DPDK 18.05
> is this shadow memseg mapping, which shouldn't really need to be accessed
> by anyone else than DPDK EAL.


Yes, I'm aware this is EAL code.

Can you point me out to an NFP guide or some code that describes this in
> more detail?
>

As I said, I'm working on a RFC. I will send something shortly. But I could
give you an advance: the hugepages needs to be mapped below certain virtual
address, 1TB, and I'm afraid that includes the primary and also the
secondary processes. At least if any process can send or receive packets
to/from a NFP.


>
> If we're talking about base-virtaddr for hugepages, then that's always
> inherited from the primary process, regardless of what base-virtaddr is
> supplied to the secondary.
>
>
But, is not your patch avoiding to use that base-virtaddr for secondary
processes?


> Regards!
> D.
>
  
Stojaczyk, DariuszX June 18, 2018, 8:12 p.m. UTC | #4
> -----Original Message-----
> From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
> Sent: Monday, June 18, 2018 9:33 PM
> 
> On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX
> <dariuszx.stojaczyk@intel.com <mailto:dariuszx.stojaczyk@intel.com> >
> wrote:
> 
> 	Can you point me out to an NFP guide or some code that describes
> this in more detail?
> 
> 
> 
> As I said, I'm working on a RFC. I will send something shortly. But I could give
> you an advance: the hugepages needs to be mapped below certain virtual
> address, 1TB, and I'm afraid that includes the primary and also the
> secondary processes. At least if any process can send or receive packets
> to/from a NFP.
> 
> 

Thanks, I'm pretty sure we're safe, then.

> 
> 	If we're talking about base-virtaddr for hugepages, then that's always
> inherited from the primary process, regardless of what base-virtaddr is
> supplied to the secondary.
> 
> 
> 
> 
> But, is not your patch avoiding to use that base-virtaddr for secondary
> processes?

I see now that the patch name is slightly misleading. Maybe I shouldn’t pick such a catchy title. Let me clarify: As of DPDK 18.05, --base-virtaddr param for secondary process applications only affects that shadow memseg metadata that's not useful for anyone, but can still do a lot of harm. Hugepage memory in secondary processes is always mapped to the same addresses the primary process uses.

D.
  
Burakov, Anatoly June 19, 2018, 9:21 a.m. UTC | #5
On 18-Jun-18 8:53 PM, Dariusz Stojaczyk wrote:
> Since secondary process' address space is highly dictated
> by the primary process' mappings, it doesn't make much
> sense to use base-virtaddr for secondary processes.
> 
> This patch is intended to fix PCI resource mapping
> in secondary processes using the same base-virtaddr
> as their primary processes. PCI uses the end of the hugepage
> memory area to map all resources. [pci_find_max_end_va()]
> It works for primary processes, but can't be mapped 1:1
> by secondary ones, as the same addresses are currently always
> occupied by shadow memseg lists, which were created with
> eal_get_virtual_area(NULL, ...).
> 
> ```
> PRIMARY PROCESS
> 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> 0x6e01000000 16777216K r----   [ anon ]
> 0x7201000000     16K rw-s- resource0
> 
> SECONDARY PROCESS
> 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> 0x6e01000000 16777216K r----   [ anon ]
> 0x7201000000      4K rw-s- fbarray_memseg-1048576k-0-0_203213
> ```
> 
> Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
> Cc: anatoly.burakov@intel.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Burakov, Anatoly June 19, 2018, 9:24 a.m. UTC | #6
On 18-Jun-18 9:12 PM, Stojaczyk, DariuszX wrote:
> 
> 
>> -----Original Message-----
>> From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>> Sent: Monday, June 18, 2018 9:33 PM
>>
>> On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX
>> <dariuszx.stojaczyk@intel.com <mailto:dariuszx.stojaczyk@intel.com> >
>> wrote:
>>
>> 	Can you point me out to an NFP guide or some code that describes
>> this in more detail?
>>
>>
>>
>> As I said, I'm working on a RFC. I will send something shortly. But I could give
>> you an advance: the hugepages needs to be mapped below certain virtual
>> address, 1TB, and I'm afraid that includes the primary and also the
>> secondary processes. At least if any process can send or receive packets
>> to/from a NFP.
>>
>>
> 
> Thanks, I'm pretty sure we're safe, then.
> 
>>
>> 	If we're talking about base-virtaddr for hugepages, then that's always
>> inherited from the primary process, regardless of what base-virtaddr is
>> supplied to the secondary.
>>
>>
>>
>>
>> But, is not your patch avoiding to use that base-virtaddr for secondary
>> processes?
> 
> I see now that the patch name is slightly misleading. Maybe I shouldn’t pick such a catchy title. Let me clarify: As of DPDK 18.05, --base-virtaddr param for secondary process applications only affects that shadow memseg metadata that's not useful for anyone, but can still do a lot of harm. Hugepage memory in secondary processes is always mapped to the same addresses the primary process uses.
> 
> D.
> 

Hi Alejandro,

To solve this problem, one possible approach would be to have maximum VA 
address, and allocate memory downwards, rather than upwards. Is that by 
any chance approximate contents of your RFC? :)
  
Alejandro Lucero June 19, 2018, 10:23 a.m. UTC | #7
On Tue, Jun 19, 2018 at 10:24 AM, Burakov, Anatoly <
anatoly.burakov@intel.com> wrote:

> On 18-Jun-18 9:12 PM, Stojaczyk, DariuszX wrote:
>
>>
>>
>> -----Original Message-----
>>> From: Alejandro Lucero [mailto:alejandro.lucero@netronome.com]
>>> Sent: Monday, June 18, 2018 9:33 PM
>>>
>>> On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX
>>> <dariuszx.stojaczyk@intel.com <mailto:dariuszx.stojaczyk@intel.com> >
>>> wrote:
>>>
>>>         Can you point me out to an NFP guide or some code that describes
>>> this in more detail?
>>>
>>>
>>>
>>> As I said, I'm working on a RFC. I will send something shortly. But I
>>> could give
>>> you an advance: the hugepages needs to be mapped below certain virtual
>>> address, 1TB, and I'm afraid that includes the primary and also the
>>> secondary processes. At least if any process can send or receive packets
>>> to/from a NFP.
>>>
>>>
>>>
>> Thanks, I'm pretty sure we're safe, then.
>>
>>
>>>         If we're talking about base-virtaddr for hugepages, then that's
>>> always
>>> inherited from the primary process, regardless of what base-virtaddr is
>>> supplied to the secondary.
>>>
>>>
>>>
>>>
>>> But, is not your patch avoiding to use that base-virtaddr for secondary
>>> processes?
>>>
>>
>> I see now that the patch name is slightly misleading. Maybe I shouldn’t
>> pick such a catchy title. Let me clarify: As of DPDK 18.05, --base-virtaddr
>> param for secondary process applications only affects that shadow memseg
>> metadata that's not useful for anyone, but can still do a lot of harm.
>> Hugepage memory in secondary processes is always mapped to the same
>> addresses the primary process uses.
>>
>> D.
>>
>>
> Hi Alejandro,
>
> To solve this problem, one possible approach would be to have maximum VA
> address, and allocate memory downwards, rather than upwards. Is that by any
> chance approximate contents of your RFC? :)
>
>
Hi Anatoly,

There's a lot of space below 1TB, but this specific upper limit is just in
the NFP case. My RFC will propose a generic solution assuming there could
be other devices in the future, and not just NIC devices, which could have
also some sort of limitation. The problem is, those devices limitations can
not be known at memory initialization time, so some sort of check is
required afterwards, once the devices have been proved and initialised.


> --
> Thanks,
> Anatoly
>
  
Burakov, Anatoly June 19, 2018, 10:27 a.m. UTC | #8
On 19-Jun-18 11:23 AM, Alejandro Lucero wrote:
> 
> 
> On Tue, Jun 19, 2018 at 10:24 AM, Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 18-Jun-18 9:12 PM, Stojaczyk, DariuszX wrote:
> 
> 
> 
>             -----Original Message-----
>             From: Alejandro Lucero
>             [mailto:alejandro.lucero@netronome.com
>             <mailto:alejandro.lucero@netronome.com>]
>             Sent: Monday, June 18, 2018 9:33 PM
> 
>             On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX
>             <dariuszx.stojaczyk@intel.com
>             <mailto:dariuszx.stojaczyk@intel.com>
>             <mailto:dariuszx.stojaczyk@intel.com
>             <mailto:dariuszx.stojaczyk@intel.com>> >
>             wrote:
> 
>                      Can you point me out to an NFP guide or some code
>             that describes
>             this in more detail?
> 
> 
> 
>             As I said, I'm working on a RFC. I will send something
>             shortly. But I could give
>             you an advance: the hugepages needs to be mapped below
>             certain virtual
>             address, 1TB, and I'm afraid that includes the primary and
>             also the
>             secondary processes. At least if any process can send or
>             receive packets
>             to/from a NFP.
> 
> 
> 
>         Thanks, I'm pretty sure we're safe, then.
> 
> 
>                      If we're talking about base-virtaddr for hugepages,
>             then that's always
>             inherited from the primary process, regardless of what
>             base-virtaddr is
>             supplied to the secondary.
> 
> 
> 
> 
>             But, is not your patch avoiding to use that base-virtaddr
>             for secondary
>             processes?
> 
> 
>         I see now that the patch name is slightly misleading. Maybe I
>         shouldn’t pick such a catchy title. Let me clarify: As of DPDK
>         18.05, --base-virtaddr param for secondary process applications
>         only affects that shadow memseg metadata that's not useful for
>         anyone, but can still do a lot of harm. Hugepage memory in
>         secondary processes is always mapped to the same addresses the
>         primary process uses.
> 
>         D.
> 
> 
>     Hi Alejandro,
> 
>     To solve this problem, one possible approach would be to have
>     maximum VA address, and allocate memory downwards, rather than
>     upwards. Is that by any chance approximate contents of your RFC? :)
> 
> 
> Hi Anatoly,
> 
> There's a lot of space below 1TB, but this specific upper limit is just 
> in the NFP case. My RFC will propose a generic solution assuming there 
> could be other devices in the future, and not just NIC devices, which 
> could have also some sort of limitation.

OK, looking forward to it.

> The problem is, those devices 
> limitations can not be known at memory initialization time, so some sort 
> of check is required afterwards, once the devices have been proved and 
> initialised.
> 

Presumably, device driver would be in the best position to know things 
like that, no? Maybe a new device flag and an API to query such things 
from all devices?
  
Alejandro Lucero June 19, 2018, 11:48 a.m. UTC | #9
On Tue, Jun 19, 2018 at 11:27 AM, Burakov, Anatoly <
anatoly.burakov@intel.com> wrote:

> On 19-Jun-18 11:23 AM, Alejandro Lucero wrote:
>
>>
>>
>> On Tue, Jun 19, 2018 at 10:24 AM, Burakov, Anatoly <
>> anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
>>
>>     On 18-Jun-18 9:12 PM, Stojaczyk, DariuszX wrote:
>>
>>
>>
>>             -----Original Message-----
>>             From: Alejandro Lucero
>>             [mailto:alejandro.lucero@netronome.com
>>             <mailto:alejandro.lucero@netronome.com>]
>>             Sent: Monday, June 18, 2018 9:33 PM
>>
>>             On Mon, Jun 18, 2018 at 8:03 PM, Stojaczyk, DariuszX
>>             <dariuszx.stojaczyk@intel.com
>>             <mailto:dariuszx.stojaczyk@intel.com>
>>             <mailto:dariuszx.stojaczyk@intel.com
>>
>>             <mailto:dariuszx.stojaczyk@intel.com>> >
>>             wrote:
>>
>>                      Can you point me out to an NFP guide or some code
>>             that describes
>>             this in more detail?
>>
>>
>>
>>             As I said, I'm working on a RFC. I will send something
>>             shortly. But I could give
>>             you an advance: the hugepages needs to be mapped below
>>             certain virtual
>>             address, 1TB, and I'm afraid that includes the primary and
>>             also the
>>             secondary processes. At least if any process can send or
>>             receive packets
>>             to/from a NFP.
>>
>>
>>
>>         Thanks, I'm pretty sure we're safe, then.
>>
>>
>>                      If we're talking about base-virtaddr for hugepages,
>>             then that's always
>>             inherited from the primary process, regardless of what
>>             base-virtaddr is
>>             supplied to the secondary.
>>
>>
>>
>>
>>             But, is not your patch avoiding to use that base-virtaddr
>>             for secondary
>>             processes?
>>
>>
>>         I see now that the patch name is slightly misleading. Maybe I
>>         shouldn’t pick such a catchy title. Let me clarify: As of DPDK
>>         18.05, --base-virtaddr param for secondary process applications
>>         only affects that shadow memseg metadata that's not useful for
>>         anyone, but can still do a lot of harm. Hugepage memory in
>>         secondary processes is always mapped to the same addresses the
>>         primary process uses.
>>
>>         D.
>>
>>
>>     Hi Alejandro,
>>
>>     To solve this problem, one possible approach would be to have
>>     maximum VA address, and allocate memory downwards, rather than
>>     upwards. Is that by any chance approximate contents of your RFC? :)
>>
>>
>> Hi Anatoly,
>>
>> There's a lot of space below 1TB, but this specific upper limit is just
>> in the NFP case. My RFC will propose a generic solution assuming there
>> could be other devices in the future, and not just NIC devices, which could
>> have also some sort of limitation.
>>
>
> OK, looking forward to it.
>
> The problem is, those devices limitations can not be known at memory
>> initialization time, so some sort of check is required afterwards, once the
>> devices have been proved and initialised.
>>
>>
> Presumably, device driver would be in the best position to know things
> like that, no? Maybe a new device flag and an API to query such things from
> all devices?
>
>
Right. I plan to add a per-device DMA mask, just mimicking what the kernel
has.


> --
> Thanks,
> Anatoly
>
  
Thomas Monjalon July 12, 2018, 10:08 p.m. UTC | #10
19/06/2018 11:21, Burakov, Anatoly:
> On 18-Jun-18 8:53 PM, Dariusz Stojaczyk wrote:
> > Since secondary process' address space is highly dictated
> > by the primary process' mappings, it doesn't make much
> > sense to use base-virtaddr for secondary processes.
> > 
> > This patch is intended to fix PCI resource mapping
> > in secondary processes using the same base-virtaddr
> > as their primary processes. PCI uses the end of the hugepage
> > memory area to map all resources. [pci_find_max_end_va()]
> > It works for primary processes, but can't be mapped 1:1
> > by secondary ones, as the same addresses are currently always
> > occupied by shadow memseg lists, which were created with
> > eal_get_virtual_area(NULL, ...).
> > 
> > ```
> > PRIMARY PROCESS
> > 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> > 0x6e01000000 16777216K r----   [ anon ]
> > 0x7201000000     16K rw-s- resource0
> > 
> > SECONDARY PROCESS
> > 0x6e00e00000    388K rw-s- fbarray_memseg-2048k-1-3
> > 0x6e01000000 16777216K r----   [ anon ]
> > 0x7201000000      4K rw-s- fbarray_memseg-1048576k-0-0_203213
> > ```
> > 
> > Fixes: 524e43c2ad9a ("mem: prepare memseg lists for multiprocess sync")
> > Cc: anatoly.burakov@intel.com
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
> > ---
> 
> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>

Applied, thanks
  

Patch

diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
index 53d9b51..6c47e11 100644
--- a/lib/librte_eal/common/eal_common_memory.c
+++ b/lib/librte_eal/common/eal_common_memory.c
@@ -56,7 +56,8 @@  eal_get_virtual_area(void *requested_addr, size_t *size,
 	allow_shrink = (flags & EAL_VIRTUAL_AREA_ALLOW_SHRINK) > 0;
 	unmap = (flags & EAL_VIRTUAL_AREA_UNMAP) > 0;
 
-	if (next_baseaddr == NULL && internal_config.base_virtaddr != 0)
+	if (next_baseaddr == NULL && internal_config.base_virtaddr != 0 &&
+			rte_eal_process_type() == RTE_PROC_PRIMARY)
 		next_baseaddr = (void *) internal_config.base_virtaddr;
 
 	if (requested_addr == NULL && next_baseaddr) {