[00/16] Support externally allocated memory in DPDK

Message ID: cover.1536064999.git.anatoly.burakov@intel.com

Burakov, Anatoly Sept. 4, 2018, 1:11 p.m. UTC
This is a proposal to enable using externally allocated memory in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than NUMA
  node itself (external heaps will have IDs in order of creation)
- Add identifier string to malloc heap, to uniquely identify it
  - Each new heap will receive a unique socket ID that will be used by
    the allocator to decide from which heap (internal or external) to
    allocate the requested amount of memory
- Allow creating named heaps and adding/removing memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses
  of externally allocated memory
  - If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps
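
To illustrate, here is a minimal usage sketch, assuming the API shape
proposed in this series (exact function names and prototypes may differ
between revisions):

#include <rte_malloc.h>
#include <rte_memory.h>

/* 'addr' points to externally allocated, page-aligned memory of
 * n_pages * page_sz bytes; 'iova' holds its per-page IOVA addresses,
 * or is NULL, in which case the segments are marked with RTE_BAD_IOVA */
static int
use_external_memory(void *addr, rte_iova_t iova[], unsigned int n_pages,
                size_t page_sz)
{
        void *buf;
        int socket_id;

        /* create a named heap; it is assigned a unique "fake" socket ID */
        if (rte_malloc_heap_create("ext_heap") != 0)
                return -1;

        /* hand the externally allocated memory over to the heap */
        if (rte_malloc_heap_memory_add("ext_heap", addr, n_pages * page_sz,
                        iova, n_pages, page_sz) != 0)
                return -1;

        /* the heap's socket ID is what selects it at allocation time */
        socket_id = rte_malloc_heap_get_socket("ext_heap");
        if (socket_id < 0)
                return -1;
        buf = rte_malloc_socket("ext_buf", 4096, 0, socket_id);
        if (buf == NULL)
                return -1;
        /* ... use buf ... */
        rte_free(buf);
        return 0;
}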

The responsibility to ensure memory is accessible before using it is
on the shoulders of the user - there is no checking done with regards
to validity of the memory (nor could there be...).

The general approach is to create a heap and add memory to it. For any
other process wishing to use the same memory, said memory must first
be attached (otherwise some things will not work).
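
As a sketch, again assuming the API shape proposed in this series, a
secondary process would do something like:

        /* in the secondary: 'addr' and 'len' must describe the exact
         * region that the primary added to the heap */
        if (rte_malloc_heap_memory_attach("ext_heap", addr, len) != 0) {
                /* this heap's memory cannot be used in this process */
                return -1;
        }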

A design decision was made to make multiprocess synchronization a
manual process. Due to underlying issues with attaching to fbarrays in
secondary processes, this design was deemed to be better, because we
don't want heap creation in the primary to fail just because something
in the secondary has failed, when in fact we may not even have wanted
this memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only does
the memory space need to be preallocated, but it also needs to be
attached in each process so that the process can access the page table.
The attach API call may or may not succeed, depending on memory layout,
for reasons similar to other multiprocess failures. This is treated as
a "known issue" for this release.

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (16):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support

 config/common_base                            |   1 +
 config/rte_config.h                           |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                |   7 +-
 drivers/bus/pci/linux/pci.c                   |   2 +-
 drivers/net/mlx4/mlx4_mr.c                    |   3 +
 drivers/net/mlx5/mlx5.c                       |   5 +-
 drivers/net/mlx5/mlx5_mr.c                    |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c |   5 +-
 lib/librte_eal/bsdapp/eal/eal.c               |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c        |   9 +-
 lib/librte_eal/common/eal_common_memory.c     |   9 +-
 lib/librte_eal/common/eal_common_memzone.c    |   8 +-
 .../common/include/rte_eal_memconfig.h        |   6 +-
 lib/librte_eal/common/include/rte_malloc.h    | 181 +++++++++
 .../common/include/rte_malloc_heap.h          |   3 +
 lib/librte_eal/common/include/rte_memory.h    |   9 +
 lib/librte_eal/common/malloc_heap.c           | 287 +++++++++++--
 lib/librte_eal/common/malloc_heap.h           |  17 +
 lib/librte_eal/common/rte_malloc.c            | 383 ++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal.c             |   3 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c    |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c      |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c        |  17 +-
 lib/librte_eal/rte_eal_version.map            |   7 +
 lib/librte_flow_classify/rte_flow_classify.c  |   3 +-
 lib/librte_mempool/rte_mempool.c              |  31 +-
 lib/librte_pipeline/rte_pipeline.c            |   3 +-
 lib/librte_sched/rte_sched.c                  |   2 +-
 test/test/Makefile                            |   1 +
 test/test/autotest_data.py                    |  14 +-
 test/test/meson.build                         |   1 +
 test/test/test_external_mem.c                 | 384 ++++++++++++++++++
 test/test/test_malloc.c                       |   3 +
 test/test/test_memzone.c                      |   3 +
 34 files changed, 1346 insertions(+), 84 deletions(-)
 create mode 100644 test/test/test_external_mem.c
  

Comments

Shahaf Shuler Sept. 13, 2018, 7:44 a.m. UTC | #1
Hi Anatoly,

First, thanks for the patchset - it is a great enhancement.

See question below. 

Tuesday, September 4, 2018 4:12 PM, Anatoly Burakov:
> Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
> DPDK
> 
[...]

> The responsibility to ensure memory is accessible before using it is on the
> shoulders of the user - there is no checking done with regards to validity of
> the memory (nor could there be...).

That makes sense. However, who should be in charge of mapping this memory for DMA access?
The user, or the PMD internally when it encounters the first packet or while traversing the existing mempools?

  
Burakov, Anatoly Sept. 17, 2018, 10:07 a.m. UTC | #2
On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
> Tuesday, September 4, 2018 4:12 PM, Anatoly Burakov:
>> Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
>> DPDK
>>
[...]

>> The responsibility to ensure memory is accessible before using it is on the
>> shoulders of the user - there is no checking done with regards to validity of
>> the memory (nor could there be...).
> 
> That makes sense. However, who should be in charge of mapping this memory for DMA access?
> The user, or the PMD internally when it encounters the first packet or while traversing the existing mempools?
> 
Hi Shahaf,

There are two ways this can be solved. The first way is to perform VFIO 
mapping automatically on adding/attaching memory. The second is to force 
the user to do it manually. For now, the latter is chosen, because the user 
knows best if they intend to do DMA on that memory, but i'm open to suggestions.

There is an issue with some devices and buses (i.e. bus/fslmc) bypassing 
EAL VFIO infrastructure and performing their own VFIO/DMA mapping magic, 
but solving that problem is outside the scope of this patchset. Those 
devices/buses should fix themselves :)

When not using VFIO, it's out of our hands anyway.
  
Shahaf Shuler Sept. 17, 2018, 12:16 p.m. UTC | #3
Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
> in DPDK
> 
> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:

[...]

> >> The responsibility to ensure memory is accessible before using it is
> >> on the shoulders of the user - there is no checking done with regards
> >> to validity of the memory (nor could there be...).
> >
> > That makes sense. However, who should be in charge of mapping this
> > memory for DMA access?
> > The user, or the PMD internally when it encounters the first packet
> > or while traversing the existing mempools?
> >
> Hi Shahaf,
> 
> There are two ways this can be solved. The first way is to perform VFIO
> mapping automatically on adding/attaching memory. The second is to force
> the user to do it manually. For now, the latter is chosen, because the user
> knows best if they intend to do DMA on that memory, but i'm open to suggestions.

I agree with that approach, and would add not only whether the mempool is for DMA or not, but also which ports will use this mempool (this can affect the mapping). 
However, I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping, but rather some proprietary driver utility. 
IMO DPDK should introduce generic and device-agnostic APIs to the user. 

My suggestion is, instead of doing vfio_dma_map or vfio_dma_unmap, to have a generic dma_map(uint8_t port, address, len). Each driver will register its own mapping callback (which can be vfio_dma_map).
It can be outside of this series; I'm just wondering what people's opinion is on such an approach.
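
To illustrate, such an API could look roughly like this (all names below
are made up for the sake of discussion - nothing like this exists in DPDK
today):

/* hypothetical device-agnostic DMA mapping API, resolved per port */
int rte_eth_dma_map(uint8_t port_id, void *addr, uint64_t iova, size_t len);
int rte_eth_dma_unmap(uint8_t port_id, void *addr, uint64_t iova, size_t len);

/* hypothetical callback each driver registers; for VFIO-backed
 * devices it could simply wrap rte_vfio_dma_map() */
typedef int (*rte_dma_map_cb_t)(void *addr, uint64_t iova, size_t len);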

> 
> There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
> solving that problem is outside the scope of this patchset. Those
> devices/buses should fix themselves :)
> 
> When not using VFIO, it's out of our hands anyway.

Why? 
VFIO is not a must requirement for devices in DPDK. 

  
Burakov, Anatoly Sept. 17, 2018, 1 p.m. UTC | #4
On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:
> Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
>> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
>> in DPDK
>>
>> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
> 
> [...]
> 
>>>> The responsibility to ensure memory is accessible before using it is
>>>> on the shoulders of the user - there is no checking done with regards
>>>> to validity of the memory (nor could there be...).
>>>
>>> That makes sense. However, who should be in charge of mapping this
>>> memory for DMA access?
>>> The user, or the PMD internally when it encounters the first packet
>>> or while traversing the existing mempools?
>>>
>> Hi Shahaf,
>>
>> There are two ways this can be solved. The first way is to perform VFIO
>> mapping automatically on adding/attaching memory. The second is to force
>> the user to do it manually. For now, the latter is chosen, because the user
>> knows best if they intend to do DMA on that memory, but i'm open to suggestions.
> 
> I agree with that approach, and would add not only whether the mempool is for DMA or not, but also which ports will use this mempool (this can affect the mapping).

That is perhaps too hardware-specific - this should probably be handled 
inside the driver callbacks.

> However, I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping, but rather some proprietary driver utility.
> IMO DPDK should introduce generic and device-agnostic APIs to the user.
> 
> My suggestion is, instead of doing vfio_dma_map or vfio_dma_unmap, to have a generic dma_map(uint8_t port, address, len). Each driver will register its own mapping callback (which can be vfio_dma_map).
> It can be outside of this series; I'm just wondering what people's opinion is on such an approach.

I don't disagree. I don't like bus/net/etc drivers doing their own thing 
with regards to mapping, and i would by far prefer a generic way to set up 
DMA maps, to which VFIO will be a subscriber.

> 
>>
>> There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
>> solving that problem is outside the scope of this patchset. Those
>> devices/buses should fix themselves :)
>>
>> When not using VFIO, it's out of our hands anyway.
> 
> Why?
> VFIO is not a must requirement for devices in DPDK.

When i say "out of our hands", what i mean to say is, currently as far 
as EAL API is concerned, there is no DMA mapping outside of VFIO.

  
Shreyansh Jain Sept. 18, 2018, 12:29 p.m. UTC | #5
On Monday 17 September 2018 06:30 PM, Burakov, Anatoly wrote:
> On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:

[...]

>> I agree with that approach, and would add not only whether the mempool
>> is for DMA or not, but also which ports will use this mempool (this can
>> affect the mapping).
> 
> That is perhaps too hardware-specific - this should probably be handled 
> inside the driver callbacks.
> 
>> However, I don't think this is generic enough to use only VFIO. As you
>> said, there are some devices not using VFIO for mapping, but rather some
>> proprietary driver utility.
>> IMO DPDK should introduce generic and device-agnostic APIs to the user.
>>
>> My suggestion is, instead of doing vfio_dma_map or vfio_dma_unmap, to
>> have a generic dma_map(uint8_t port, address, len). Each driver will
>> register its own mapping callback (which can be vfio_dma_map).
>> It can be outside of this series; I'm just wondering what people's
>> opinion is on such an approach.
> 
> I don't disagree. I don't like bus/net/etc drivers doing their own thing 
> with regards to mapping, and i would by far prefer a generic way to set up 
> DMA maps, to which VFIO will be a subscriber.
> 
>>
>>>
>>> There is an issue with some devices and buses (i.e. bus/fslmc) 
>>> bypassing EAL
>>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
>>> solving that problem is outside the scope of this patchset. Those
>>> devices/buses should fix themselves :)

DMA mapping is a very common principle and can easily be a candidate 
for lets-make-generic-movement, but, being close to hardware (or 
hardware specific), it does require the driver to have some flexibility 
in terms of its eventual implementation.

I maintain one of those drivers (bus/fslmc) in DPDK which needs to have 
a special VFIO layer - and from that experience, I can say that VFIO 
mapping does require some flexibility. SoC semantics are sometimes too 
complex to pin to general-universally-agreed-standard concept. (or, one 
can easily call it a 'bug', while it is a 'feature' for others :D)

In fact, NXP has another driver (bus/dpaa) which doesn't even work with 
VFIO - loves to work directly with Phys_addr. And, it is not at a lower 
priority than one with VFIO.

Thus, I really don't think a strongly controlled VFIO mapping should be 
EAL's responsibility. Failure because of lack of mapping is a driver's 
problem.

  
Burakov, Anatoly Sept. 18, 2018, 3:15 p.m. UTC | #6
On 18-Sep-18 1:29 PM, Shreyansh Jain wrote:
> On Monday 17 September 2018 06:30 PM, Burakov, Anatoly wrote:
>> On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:
>>> Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
>>>> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated 
>>>> memory
>>>> in DPDK
>>>>
>>>> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
>>>
>>> [...]
>>>
>>>>>> The responsibility to ensure memory is accessible before using it is
>>>>>> on the shoulders of the user - there is no checking done with regards
>>>>>> to validity of the memory (nor could there be...).
>>>>>
>>>>> That makes sense. However who should be in-charge of mapping this
>>>> memory for dma access?
>>>>> The user or internally be the PMD when encounter the first packet 
>>>>> or while
>>>> traversing the existing mempools?
>>>>>
>>>> Hi Shahaf,
>>>>
>>>> There are two ways this can be solved. The first way is to perform VFIO
>>>> mapping automatically on adding/attaching memory. The second is to 
>>>> force
>>>> user to do it manually. For now, the latter is chosen because user 
>>>> knows best
>>>> if they intend to do DMA on that memory, but i'm open to suggestions.
>>>
>>> I agree with that approach, and will add not only if the mempool is 
>>> for dma or not but also which ports will use this mempool (this can 
>>> effect on the mapping).
>>
>> That is perhaps too hardware-specific - this should probably be 
>> handled inside the driver callbacks.
>>
>>> However I don't think this is generic enough to use only VFIO. As you 
>>> said, there are some devices not using VFIO for mapping rather some 
>>> proprietary driver utility.
>>> IMO DPDK should introduce generic and device agnostic APIs to the user.
>>>
>>> My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap 
>>> that have a generic dma_map(uint8_t port, address, len). Each driver 
>>> will register with its own mapping callback (can be vfio_dma_map).
>>> It can be outside of this series, just wondering the people opinion 
>>> on such approach.
>>
>> I don't disagree. I don't like bus/net/etc drivers doing their own 
>> thing with regards to mapping, and i would by far prefer generic way 
>> to set up DMA maps, to which VFIO will be a subscriber.
>>
>>>
>>>>
>>>> There is an issue with some devices and buses (i.e. bus/fslmc) 
>>>> bypassing EAL
>>>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, 
>>>> but
>>>> solving that problem is outside the scope of this patchset. Those
>>>> devices/buses should fix themselves :)
> 
> DMA mapping is a very common principle and can easily be a candidate 
> for lets-make-generic-movement, but, being close to hardware (or 
> hardware specific), it does require the driver to have some flexibility 
> in terms of its eventual implementation.

Perhaps i didn't word my response clearly enough. I didn't mean to say 
(or imply) that EAL must handle all DMA mappings itself. Rather, EAL 
should provide a generic infrastructure for maintaining current mappings 
etc., and a subscription mechanism for other users (e.g. drivers), so 
that the details of exactly how to map things for DMA are up to the 
drivers.

In other words, we agree :)

> 
> I maintain one of those drivers (bus/fslmc) in DPDK which needs to have 
> a special VFIO layer - and from that experience, I can say that VFIO 
> mapping does require some flexibility. SoC semantics are sometimes too 
> complex to pin to general-universally-agreed-standard concept. (or, one 
> can easily call it a 'bug', while it is a 'feature' for others :D)
> 
> In fact, NXP has another driver (bus/dpaa) which doesn't even work with 
> VFIO - loves to work directly with Phys_addr. And, it is not at a lower 
> priority than one with VFIO.
> 
> Thus, I really don't think a strongly controlled VFIO mapping should be 
> EAL's responsibility. Failure because of lack of mapping is a driver's 
> problem.
> 

While EAL doesn't necessarily need to be involved with mapping things 
for VFIO, i believe it does need to be the authority on what gets 
mapped. The user needs a way to make arbitrary memory available for DMA 
- this is where EAL comes in. VFIO itself can be factored out into a 
separate subsystem (DMA drivers, anyone? :D ), but given that memory 
cometh and goeth (external memory included), and given that some things 
tend to be a bit complicated [*], EAL needs to know when something is 
supposed to be mapped or unmapped, and when to notify subscribers that 
they may have to refresh their DMA maps.

[*] for example, VFIO can only do mappings whenever there are devices 
actually attached to a VFIO container, so we have to maintain all maps 
between hotplug events to ensure that memory set up for DMA doesn't 
silently get unmapped on device detach and subsequent attach.
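
For example, with the memory event callback API that EAL already has
today, a driver could subscribe along these lines (a sketch - the actual
map/unmap bodies are left out):

#include <rte_common.h>
#include <rte_memory.h>

static void
dma_map_event_cb(enum rte_mem_event event_type, const void *addr,
                size_t len, void *arg __rte_unused)
{
        /* called by EAL whenever memory is added to or removed from
         * the memory map */
        if (event_type == RTE_MEM_EVENT_ALLOC)
                ; /* set up DMA mapping for addr/len */
        else if (event_type == RTE_MEM_EVENT_FREE)
                ; /* tear the mapping down */
}

/* in driver init */
rte_mem_event_callback_register("my-driver-dma", dma_map_event_cb, NULL);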