Message ID: cover.1536064999.git.anatoly.burakov@intel.com (mailing list archive)

Headers:
From: Anatoly Burakov <anatoly.burakov@intel.com>
To: dev@dpdk.org
Cc: laszlo.madarassy@ericsson.com, laszlo.vadkerti@ericsson.com, andras.kovacs@ericsson.com, winnie.tian@ericsson.com, daniel.andrasi@ericsson.com, janos.kobor@ericsson.com, srinath.mannam@broadcom.com, scott.branden@broadcom.com, ajit.khaparde@broadcom.com, keith.wiles@intel.com, bruce.richardson@intel.com, thomas@monjalon.net
Date: Tue, 4 Sep 2018 14:11:35 +0100
Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in DPDK
Series: Support externally allocated memory in DPDK
Message
Anatoly Burakov
Sept. 4, 2018, 1:11 p.m. UTC
This is a proposal to enable using externally allocated memory in DPDK.

In a nutshell, here is what is being done here:

- Index internal malloc heaps by NUMA node index, rather than by NUMA node itself (external heaps will have IDs in order of creation)
- Add an identifier string to each malloc heap, to uniquely identify it
- Each new heap will receive a unique socket ID that will be used by the allocator to decide from which heap (internal or external) to allocate the requested amount of memory
- Allow creating named heaps and adding/removing memory to/from those heaps
- Allocate memseg lists at runtime, to keep track of IOVA addresses of externally allocated memory
- If IOVA addresses aren't provided, use RTE_BAD_IOVA
- Allow malloc and memzones to allocate from external heaps
- Allow other data structures to allocate from external heaps

The responsibility to ensure memory is accessible before using it is on the shoulders of the user - there is no checking done with regard to the validity of the memory (nor could there be...).

The general approach is to create a heap and add memory into it. Any other process wishing to use the same memory must first attach to that memory (otherwise some things will not work).

A design decision was made to make multiprocess synchronization a manual process. Due to underlying issues with attaching to fbarrays in secondary processes, this design was deemed better: we don't want creation of an external heap in the primary to fail because something in the secondary has failed, when in fact we may not even have wanted this memory to be accessible in the secondary in the first place.

Using external memory in multiprocess is *hard*, because not only does memory space need to be preallocated, it also needs to be attached in each process to allow other processes to access the page table. The attach API call may or may not succeed, depending on memory layout, for reasons similar to other multiprocess failures.
This is treated as a "known issue" for this release.

RFC -> v1 changes:
- Removed the "named heaps" API, allocate using fake socket ID instead
- Added multiprocess support
- Everything is now thread-safe
- Numerous bugfixes and API improvements

Anatoly Burakov (16):
  mem: add length to memseg list
  mem: allow memseg lists to be marked as external
  malloc: index heaps using heap ID rather than NUMA node
  mem: do not check for invalid socket ID
  flow_classify: do not check for invalid socket ID
  pipeline: do not check for invalid socket ID
  sched: do not check for invalid socket ID
  malloc: add name to malloc heaps
  malloc: add function to query socket ID of named heap
  malloc: allow creating malloc heaps
  malloc: allow destroying heaps
  malloc: allow adding memory to named heaps
  malloc: allow removing memory from named heaps
  malloc: allow attaching to external memory chunks
  malloc: allow detaching from external memory
  test: add unit tests for external memory support

 config/common_base                             |   1 +
 config/rte_config.h                            |   1 +
 drivers/bus/fslmc/fslmc_vfio.c                 |   7 +-
 drivers/bus/pci/linux/pci.c                    |   2 +-
 drivers/net/mlx4/mlx4_mr.c                     |   3 +
 drivers/net/mlx5/mlx5.c                        |   5 +-
 drivers/net/mlx5/mlx5_mr.c                     |   3 +
 drivers/net/virtio/virtio_user/vhost_kernel.c  |   5 +-
 lib/librte_eal/bsdapp/eal/eal.c                |   3 +
 lib/librte_eal/bsdapp/eal/eal_memory.c         |   9 +-
 lib/librte_eal/common/eal_common_memory.c      |   9 +-
 lib/librte_eal/common/eal_common_memzone.c     |   8 +-
 .../common/include/rte_eal_memconfig.h         |   6 +-
 lib/librte_eal/common/include/rte_malloc.h     | 181 +++++++++
 .../common/include/rte_malloc_heap.h           |   3 +
 lib/librte_eal/common/include/rte_memory.h     |   9 +
 lib/librte_eal/common/malloc_heap.c            | 287 +++++++++++--
 lib/librte_eal/common/malloc_heap.h            |  17 +
 lib/librte_eal/common/rte_malloc.c             | 383 ++++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal.c              |   3 +
 lib/librte_eal/linuxapp/eal/eal_memalloc.c     |  12 +-
 lib/librte_eal/linuxapp/eal/eal_memory.c       |   4 +-
 lib/librte_eal/linuxapp/eal/eal_vfio.c         |  17 +-
 lib/librte_eal/rte_eal_version.map             |   7 +
 lib/librte_flow_classify/rte_flow_classify.c   |   3 +-
 lib/librte_mempool/rte_mempool.c               |  31 +-
 lib/librte_pipeline/rte_pipeline.c             |   3 +-
 lib/librte_sched/rte_sched.c                   |   2 +-
 test/test/Makefile                             |   1 +
 test/test/autotest_data.py                     |  14 +-
 test/test/meson.build                          |   1 +
 test/test/test_external_mem.c                  | 384 ++++++++++++++++++
 test/test/test_malloc.c                        |   3 +
 test/test/test_memzone.c                       |   3 +
 34 files changed, 1346 insertions(+), 84 deletions(-)
 create mode 100644 test/test/test_external_mem.c
Comments
Hi Anatoly,

First, thanks for the patchset - it is a great enhancement. See question below.

Tuesday, September 4, 2018 4:12 PM, Anatoly Burakov:
> Subject: [dpdk-dev] [PATCH 00/16] Support externally allocated memory in
> DPDK
>
> This is a proposal to enable using externally allocated memory in DPDK.

[...]

> The responsibility to ensure memory is accessible before using it is on the
> shoulders of the user - there is no checking done with regards to validity of
> the memory (nor could there be...).

That makes sense. However, who should be in charge of mapping this memory for DMA access? The user, or internally the PMD - when it encounters the first packet, or while traversing the existing mempools?

[...]
On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:
> Hi Anatoly,
>
> First thanks for the patchset, it is a great enhancement.

[...]

>> The responsibility to ensure memory is accessible before using it is on the
>> shoulders of the user - there is no checking done with regards to validity of
>> the memory (nor could there be...).
>
> That makes sense. However who should be in-charge of mapping this memory for dma access?
> The user or internally be the PMD when encounter the first packet or while traversing the existing mempools?

Hi Shahaf,

There are two ways this can be solved. The first way is to perform VFIO mapping automatically on adding/attaching memory. The second is to force the user to do it manually. For now, the latter is chosen, because the user knows best whether they intend to do DMA on that memory - but I'm open to suggestions.

There is an issue with some devices and buses (i.e. bus/fslmc) bypassing the EAL VFIO infrastructure and performing their own VFIO/DMA mapping magic, but solving that problem is outside the scope of this patchset. Those devices/buses should fix themselves :)

When not using VFIO, it's out of our hands anyway.
Monday, September 17, 2018 1:07 PM, Burakov, Anatoly:
> Subject: Re: [dpdk-dev] [PATCH 00/16] Support externally allocated memory
> in DPDK
>
> On 13-Sep-18 8:44 AM, Shahaf Shuler wrote:

[...]

> There are two ways this can be solved. The first way is to perform VFIO
> mapping automatically on adding/attaching memory. The second is to force
> user to do it manually. For now, the latter is chosen because user knows best
> if they intend to do DMA on that memory, but i'm open to suggestions.

I agree with that approach, and would add not only whether the mempool is for DMA or not, but also which ports will use this mempool (this can affect the mapping).

However, I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping, but rather some proprietary driver utility. IMO DPDK should introduce generic, device-agnostic APIs to the user.

My suggestion is, instead of doing vfio_dma_map/vfio_dma_unmap, to have a generic dma_map(uint8_t port, address, len). Each driver will register its own mapping callback (which can be vfio_dma_map). It can be outside of this series; I am just wondering what people think of such an approach.

> There is an issue with some devices and buses (i.e. bus/fslmc) bypassing EAL
> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
> solving that problem is outside the scope of this patchset. Those
> devices/buses should fix themselves :)
>
> When not using VFIO, it's out of our hands anyway.

Why? VFIO is not a must requirement for devices in DPDK.
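The per-port callback registration suggested above could look something like the following self-contained sketch. All names here (dma_map_register, dma_map, vfio_dma_map_cb) are hypothetical illustrations of the idea, not existing DPDK symbols.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 8

/* Per-driver mapping routine: maps [addr, addr + len) for DMA. */
typedef int (*dma_map_fn)(void *addr, size_t len);

/* One callback slot per port; filled in by the driver owning that port. */
static dma_map_fn port_map_cb[MAX_PORTS];

/* Called by the driver at probe time to register its mapping routine. */
int dma_map_register(uint8_t port, dma_map_fn cb)
{
	if (port >= MAX_PORTS || cb == NULL)
		return -1;
	port_map_cb[port] = cb;
	return 0;
}

/* The generic, device-agnostic entry point the application calls. */
int dma_map(uint8_t port, void *addr, size_t len)
{
	if (port >= MAX_PORTS || port_map_cb[port] == NULL)
		return -1; /* no driver callback registered for this port */
	return port_map_cb[port](addr, len);
}

/* A VFIO-based driver would plug in something like this; a bus with its
 * own mapping scheme (e.g. bus/fslmc, or the VFIO-less bus/dpaa) would
 * register its own routine instead. */
int vfio_dma_map_cb(void *addr, size_t len)
{
	printf("vfio: map %p (+%zu bytes)\n", addr, len);
	return 0;
}
```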
On 17-Sep-18 1:16 PM, Shahaf Shuler wrote:

[...]

> I agree with that approach, and will add not only if the mempool is for dma or not but also which ports will use this mempool (this can affect the mapping).

That is perhaps too hardware-specific - this should probably be handled inside the driver callbacks.

> However I don't think this is generic enough to use only VFIO. As you said, there are some devices not using VFIO for mapping rather some proprietary driver utility.
> IMO DPDK should introduce generic and device agnostic APIs to the user.
>
> My suggestion is instead of doing vfio_dma_map that or vfio_dma_unmap that have a generic dma_map(uint8_t port, address, len). Each driver will register with its own mapping callback (can be vfio_dma_map).
> It can be outside of this series, just wondering the people opinion on such approach.

I don't disagree. I don't like bus/net/etc drivers doing their own thing with regards to mapping, and I would by far prefer a generic way to set up DMA maps, to which VFIO will be a subscriber.

>> When not using VFIO, it's out of our hands anyway.
>
> Why?
> VFIO is not a must requirement for devices in DPDK.

When I say "out of our hands", what I mean is: currently, as far as the EAL API is concerned, there is no DMA mapping outside of VFIO.
On Monday 17 September 2018 06:30 PM, Burakov, Anatoly wrote:

[...]

> I don't disagree. I don't like bus/net/etc drivers doing their own thing
> with regards to mapping, and i would by far prefer generic way to set up
> DMA maps, to which VFIO will be a subscriber.
>
>>> There is an issue with some devices and buses (i.e. bus/fslmc)
>>> bypassing EAL
>>> VFIO infrastructure and performing their own VFIO/DMA mapping magic, but
>>> solving that problem is outside the scope of this patchset. Those
>>> devices/buses should fix themselves :)

DMA mapping is a very common principle and can easily be a candidate for a let's-make-it-generic movement, but, being close to hardware (or hardware-specific), it does require the driver to have some flexibility in terms of its eventual implementation.

I maintain one of those drivers (bus/fslmc) in DPDK which needs to have a special VFIO layer - and from that experience, I can say that VFIO mapping does require some flexibility. SoC semantics are sometimes too complex to pin to a general, universally agreed, standard concept. (Or, one can easily call it a 'bug', while it is a 'feature' for others :D)

In fact, NXP has another driver (bus/dpaa) which doesn't even work with VFIO - it loves to work directly with physical addresses. And it is not at a lower priority than the one with VFIO.

Thus, I really don't think a strongly controlled VFIO mapping should be EAL's responsibility. Failure because of lack of mapping is a driver's problem.
On 18-Sep-18 1:29 PM, Shreyansh Jain wrote:

[...]

> DMA mapping is a very common principle and can easily be a candidate
> for a let's-make-it-generic movement, but, being close to hardware (or
> hardware-specific), it does require the driver to have some flexibility
> in terms of its eventual implementation.

Perhaps I didn't word my response clearly enough. I didn't mean to say (or imply) that EAL must handle all DMA mappings itself. Rather, EAL should provide a generic infrastructure for maintaining current mappings etc., and provide a subscription mechanism for other users (e.g. drivers), so that the details of exactly how to map things for DMA are up to the drivers. In other words, we agree :)

> Thus, I really don't think a strongly controlled VFIO mapping should be
> EAL's responsibility. Failure because of lack of mapping is a driver's
> problem.

While EAL doesn't necessarily need to be involved with mapping things for VFIO, I believe it does need to be the authority on what gets mapped. The user needs a way to make arbitrary memory available for DMA - this is where EAL comes in. VFIO itself can be factored out into a separate subsystem (DMA drivers, anyone? :D ), but given that memory cometh and goeth (external memory included), and given that some things tend to be a bit complicated [*], EAL needs to know when something is supposed to be mapped or unmapped, and when to notify subscribers that they may have to refresh their DMA maps.

[*] For example, VFIO can only do mappings when there are devices actually attached to a VFIO container, so we have to maintain all maps between hotplug events, to ensure that memory set up for DMA doesn't silently get unmapped on device detach and subsequent attach.
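The "EAL maintains the maps, drivers subscribe" model being converged on here could be sketched as a minimal publish/subscribe mechanism, loosely modelled on the memory event callbacks EAL already has (rte_mem_event_callback_register()). All names below are illustrative, not real EAL symbols.

```c
#include <stddef.h>

enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

/* A subscriber (VFIO layer, bus/fslmc, ...) refreshes its DMA maps here. */
typedef void (*mem_event_cb)(enum mem_event ev, const void *addr, size_t len);

#define MAX_SUBS 8
static mem_event_cb subs[MAX_SUBS];
static int n_subs;

/* Drivers register at init time to be told about memory layout changes. */
int mem_event_subscribe(mem_event_cb cb)
{
	if (cb == NULL || n_subs == MAX_SUBS)
		return -1;
	subs[n_subs++] = cb;
	return 0;
}

/* EAL - the authority on what is mapped - calls this whenever memory is
 * added to or removed from a heap (external memory included), so that
 * each subscriber can map or unmap it in its own hardware-specific way. */
void mem_event_notify(enum mem_event ev, const void *addr, size_t len)
{
	for (int i = 0; i < n_subs; i++)
		subs[i](ev, addr, len);
}

/* A trivial subscriber used to demonstrate the flow. */
int notify_count;
void count_notifications(enum mem_event ev, const void *addr, size_t len)
{
	(void)ev; (void)addr; (void)len;
	notify_count++;
}
```

This keeps EAL in charge of *when* mappings change while leaving *how* they change to each driver, which is the flexibility bus/fslmc and bus/dpaa ask for above.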