[v4,3/6] bus: introduce device level DMA memory mapping

Message ID 1159b0da448d794e00011c7bbfd0a99523d005e6.1552206210.git.shahafs@mellanox.com
State Accepted, archived
Delegated to: Thomas Monjalon
Series
  • introduce DMA memory mapping for external memory

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Shahaf Shuler March 10, 2019, 8:28 a.m.
The DPDK APIs expose 3 different modes to work with memory used for DMA:

1. Use DPDK-owned memory (backed by the DPDK-provided hugepages).
This memory is allocated by the DPDK libraries, included in the DPDK
memory system (memseg lists) and automatically DMA-mapped by the DPDK
layers.

2. Use memory allocated by the user and registered to the DPDK memory
system. Upon registration of the memory, the DPDK layers will DMA map it
to all needed devices. After registration, allocation from this memory
is done with the rte_*malloc APIs.

3. Use memory allocated by the user and not registered to the DPDK memory
system. This is for users who want tight control over this
memory (e.g. to avoid the rte_malloc header).
The user should allocate the memory, register it through the
rte_extmem_register API, and call a DMA map function in order to register
such memory with the different devices.

The scope of this patch focuses on #3 above.

Currently the only way to map external memory is through VFIO
(rte_vfio_dma_map). While VFIO is common, there are other vendors
that use different ways to map memory (e.g. Mellanox and NXP).

This patch moves the DMA mapping to vendor-agnostic APIs.
Device level DMA map and unmap APIs were added; they are currently
implemented only for PCI devices.

For PCI bus devices, the PCI driver can expose its own map and unmap
functions to be used for the mapping. In case the driver doesn't provide
any, the memory will be mapped, if possible, to the IOMMU through the
VFIO APIs.

Application usage with those APIs is quite simple:
* allocate memory
* call rte_extmem_register on the memory chunk.
* take a device, and query its rte_device.
* call the device specific mapping function for this device.

Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
APIs, leaving the rte device APIs as the preferred option for the user.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
---
 drivers/bus/pci/pci_common.c            | 48 ++++++++++++++++++++++++++++
 drivers/bus/pci/rte_bus_pci.h           | 40 +++++++++++++++++++++++
 lib/librte_eal/common/eal_common_dev.c  | 34 ++++++++++++++++++++
 lib/librte_eal/common/include/rte_bus.h | 44 +++++++++++++++++++++++++
 lib/librte_eal/common/include/rte_dev.h | 47 +++++++++++++++++++++++++++
 lib/librte_eal/rte_eal_version.map      |  2 ++
 6 files changed, 215 insertions(+)

Comments

Burakov, Anatoly March 11, 2019, 10:19 a.m. | #1
On 10-Mar-19 8:28 AM, Shahaf Shuler wrote:
> The DPDK APIs expose 3 different modes to work with memory used for DMA:
> 
> 1. Use the DPDK owned memory (backed by the DPDK provided hugepages).
> This memory is allocated by the DPDK libraries, included in the DPDK
> memory system (memseg lists) and automatically DMA mapped by the DPDK
> layers.
> 
> 2. Use memory allocated by the user and register to the DPDK memory
> systems. Upon registration of memory, the DPDK layers will DMA map it
> to all needed devices. After registration, allocation of this memory
> will be done with rte_*malloc APIs.
> 
> 3. Use memory allocated by the user and not registered to the DPDK memory
> system. This is for users who wants to have tight control on this
> memory (e.g. avoid the rte_malloc header).
> The user should create a memory, register it through rte_extmem_register
> API, and call DMA map function in order to register such memory to
> the different devices.
> 
> The scope of the patch focus on #3 above.
> 
> Currently the only way to map external memory is through VFIO
> (rte_vfio_dma_map). While VFIO is common, there are other vendors
> which use different ways to map memory (e.g. Mellanox and NXP).
> 
> The work in this patch moves the DMA mapping to vendor agnostic APIs.
> Device level DMA map and unmap APIs were added. Implementation of those
> APIs was done currently only for PCI devices.
> 
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
> 
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.
> 
> Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap
> APIs, leaving the rte device APIs as the preferred option for the user.
> 
> Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
Thomas Monjalon March 13, 2019, 9:56 a.m. | #2
10/03/2019 09:28, Shahaf Shuler:
> For PCI bus devices, the pci driver can expose its own map and unmap
> functions to be used for the mapping. In case the driver doesn't provide
> any, the memory will be mapped, if possible, to IOMMU through VFIO APIs.
> 
> Application usage with those APIs is quite simple:
> * allocate memory
> * call rte_extmem_register on the memory chunk.
> * take a device, and query its rte_device.
> * call the device specific mapping function for this device.

Should we make it documented somewhere?

> +/**
> + * Device level DMA map function.
> + * After a successful call, the memory segment will be mapped to the
> + * given device.
> + *
> + * @note: Memory must be registered in advance using rte_extmem_* APIs.

Could we make more explicit that this function is part of
the "external memory API"?
Shahaf Shuler March 13, 2019, 11:12 a.m. | #3
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, March 13, 2019 11:56 AM
> To: Shahaf Shuler <shahafs@mellanox.com>; anatoly.burakov@intel.com
> Cc: dev@dpdk.org; Yongseok Koh <yskoh@mellanox.com>;
> ferruh.yigit@intel.com; nhorman@tuxdriver.com; gaetan.rivet@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v4 3/6] bus: introduce device level DMA
> memory mapping
> 
> 10/03/2019 09:28, Shahaf Shuler:
> > For PCI bus devices, the pci driver can expose its own map and unmap
> > functions to be used for the mapping. In case the driver doesn't
> > provide any, the memory will be mapped, if possible, to IOMMU through
> VFIO APIs.
> >
> > Application usage with those APIs is quite simple:
> > * allocate memory
> > * call rte_extmem_register on the memory chunk.
> > * take a device, and query its rte_device.
> > * call the device specific mapping function for this device.
> 
> Should we make it documented somewhere?

The full flow for working with external memory is documented in doc/guides/prog_guide/env_abstraction_layer.rst, subchapter "Support for Externally Allocated Memory".
The last commit in the series updates it with the right API to use.
> 
> > +/**
> > + * Device level DMA map function.
> > + * After a successful call, the memory segment will be mapped to the
> > + * given device.
> > + *
> > + * @note: Memory must be registered in advance using rte_extmem_*
> APIs.
> 
> Could we make more explicit that this function is part of the "external
> memory API"?

How do you suggest? 
This function belongs to rte_dev, hence the rte_dev prefix. Would rte_dev_extmem_dma_map be better?

> 
> 

[1]
https://patches.dpdk.org/patch/51018/
Thomas Monjalon March 13, 2019, 11:19 a.m. | #4
13/03/2019 12:12, Shahaf Shuler:
> From: Thomas Monjalon <thomas@monjalon.net>
> > > +/**
> > > + * Device level DMA map function.
> > > + * After a successful call, the memory segment will be mapped to the
> > > + * given device.
> > > + *
> > > + * @note: Memory must be registered in advance using rte_extmem_*
> > APIs.
> > 
> > Could we make more explicit that this function is part of the "external
> > memory API"?
> 
> How do you suggest? 

There could be an explicit comment.

> This function belongs to rte_dev therefore the rte_dev prefix. better rte_dev_extmem_dma_map ?

Not sure about the prefix. Anatoly?
Burakov, Anatoly March 13, 2019, 11:47 a.m. | #5
On 13-Mar-19 11:19 AM, Thomas Monjalon wrote:
> 13/03/2019 12:12, Shahaf Shuler:
>> From: Thomas Monjalon <thomas@monjalon.net>
>>>> +/**
>>>> + * Device level DMA map function.
>>>> + * After a successful call, the memory segment will be mapped to the
>>>> + * given device.
>>>> + *
>>>> + * @note: Memory must be registered in advance using rte_extmem_*
>>> APIs.
>>>
>>> Could we make more explicit that this function is part of the "external
>>> memory API"?
>>
>> How do you suggest?
> 
> There could be an explicit comment.
> 
>> This function belongs to rte_dev therefore the rte_dev prefix. better rte_dev_extmem_dma_map ?
> 
> Not sure about the prefix. Anatoly?
> 

IMO this is a dev API. The fact that its purpose is to use it with 
extmem is coincidental.
Thomas Monjalon March 30, 2019, 2:36 p.m. | #6
13/03/2019 12:12, Shahaf Shuler:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 10/03/2019 09:28, Shahaf Shuler:
> > > For PCI bus devices, the pci driver can expose its own map and unmap
> > > functions to be used for the mapping. In case the driver doesn't
> > > provide any, the memory will be mapped, if possible, to IOMMU through
> > VFIO APIs.
> > >
> > > Application usage with those APIs is quite simple:
> > > * allocate memory
> > > * call rte_extmem_register on the memory chunk.
> > > * take a device, and query its rte_device.
> > > * call the device specific mapping function for this device.
> > 
> > Should we make it documented somewhere?
> 
> The full flow to work w/ external memory is documented at doc/guides/prog_guide/env_abstraction_layer.rst , Subchapter "Support for Externally Allocated Memory.
> The last commit in series update the right API to use.

OK, then I will move this doc update in this patch.

Patch

diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 6276e5d695..704b9d71af 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -528,6 +528,52 @@  pci_unplug(struct rte_device *dev)
 	return ret;
 }
 
+static int
+pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+	struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+	if (!pdev || !pdev->driver) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	if (pdev->driver->dma_map)
+		return pdev->driver->dma_map(pdev, addr, iova, len);
+	/*
+	 * In case the driver doesn't provide a specific mapping,
+	 * try to fall back to VFIO.
+	 */
+	if (pdev->kdrv == RTE_KDRV_VFIO)
+		return rte_vfio_container_dma_map
+				(RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+				 iova, len);
+	rte_errno = ENOTSUP;
+	return -1;
+}
+
+static int
+pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
+{
+	struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev);
+
+	if (!pdev || !pdev->driver) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+	if (pdev->driver->dma_unmap)
+		return pdev->driver->dma_unmap(pdev, addr, iova, len);
+	/*
+	 * In case the driver doesn't provide a specific mapping,
+	 * try to fall back to VFIO.
+	 */
+	if (pdev->kdrv == RTE_KDRV_VFIO)
+		return rte_vfio_container_dma_unmap
+				(RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr,
+				 iova, len);
+	rte_errno = ENOTSUP;
+	return -1;
+}
+
 struct rte_pci_bus rte_pci_bus = {
 	.bus = {
 		.scan = rte_pci_scan,
@@ -536,6 +582,8 @@  struct rte_pci_bus rte_pci_bus = {
 		.plug = pci_plug,
 		.unplug = pci_unplug,
 		.parse = pci_parse,
+		.dma_map = pci_dma_map,
+		.dma_unmap = pci_dma_unmap,
 		.get_iommu_class = rte_pci_get_iommu_class,
 		.dev_iterate = rte_pci_dev_iterate,
 		.hot_unplug_handler = pci_hot_unplug_handler,
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index f0d6d81c00..06e004cd3f 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -114,6 +114,44 @@  typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *);
 typedef int (pci_remove_t)(struct rte_pci_device *);
 
 /**
+ * Driver-specific DMA mapping. After a successful call the device
+ * will be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the PCI device.
+ * @param addr
+ *   Starting virtual address of memory to be mapped.
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ * @param len
+ *   Length of memory segment being mapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr,
+			    uint64_t iova, size_t len);
+
+/**
+ * Driver-specific DMA un-mapping. After a successful call the device
+ * will not be able to read/write from/to this segment.
+ *
+ * @param dev
+ *   Pointer to the PCI device.
+ * @param addr
+ *   Starting virtual address of memory to be unmapped.
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ * @param len
+ *   Length of memory segment being unmapped.
+ * @return
+ *   - 0 On success.
+ *   - Negative value and rte_errno is set otherwise.
+ */
+typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr,
+			      uint64_t iova, size_t len);
+
+/**
  * A structure describing a PCI driver.
  */
 struct rte_pci_driver {
@@ -122,6 +160,8 @@  struct rte_pci_driver {
 	struct rte_pci_bus *bus;           /**< PCI bus reference. */
 	pci_probe_t *probe;                /**< Device Probe function. */
 	pci_remove_t *remove;              /**< Device Remove function. */
+	pci_dma_map_t *dma_map;		   /**< device dma map function. */
+	pci_dma_unmap_t *dma_unmap;	   /**< device dma unmap function. */
 	const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */
 	uint32_t drv_flags;                /**< Flags RTE_PCI_DRV_*. */
 };
diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index fd7f5ca7d5..0ec42d8289 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -756,3 +756,37 @@  rte_dev_iterator_next(struct rte_dev_iterator *it)
 	free(cls_str);
 	return it->device;
 }
+
+int
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova,
+		size_t len)
+{
+	if (dev->bus->dma_map == NULL || len == 0) {
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+	/* Memory must be registered through rte_extmem_* APIs */
+	if (rte_mem_virt2memseg_list(addr) == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dev->bus->dma_map(dev, addr, iova, len);
+}
+
+int
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+		  size_t len)
+{
+	if (dev->bus->dma_unmap == NULL || len == 0) {
+		rte_errno = ENOTSUP;
+		return -1;
+	}
+	/* Memory must be registered through rte_extmem_* APIs */
+	if (rte_mem_virt2memseg_list(addr) == NULL) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return dev->bus->dma_unmap(dev, addr, iova, len);
+}
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 6be4b5cabe..4faf2d20a0 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -168,6 +168,48 @@  typedef int (*rte_bus_unplug_t)(struct rte_device *dev);
 typedef int (*rte_bus_parse_t)(const char *name, void *addr);
 
 /**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @param dev
+ *	Device pointer.
+ * @param addr
+ *	Virtual address to map.
+ * @param iova
+ *	IOVA address to map.
+ * @param len
+ *	Length of the memory segment being mapped.
+ *
+ * @return
+ *	0 if mapping was successful.
+ *	Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_map_t)(struct rte_device *dev, void *addr,
+				  uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @param dev
+ *	Device pointer.
+ * @param addr
+ *	Virtual address to unmap.
+ * @param iova
+ *	IOVA address to unmap.
+ * @param len
+ *	Length of the memory segment being unmapped.
+ *
+ * @return
+ *	0 if un-mapping was successful.
+ *	Negative value and rte_errno is set otherwise.
+ */
+typedef int (*rte_dev_dma_unmap_t)(struct rte_device *dev, void *addr,
+				   uint64_t iova, size_t len);
+
+/**
  * Implement a specific hot-unplug handler, which is responsible for
  * handle the failure when device be hot-unplugged. When the event of
  * hot-unplug be detected, it could call this function to handle
@@ -238,6 +280,8 @@  struct rte_bus {
 	rte_bus_plug_t plug;         /**< Probe single device for drivers */
 	rte_bus_unplug_t unplug;     /**< Remove single device from driver */
 	rte_bus_parse_t parse;       /**< Parse a device name */
+	rte_dev_dma_map_t dma_map;   /**< DMA map for device in the bus */
+	rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */
 	struct rte_bus_conf conf;    /**< Bus configuration */
 	rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */
 	rte_dev_iterate_t dev_iterate; /**< Device iterator. */
diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h
index 3cad4bce57..0d5e25b500 100644
--- a/lib/librte_eal/common/include/rte_dev.h
+++ b/lib/librte_eal/common/include/rte_dev.h
@@ -463,4 +463,51 @@  rte_dev_hotplug_handle_enable(void);
 int __rte_experimental
 rte_dev_hotplug_handle_disable(void);
 
+/**
+ * Device level DMA map function.
+ * After a successful call, the memory segment will be mapped to the
+ * given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ *	Device pointer.
+ * @param addr
+ *	Virtual address to map.
+ * @param iova
+ *	IOVA address to map.
+ * @param len
+ *	Length of the memory segment being mapped.
+ *
+ * @return
+ *	0 if mapping was successful.
+ *	Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len);
+
+/**
+ * Device level DMA unmap function.
+ * After a successful call, the memory segment will no longer be
+ * accessible by the given device.
+ *
+ * @note: Memory must be registered in advance using rte_extmem_* APIs.
+ *
+ * @param dev
+ *	Device pointer.
+ * @param addr
+ *	Virtual address to unmap.
+ * @param iova
+ *	IOVA address to unmap.
+ * @param len
+ *	Length of the memory segment being unmapped.
+ *
+ * @return
+ *	0 if un-mapping was successful.
+ *	Negative value and rte_errno is set otherwise.
+ */
+int __rte_experimental
+rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
+		  size_t len);
+
 #endif /* _RTE_DEV_H_ */
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index eb5f7b9cbd..264aa050fa 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -277,6 +277,8 @@  EXPERIMENTAL {
 	rte_class_unregister;
 	rte_ctrl_thread_create;
 	rte_delay_us_sleep;
+	rte_dev_dma_map;
+	rte_dev_dma_unmap;
 	rte_dev_event_callback_process;
 	rte_dev_event_callback_register;
 	rte_dev_event_callback_unregister;