[dpdk-dev,v8,2/5] vfio: add multi container support

Message ID 20180416153438.79355-3-xiao.w.wang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail Compilation issues

Commit Message

Xiao Wang April 16, 2018, 3:34 p.m. UTC
  This patch adds APIs to support container create/destroy and device
bind/unbind with a container. It also provides an API for IOMMU programming
on a specified container.

A driver can use the "rte_vfio_container_create" helper to create a new
container from EAL, and "rte_vfio_container_group_bind" to bind a device's
IOMMU group to the newly created container. During rte_vfio_setup_device,
the container bound with the device is used for IOMMU setup.
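The intended call flow can be sketched as follows. The functions below are simplified stand-ins for the real EAL implementations, not DPDK's code: the slot search mirrors the patch's rte_vfio_container_create, but MAX_CONTAINERS and the fd values are invented for illustration.

```c
#define MAX_CONTAINERS 4

/* Slot 0 is reserved for the default container, as in the patch. */
static int container_fds[MAX_CONTAINERS] = { -1, -1, -1, -1 };

/* Stand-in for rte_vfio_container_create: find a free slot and pretend
 * the kernel handed us a container fd. */
static int rte_vfio_container_create(void)
{
	for (int i = 1; i < MAX_CONTAINERS; i++) {
		if (container_fds[i] == -1) {
			container_fds[i] = 100 + i; /* pretend fd */
			return container_fds[i];
		}
	}
	return -1; /* exceed max container limit */
}

/* Stand-in for rte_vfio_container_group_bind: pretend the IOMMU group
 * was opened and bound, returning a fake group fd. */
static int rte_vfio_container_group_bind(int container_fd, int iommu_group_num)
{
	return (container_fd >= 0 && iommu_group_num >= 0) ? 200 : -1;
}

/* Stand-in for rte_vfio_container_destroy: release the slot. */
static int rte_vfio_container_destroy(int container_fd)
{
	for (int i = 1; i < MAX_CONTAINERS; i++) {
		if (container_fds[i] == container_fd) {
			container_fds[i] = -1;
			return 0;
		}
	}
	return -1; /* invalid container fd */
}
```

A driver would call create, then bind its device's IOMMU group, and later rte_vfio_setup_device picks up the container bound to that device.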

Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_eal/bsdapp/eal/eal.c          |  52 ++++++
 lib/librte_eal/common/include/rte_vfio.h | 128 ++++++++++++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 269 ++++++++++++++++++++++++++++---
 lib/librte_eal/rte_eal_version.map       |   6 +
 4 files changed, 436 insertions(+), 19 deletions(-)
  

Comments

Anatoly Burakov April 16, 2018, 3:58 p.m. UTC | #1
On 16-Apr-18 4:34 PM, Xiao Wang wrote:
> This patch adds APIs to support container create/destroy and device
> bind/unbind with a container. It also provides an API for IOMMU programming
> on a specified container.
> 
> A driver could use "rte_vfio_container_create" helper to create a new
> container from eal, use "rte_vfio_container_group_bind" to bind a device
> to the newly created container. During rte_vfio_setup_device the container
> bound with the device will be used for IOMMU setup.
> 
> Signed-off-by: Junjie Chen <junjie.j.chen@intel.com>
> Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---

Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
  
Xiao Wang April 17, 2018, 7:06 a.m. UTC | #2
IFCVF driver
============
The IFCVF vDPA (vhost data path acceleration) driver provides support for the
Intel FPGA 100G VF (IFCVF). IFCVF's datapath is virtio ring compatible, and it
works as a HW vhost backend that can send/receive packets to/from virtio
directly by DMA. It also supports dirty page logging and device state
report/restore. This driver enables its vDPA functionality with the live
migration feature.

vDPA mode
=========
IFCVF's vendor ID and device ID are the same as those of the virtio net PCI
device, but it uses its own subsystem vendor ID and device ID. To let the
device be probed by the IFCVF driver, the "vdpa=1" devarg specifies that this
device is to be used in vDPA mode rather than polling mode; the virtio PMD
will skip the device when it detects this parameter.

Container per device
====================
vDPA needs to create a different container for each device, so this patch set
adds APIs in eal/vfio to support multiple containers, e.g.
- rte_vfio_container_create
- rte_vfio_container_destroy
- rte_vfio_container_group_bind
- rte_vfio_container_group_unbind

With this extension, a device can be put into a new, dedicated container
rather than the previous default container.

Two APIs are added for IOMMU programming for a specified container:
- rte_vfio_container_dma_map
- rte_vfio_container_dma_unmap
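The map/unmap entry points follow a validate-then-dispatch pattern: the public wrapper rejects a zero-length request with rte_errno = EINVAL (the same check the patch keeps in rte_vfio_dma_map), then hands off to a per-container helper. A minimal sketch of that pattern, with a recording helper standing in for the real VFIO ioctl and a fake_rte_errno variable standing in for rte_errno:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_errno. */
static int fake_rte_errno;

struct mapping { uint64_t vaddr, iova, len; };
static struct mapping last_map;

/* Stand-in for the patch's container_dma_map helper: record the request
 * instead of issuing VFIO_IOMMU_MAP_DMA on the container's fd. */
static int container_dma_map(int container_fd, uint64_t vaddr,
		uint64_t iova, uint64_t len)
{
	if (container_fd < 0)
		return -1; /* invalid container fd */
	last_map.vaddr = vaddr;
	last_map.iova = iova;
	last_map.len = len;
	return 0;
}

/* Public wrapper: validate, then dispatch to the per-container helper. */
static int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
		uint64_t iova, uint64_t len)
{
	if (len == 0) { /* same zero-length check as the patch */
		fake_rte_errno = EINVAL;
		return -1;
	}
	return container_dma_map(container_fd, vaddr, iova, len);
}
```

The unmap path is symmetric; in the patch both rte_vfio_dma_map and rte_vfio_container_dma_map share the helper so the validation and bookkeeping are written once.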

IFCVF vDPA details
==================
Key vDPA driver ops implemented:
- ifcvf_dev_config:
  Enable the VF data path with the virtio information provided by the vhost
  lib, including IOMMU programming to enable VF DMA to the VM's memory, VFIO
  interrupt setup to route HW interrupts to the virtio driver, creation of a
  notify relay thread to translate the virtio driver's kick into an MMIO
  write onto HW, and HW queue configuration.

  This function gets called to set up the HW data path backend when the
  virtio driver in the VM gets ready.

- ifcvf_dev_close:
  Revoke all the setup done in ifcvf_dev_config.

  This function gets called when the virtio driver stops the device in the VM.
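The two ops above can be pictured as an ops table the vDPA framework dispatches through. The struct below is a hypothetical, simplified shape for illustration only; the field and function names are not DPDK's actual vDPA API:

```c
/* Illustrative ops table: the framework calls dev_conf when the guest's
 * virtio driver becomes ready and dev_close when it stops the device. */
struct vdpa_ops_sketch {
	int (*dev_conf)(int did);  /* ifcvf_dev_config: set up HW datapath */
	int (*dev_close)(int did); /* ifcvf_dev_close: revoke that setup */
};

/* Trivial stand-ins for the real driver callbacks. */
static int ifcvf_dev_config_sketch(int did) { return did >= 0 ? 0 : -1; }
static int ifcvf_dev_close_sketch(int did)  { return did >= 0 ? 0 : -1; }

static const struct vdpa_ops_sketch ifcvf_ops = {
	.dev_conf  = ifcvf_dev_config_sketch,
	.dev_close = ifcvf_dev_close_sketch,
};
```

The driver registers such a table once at probe time, and the framework invokes the callbacks per device as the guest's virtio driver comes and goes.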

Change log
==========
v9:
- Rebase on master tree's HEAD.
- Fix compile error on 32-bit platform.

v8:
- Rebase on HEAD.
- Move vfio_group definition back to eal_vfio.h.
- Return NULL when vfio group num/fd is not found, let caller handle that.
- Fix wrong API name in commit log.
- Rename the bind/unbind functions to rte_vfio_container_group_bind/unbind
  for consistency.
- Add notes for rte_vfio_container_create and rte_vfio_dma_map, and fix a
  typo in a comment.
- Extract the shared code of rte_vfio_dma_map and rte_vfio_container_dma_map
  to avoid duplication; do the same for the unmap.

v7:
- Rebase on HEAD.
- Split the vfio patch into 2 parts, one for data structure extension, one for
  adding new API.
- Use a static vfio_config array instead of dynamic allocation.
- Change rte_vfio_container_dma_map/unmap's parameters to use (va, iova, len).

v6:
- Rebase on master branch.
- Document "vdpa" devarg in virtio documentation.
- Rename the ifcvf config option to CONFIG_RTE_LIBRTE_IFCVF_VDPA_PMD for
  consistency, and add it to the driver documentation.
- Add comments for ifcvf device ID.
- Minor code cleaning.

v5:
- Fix compilation on BSD; remove the rte_vfio.h include on BSD.

v4:
- Rebase on Zhihong's latest vDPA lib patch, with vDPA ops names change.
- Remove API "rte_vfio_get_group_fd", "rte_vfio_bind_group" will return the fd.
- Align the vfio_cfg search internal APIs naming.

v3:
- Add doc and release note for the new driver.
- Remove the vdev concept and make the driver a PCI driver, so it gets probed
  by the PCI bus driver.
- Rebase on the v4 vDPA lib patch; register a vDPA device instead of an
  engine.
- Remove the PCI API exposure accordingly.
- Move the MAX_VFIO_CONTAINERS definition to the config file.
- Let the virtio PMD skip a virtio device that needs to work in vDPA mode.

v2:
- Rename the function pci_get_kernel_driver_by_path to
  rte_pci_device_kdriver_name to make the API generic across Linux and BSD,
  and mark it as EXPERIMENTAL.
- Rebase on Zhihong's vDPA v3 patch set.
- Minor code cleanup on vfio extension.


Xiao Wang (5):
  vfio: extend data structure for multi container
  vfio: add multi container support
  net/virtio: skip device probe in vdpa mode
  net/ifcvf: add ifcvf vdpa driver
  doc: add ifcvf driver document and release note

 config/common_base                       |   8 +
 config/common_linuxapp                   |   1 +
 doc/guides/nics/features/ifcvf.ini       |   8 +
 doc/guides/nics/ifcvf.rst                |  98 ++++
 doc/guides/nics/index.rst                |   1 +
 doc/guides/nics/virtio.rst               |  13 +
 doc/guides/rel_notes/release_18_05.rst   |   9 +
 drivers/net/Makefile                     |   3 +
 drivers/net/ifc/Makefile                 |  35 ++
 drivers/net/ifc/base/ifcvf.c             | 329 ++++++++++++
 drivers/net/ifc/base/ifcvf.h             | 160 ++++++
 drivers/net/ifc/base/ifcvf_osdep.h       |  52 ++
 drivers/net/ifc/ifcvf_vdpa.c             | 846 +++++++++++++++++++++++++++++++
 drivers/net/ifc/rte_ifcvf_version.map    |   4 +
 drivers/net/virtio/virtio_ethdev.c       |  43 ++
 lib/librte_eal/bsdapp/eal/eal.c          |  44 ++
 lib/librte_eal/common/include/rte_vfio.h | 128 ++++-
 lib/librte_eal/linuxapp/eal/eal_vfio.c   | 681 +++++++++++++++++++------
 lib/librte_eal/linuxapp/eal/eal_vfio.h   |   9 +-
 lib/librte_eal/rte_eal_version.map       |   6 +
 mk/rte.app.mk                            |   3 +
 21 files changed, 2325 insertions(+), 156 deletions(-)
 create mode 100644 doc/guides/nics/features/ifcvf.ini
 create mode 100644 doc/guides/nics/ifcvf.rst
 create mode 100644 drivers/net/ifc/Makefile
 create mode 100644 drivers/net/ifc/base/ifcvf.c
 create mode 100644 drivers/net/ifc/base/ifcvf.h
 create mode 100644 drivers/net/ifc/base/ifcvf_osdep.h
 create mode 100644 drivers/net/ifc/ifcvf_vdpa.c
 create mode 100644 drivers/net/ifc/rte_ifcvf_version.map
  
Ferruh Yigit April 17, 2018, 11:13 a.m. UTC | #3
On 4/17/2018 8:06 AM, Xiao Wang wrote:
> IFCVF driver
> ============
> [...]
> 
> Xiao Wang (5):
>   vfio: extend data structure for multi container
>   vfio: add multi container support
>   net/virtio: skip device probe in vdpa mode
>   net/ifcvf: add ifcvf vdpa driver
>   doc: add ifcvf driver document and release note

Series applied to dpdk-next-net/master, thanks.
  

Patch

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index bfbec0d7f..b5c0386e4 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -769,6 +769,14 @@  int rte_vfio_noiommu_is_enabled(void);
 int rte_vfio_clear_group(int vfio_group_fd);
 int rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len);
 int rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len);
+int rte_vfio_container_create(void);
+int rte_vfio_container_destroy(int container_fd);
+int rte_vfio_container_group_bind(int container_fd, int iommu_group_num);
+int rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
+int rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len);
+int rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len);
 
 int rte_vfio_setup_device(__rte_unused const char *sysfs_base,
 		      __rte_unused const char *dev_addr,
@@ -838,3 +846,47 @@  rte_vfio_get_group_fd(__rte_unused int iommu_group_num)
 {
 	return -1;
 }
+
+int __rte_experimental
+rte_vfio_container_create(void)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_destroy(__rte_unused int container_fd)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_group_bind(__rte_unused int container_fd,
+		__rte_unused int iommu_group_num)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_group_unbind(__rte_unused int container_fd,
+		__rte_unused int iommu_group_num)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_dma_map(__rte_unused int container_fd,
+			__rte_unused uint64_t vaddr,
+			__rte_unused uint64_t iova,
+			__rte_unused uint64_t len)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_dma_unmap(__rte_unused int container_fd,
+			__rte_unused uint64_t vaddr,
+			__rte_unused uint64_t iova,
+			__rte_unused uint64_t len)
+{
+	return -1;
+}
diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h
index c4a2e606f..c10c206a3 100644
--- a/lib/librte_eal/common/include/rte_vfio.h
+++ b/lib/librte_eal/common/include/rte_vfio.h
@@ -154,7 +154,10 @@  rte_vfio_clear_group(int vfio_group_fd);
 /**
  * Map memory region for use with VFIO.
  *
- * @note requires at least one device to be attached at the time of mapping.
+ * @note Require at least one device to be attached at the time of
+ *       mapping. DMA maps done via this API will only apply to default
+ *       container and will not apply to any of the containers created
+ *       via rte_vfio_container_create().
  *
  * @param vaddr
  *   Starting virtual address of memory to be mapped.
@@ -245,6 +248,129 @@  rte_vfio_get_container_fd(void);
 int __rte_experimental
 rte_vfio_get_group_fd(int iommu_group_num);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Create a new container for device binding.
+ *
+ * @note Any newly allocated DPDK memory will not be mapped into these
+ *       containers by default, user needs to manage DMA mappings for
+ *       any container created by this API.
+ *
+ * @return
+ *   the container fd if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_create(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Destroy the container, unbind all vfio groups within it.
+ *
+ * @param container_fd
+ *   the container fd to destroy
+ *
+ * @return
+ *    0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_destroy(int container_fd);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Bind a IOMMU group to a container.
+ *
+ * @param container_fd
+ *   the container's fd
+ *
+ * @param iommu_group_num
+ *   the iommu group number to bind to container
+ *
+ * @return
+ *   group fd if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_group_bind(int container_fd, int iommu_group_num);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Unbind a IOMMU group from a container.
+ *
+ * @param container_fd
+ *   the container fd of container
+ *
+ * @param iommu_group_num
+ *   the iommu group number to delete from container
+ *
+ * @return
+ *    0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_group_unbind(int container_fd, int iommu_group_num);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform DMA mapping for devices in a container.
+ *
+ * @param container_fd
+ *   the specified container fd
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ *
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ *
+ * @param len
+ *   Length of memory segment being mapped.
+ *
+ * @return
+ *    0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_dma_map(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform DMA unmapping for devices in a container.
+ *
+ * @param container_fd
+ *   the specified container fd
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be unmapped.
+ *
+ * @param iova
+ *   Starting IOVA address of memory to be unmapped.
+ *
+ * @param len
+ *   Length of memory segment being unmapped.
+ *
+ * @return
+ *    0 if successful
+ *   <0 if failed
+ */
+int __rte_experimental
+rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 6289f6316..64ea194f0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1532,19 +1532,15 @@  vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 			len, do_map);
 }
 
-int __rte_experimental
-rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
+static int
+container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
 {
 	struct user_mem_map *new_map;
 	struct user_mem_maps *user_mem_maps;
 	int ret = 0;
 
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	user_mem_maps = &default_vfio_cfg->mem_maps;
+	user_mem_maps = &vfio_cfg->mem_maps;
 	rte_spinlock_recursive_lock(&user_mem_maps->lock);
 	if (user_mem_maps->n_maps == VFIO_MAX_USER_MEM_MAPS) {
 		RTE_LOG(ERR, EAL, "No more space for user mem maps\n");
@@ -1553,7 +1549,7 @@  rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
 		goto out;
 	}
 	/* map the entry */
-	if (vfio_dma_mem_map(default_vfio_cfg, vaddr, iova, len, 1)) {
+	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
 		/* technically, this will fail if there are currently no devices
 		 * plugged in, even if a device were added later, this mapping
 		 * might have succeeded. however, since we cannot verify if this
@@ -1577,19 +1573,15 @@  rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
 	return ret;
 }
 
-int __rte_experimental
-rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
+static int
+container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
 {
 	struct user_mem_map *map, *new_map = NULL;
 	struct user_mem_maps *user_mem_maps;
 	int ret = 0;
 
-	if (len == 0) {
-		rte_errno = EINVAL;
-		return -1;
-	}
-
-	user_mem_maps = &default_vfio_cfg->mem_maps;
+	user_mem_maps = &vfio_cfg->mem_maps;
 	rte_spinlock_recursive_lock(&user_mem_maps->lock);
 
 	/* find our mapping */
@@ -1614,7 +1606,7 @@  rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
 	}
 
 	/* unmap the entry */
-	if (vfio_dma_mem_map(default_vfio_cfg, vaddr, iova, len, 0)) {
+	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 0)) {
 		/* there may not be any devices plugged in, so unmapping will
 		 * fail with ENODEV/ENOTSUP rte_errno values, but that doesn't
 		 * stop us from removing the mapping, as the assumption is we
@@ -1653,6 +1645,28 @@  rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
 	return ret;
 }
 
+int __rte_experimental
+rte_vfio_dma_map(uint64_t vaddr, uint64_t iova, uint64_t len)
+{
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return container_dma_map(default_vfio_cfg, vaddr, iova, len);
+}
+
+int __rte_experimental
+rte_vfio_dma_unmap(uint64_t vaddr, uint64_t iova, uint64_t len)
+{
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	return container_dma_unmap(default_vfio_cfg, vaddr, iova, len);
+}
+
 int
 rte_vfio_noiommu_is_enabled(void)
 {
@@ -1685,6 +1699,181 @@  rte_vfio_noiommu_is_enabled(void)
 	return c == 'Y';
 }
 
+int __rte_experimental
+rte_vfio_container_create(void)
+{
+	int i;
+
+	/* Find an empty slot to store new vfio config */
+	for (i = 1; i < VFIO_MAX_CONTAINERS; i++) {
+		if (vfio_cfgs[i].vfio_container_fd == -1)
+			break;
+	}
+
+	if (i == VFIO_MAX_CONTAINERS) {
+		RTE_LOG(ERR, EAL, "exceed max vfio container limit\n");
+		return -1;
+	}
+
+	vfio_cfgs[i].vfio_container_fd = rte_vfio_get_container_fd();
+	if (vfio_cfgs[i].vfio_container_fd < 0) {
+		RTE_LOG(NOTICE, EAL, "fail to create a new container\n");
+		return -1;
+	}
+
+	return vfio_cfgs[i].vfio_container_fd;
+}
+
+int __rte_experimental
+rte_vfio_container_destroy(int container_fd)
+{
+	struct vfio_config *vfio_cfg;
+	int i;
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid container fd\n");
+		return -1;
+	}
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++)
+		if (vfio_cfg->vfio_groups[i].group_num != -1)
+			rte_vfio_container_group_unbind(container_fd,
+				vfio_cfg->vfio_groups[i].group_num);
+
+	close(container_fd);
+	vfio_cfg->vfio_container_fd = -1;
+	vfio_cfg->vfio_active_groups = 0;
+	vfio_cfg->vfio_iommu_type = NULL;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_vfio_container_group_bind(int container_fd, int iommu_group_num)
+{
+	struct vfio_config *vfio_cfg;
+	struct vfio_group *cur_grp;
+	int vfio_group_fd;
+	int i;
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid container fd\n");
+		return -1;
+	}
+
+	/* Check room for new group */
+	if (vfio_cfg->vfio_active_groups == VFIO_MAX_GROUPS) {
+		RTE_LOG(ERR, EAL, "Maximum number of VFIO groups reached!\n");
+		return -1;
+	}
+
+	/* Get an index for the new group */
+	for (i = 0; i < VFIO_MAX_GROUPS; i++)
+		if (vfio_cfg->vfio_groups[i].group_num == -1) {
+			cur_grp = &vfio_cfg->vfio_groups[i];
+			break;
+		}
+
+	/* This should not happen */
+	if (i == VFIO_MAX_GROUPS) {
+		RTE_LOG(ERR, EAL, "No VFIO group free slot found\n");
+		return -1;
+	}
+
+	vfio_group_fd = vfio_open_group_fd(iommu_group_num);
+	if (vfio_group_fd < 0) {
+		RTE_LOG(ERR, EAL, "Failed to open group %d\n", iommu_group_num);
+		return -1;
+	}
+	cur_grp->group_num = iommu_group_num;
+	cur_grp->fd = vfio_group_fd;
+	cur_grp->devices = 0;
+	vfio_cfg->vfio_active_groups++;
+
+	return vfio_group_fd;
+}
+
+int __rte_experimental
+rte_vfio_container_group_unbind(int container_fd, int iommu_group_num)
+{
+	struct vfio_config *vfio_cfg;
+	struct vfio_group *cur_grp;
+	int i;
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid container fd\n");
+		return -1;
+	}
+
+	for (i = 0; i < VFIO_MAX_GROUPS; i++) {
+		if (vfio_cfg->vfio_groups[i].group_num == iommu_group_num) {
+			cur_grp = &vfio_cfg->vfio_groups[i];
+			break;
+		}
+	}
+
+	/* This should not happen */
+	if (i == VFIO_MAX_GROUPS) {
+		RTE_LOG(ERR, EAL, "Specified group number not found\n");
+		return -1;
+	}
+
+	if (cur_grp->fd >= 0 && close(cur_grp->fd) < 0) {
+		RTE_LOG(ERR, EAL, "Error when closing vfio_group_fd for"
+			" iommu_group_num %d\n", iommu_group_num);
+		return -1;
+	}
+	cur_grp->group_num = -1;
+	cur_grp->fd = -1;
+	cur_grp->devices = 0;
+	vfio_cfg->vfio_active_groups--;
+
+	return 0;
+}
+
+int __rte_experimental
+rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
+{
+	struct vfio_config *vfio_cfg;
+
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid container fd\n");
+		return -1;
+	}
+
+	return container_dma_map(vfio_cfg, vaddr, iova, len);
+}
+
+int __rte_experimental
+rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova,
+		uint64_t len)
+{
+	struct vfio_config *vfio_cfg;
+
+	if (len == 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid container fd\n");
+		return -1;
+	}
+
+	return container_dma_unmap(vfio_cfg, vaddr, iova, len);
+}
+
 #else
 
 int __rte_experimental
@@ -1701,4 +1890,48 @@  rte_vfio_dma_unmap(uint64_t __rte_unused vaddr, uint64_t __rte_unused iova,
 	return -1;
 }
 
+int __rte_experimental
+rte_vfio_container_create(void)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_destroy(__rte_unused int container_fd)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_group_bind(__rte_unused int container_fd,
+		__rte_unused int iommu_group_num)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_group_unbind(__rte_unused int container_fd,
+		__rte_unused int iommu_group_num)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_dma_map(__rte_unused int container_fd,
+		__rte_unused uint64_t vaddr,
+		__rte_unused uint64_t iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
+
+int __rte_experimental
+rte_vfio_container_dma_unmap(__rte_unused int container_fd,
+		__rte_unused uint64_t vaddr,
+		__rte_unused uint64_t iova,
+		__rte_unused uint64_t len)
+{
+	return -1;
+}
+
 #endif
diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
index d02d80b8a..28f51f8d2 100644
--- a/lib/librte_eal/rte_eal_version.map
+++ b/lib/librte_eal/rte_eal_version.map
@@ -293,5 +293,11 @@  EXPERIMENTAL {
 	rte_vfio_get_container_fd;
 	rte_vfio_get_group_fd;
 	rte_vfio_get_group_num;
+	rte_vfio_container_create;
+	rte_vfio_container_destroy;
+	rte_vfio_container_dma_map;
+	rte_vfio_container_dma_unmap;
+	rte_vfio_container_group_bind;
+	rte_vfio_container_group_unbind;
 
 } DPDK_18.02;