From patchwork Fri Sep 10 11:27:35 2021
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 98589
X-Patchwork-Delegate: david.marchand@redhat.com
From: Anatoly Burakov
To: dev@dpdk.org, Bruce Richardson, Ray Kinsella, Dmitry Kozlyuk,
	Narcisa Ana Maria Vasile, Dmitry Malloy, Pallavi Kadam
Cc: xuan.ding@intel.com, ferruh.yigit@intel.com
Date: Fri, 10 Sep 2021 11:27:35 +0000
Message-Id: <043fc2d53770da8248b9cd0214775f9d41f2e0fb.1631273229.git.anatoly.burakov@intel.com>
Subject: [dpdk-dev] [PATCH v1 1/1] vfio: add page-by-page mapping API

Currently, there is no way to map memory for DMA in a way that allows
unmapping it partially later, because some IOMMUs do not support partial
unmapping. The workaround is to create a separate mapping for each page,
but doing that manually is inconvenient, so this commit adds a proper API
that does it.

This commit relies on the infrastructure that was earlier built out to
support "chunking", as the concept of a "chunk" is essentially the same
as a page size.
Signed-off-by: Anatoly Burakov
---
 lib/eal/freebsd/eal.c      | 10 ++++
 lib/eal/include/rte_vfio.h | 33 ++++++++++++++
 lib/eal/linux/eal_vfio.c   | 93 +++++++++++++++++++++++++++++++-------
 lib/eal/version.map        |  3 ++
 lib/eal/windows/eal.c      | 10 ++++
 5 files changed, 133 insertions(+), 16 deletions(-)

diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 6cee5ae369..78e18f9765 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -1085,6 +1085,16 @@ rte_vfio_container_dma_map(__rte_unused int container_fd,
 	return -1;
 }
 
+int
+rte_vfio_container_dma_map_paged(__rte_unused int container_fd,
+		__rte_unused uint64_t vaddr,
+		__rte_unused uint64_t iova,
+		__rte_unused uint64_t len,
+		__rte_unused uint64_t pagesz)
+{
+	return -1;
+}
+
 int
 rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 		__rte_unused uint64_t vaddr,
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index 2d90b36480..6afae2ccce 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -17,6 +17,8 @@ extern "C" {
 #include
 #include
 
+#include
+
 /*
  * determine if VFIO is present on the system
  */
@@ -331,6 +333,37 @@ int
 rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
 		uint64_t len);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Perform DMA mapping for devices in a container, mapping memory page-by-page.
+ *
+ * @param container_fd
+ *   the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to
+ *   use the default container.
+ *
+ * @param vaddr
+ *   Starting virtual address of memory to be mapped.
+ *
+ * @param iova
+ *   Starting IOVA address of memory to be mapped.
+ *
+ * @param len
+ *   Length of memory segment being mapped.
+ *
+ * @param pagesz
+ *   Page size of the underlying memory.
+ *
+ * @return
+ *    0 if successful
+ *   <0 if failed
+ */
+__rte_experimental
+int
+rte_vfio_container_dma_map_paged(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len, uint64_t pagesz);
+
 /**
  * Perform DMA unmapping for devices in a container.
  *
diff --git a/lib/eal/linux/eal_vfio.c b/lib/eal/linux/eal_vfio.c
index 657c89ca58..c791730251 100644
--- a/lib/eal/linux/eal_vfio.c
+++ b/lib/eal/linux/eal_vfio.c
@@ -1872,11 +1872,12 @@ vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 
 static int
 container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
-		uint64_t len)
+		uint64_t len, uint64_t pagesz)
 {
 	struct user_mem_map *new_map;
 	struct user_mem_maps *user_mem_maps;
 	bool has_partial_unmap;
+	uint64_t chunk_size;
 	int ret = 0;
 
 	user_mem_maps = &vfio_cfg->mem_maps;
@@ -1887,19 +1888,37 @@ container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 		ret = -1;
 		goto out;
 	}
-	/* map the entry */
-	if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
-		/* technically, this will fail if there are currently no devices
-		 * plugged in, even if a device were added later, this mapping
-		 * might have succeeded. however, since we cannot verify if this
-		 * is a valid mapping without having a device attached, consider
-		 * this to be unsupported, because we can't just store any old
-		 * mapping and pollute list of active mappings willy-nilly.
-		 */
-		RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n");
-		ret = -1;
-		goto out;
+
+	/* technically, mapping will fail if there are currently no devices
+	 * plugged in, even if a device were added later, this mapping might
+	 * have succeeded. however, since we cannot verify if this is a valid
+	 * mapping without having a device attached, consider this to be
+	 * unsupported, because we can't just store any old mapping and pollute
+	 * list of active mappings willy-nilly.
+	 */
+
+	/* if page size was not specified, map the entire segment in one go */
+	if (pagesz == 0) {
+		if (vfio_dma_mem_map(vfio_cfg, vaddr, iova, len, 1)) {
+			RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n");
+			ret = -1;
+			goto out;
+		}
+	} else {
+		/* otherwise, do mappings page-by-page */
+		uint64_t offset;
+
+		for (offset = 0; offset < len; offset += pagesz) {
+			uint64_t va = vaddr + offset;
+			uint64_t io = iova + offset;
+			if (vfio_dma_mem_map(vfio_cfg, va, io, pagesz, 1)) {
+				RTE_LOG(ERR, EAL, "Couldn't map new region for DMA\n");
+				ret = -1;
+				goto out;
+			}
+		}
 	}
+
 	/* do we have partial unmap support? */
 	has_partial_unmap = vfio_cfg->vfio_iommu_type->partial_unmap;
@@ -1908,8 +1927,18 @@ container_dma_map(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
 	new_map->addr = vaddr;
 	new_map->iova = iova;
 	new_map->len = len;
-	/* for IOMMU types supporting partial unmap, we don't need chunking */
-	new_map->chunk = has_partial_unmap ? 0 : len;
+
+	/*
+	 * Chunking essentially serves largely the same purpose as page sizes,
+	 * so for the purposes of this calculation, we treat them as the same.
+	 * The reason we have page sizes is because we want to map things in a
+	 * way that allows us to partially unmap later. Therefore, when IOMMU
+	 * supports partial unmap, page size is irrelevant and can be ignored.
+	 * For IOMMU that don't support partial unmap, page size is equivalent
+	 * to chunk size.
+	 */
+	chunk_size = pagesz == 0 ? len : pagesz;
+	new_map->chunk = has_partial_unmap ? 0 : chunk_size;
 
 	compact_user_maps(user_mem_maps);
 out:
@@ -2179,7 +2208,29 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova,
 		return -1;
 	}
 
-	return container_dma_map(vfio_cfg, vaddr, iova, len);
+	/* not having page size means we map entire segment */
+	return container_dma_map(vfio_cfg, vaddr, iova, len, 0);
+}
+
+int
+rte_vfio_container_dma_map_paged(int container_fd, uint64_t vaddr,
+		uint64_t iova, uint64_t len, uint64_t pagesz)
+{
+	struct vfio_config *vfio_cfg;
+
+	if (len == 0 || pagesz == 0 || !rte_is_power_of_2(pagesz) ||
+			(len % pagesz) != 0) {
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	vfio_cfg = get_vfio_cfg_by_container_fd(container_fd);
+	if (vfio_cfg == NULL) {
+		RTE_LOG(ERR, EAL, "Invalid VFIO container fd\n");
+		return -1;
+	}
+
+	return container_dma_map(vfio_cfg, vaddr, iova, len, pagesz);
 }
 
 int
@@ -2299,6 +2350,16 @@ rte_vfio_container_dma_map(__rte_unused int container_fd,
 	return -1;
 }
 
+int
+rte_vfio_container_dma_map_paged(__rte_unused int container_fd,
+		__rte_unused uint64_t vaddr,
+		__rte_unused uint64_t iova,
+		__rte_unused uint64_t len,
+		__rte_unused uint64_t pagesz)
+{
+	return -1;
+}
+
 int
 rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 		__rte_unused uint64_t vaddr,
diff --git a/lib/eal/version.map b/lib/eal/version.map
index beeb986adc..eaa6b0bedf 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -426,6 +426,9 @@ EXPERIMENTAL {
 
 	# added in 21.08
 	rte_power_monitor_multi; # WINDOWS_NO_EXPORT
+
+	# added in 21.11
+	rte_vfio_container_dma_map_paged;
 };
 
 INTERNAL {
diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 3d8c520412..fcd6bc1894 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -459,6 +459,16 @@ rte_vfio_container_dma_map(__rte_unused int container_fd,
 	return -1;
 }
 
+int
+rte_vfio_container_dma_map_paged(__rte_unused int container_fd,
+		__rte_unused uint64_t vaddr,
+		__rte_unused uint64_t iova,
+		__rte_unused uint64_t len,
+		__rte_unused uint64_t pagesz)
+{
+	return -1;
+}
+
 int
 rte_vfio_container_dma_unmap(__rte_unused int container_fd,
 		__rte_unused uint64_t vaddr,
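
For illustration only (not part of the patch): a minimal sketch of how an
application might use the proposed rte_vfio_container_dma_map_paged() API.
The function name map_buffer_paged and the buf_va/buf_iova/buf_len/page_sz
parameters are hypothetical and assumed to come from the application's own
externally allocated, IOVA-contiguous memory.

#include <stdint.h>

#include <rte_log.h>
#include <rte_vfio.h>

/* hypothetical helper: buf_va/buf_iova/buf_len/page_sz describe an
 * externally allocated, IOVA-contiguous buffer owned by the application
 */
static int
map_buffer_paged(uint64_t buf_va, uint64_t buf_iova, uint64_t buf_len,
		uint64_t page_sz)
{
	/* map the whole buffer, but record the mapping page-by-page */
	if (rte_vfio_container_dma_map_paged(RTE_VFIO_DEFAULT_CONTAINER_FD,
			buf_va, buf_iova, buf_len, page_sz) < 0) {
		RTE_LOG(ERR, USER1, "Couldn't map buffer for DMA\n");
		return -1;
	}

	/* ... use the buffer for DMA ... */

	/* unmap just the first page - since the mapping was recorded in
	 * page-sized chunks, this is expected to work even on IOMMUs
	 * without partial unmap support
	 */
	if (rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
			buf_va, buf_iova, page_sz) < 0) {
		RTE_LOG(ERR, USER1, "Couldn't unmap page\n");
		return -1;
	}
	return 0;
}

With the existing rte_vfio_container_dma_map(), the same buffer would be
recorded as a single chunk on IOMMUs lacking partial unmap support, so the
page-sized unmap above would be expected to be refused.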