From patchwork Tue Mar 5 13:59:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50814 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 42D5D2BF2; Tue, 5 Mar 2019 15:00:02 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id B3C152B9E for ; Tue, 5 Mar 2019 15:00:00 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:56 +0200 Received: from unicorn01.mtl.labs.mlnx. (unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtO9012489; Tue, 5 Mar 2019 15:59:55 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:41 +0200 Message-Id: X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 1/6] vfio: allow DMA map of memory for the default vfio fd X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Enable users to call rte_vfio_dma_map with a request to map to the default vfio fd. Signed-off-by: Shahaf Shuler Acked-by: Anatoly Burakov --- doc/guides/rel_notes/release_19_05.rst | 3 +++ lib/librte_eal/common/include/rte_vfio.h | 8 ++++++-- lib/librte_eal/linuxapp/eal/eal_vfio.c | 10 ++++++++-- 3 files changed, 17 insertions(+), 4 deletions(-) diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst index 4a3e2a7f31..b02753bbc4 100644 --- a/doc/guides/rel_notes/release_19_05.rst +++ b/doc/guides/rel_notes/release_19_05.rst @@ -122,6 +122,9 @@ ABI Changes Also, make sure to start the actual text at the margin. ========================================================= +* vfio: Functions ``rte_vfio_container_dma_map`` and + ``rte_vfio_container_dma_unmap`` have been extended with an option to + request mapping or un-mapping to the default vfio container fd. Shared Library Versions ----------------------- diff --git a/lib/librte_eal/common/include/rte_vfio.h b/lib/librte_eal/common/include/rte_vfio.h index cae96fab90..cdfbedc1f9 100644 --- a/lib/librte_eal/common/include/rte_vfio.h +++ b/lib/librte_eal/common/include/rte_vfio.h @@ -80,6 +80,8 @@ struct vfio_device_info; #endif /* VFIO_PRESENT */ +#define RTE_VFIO_DEFAULT_CONTAINER_FD (-1) + /** * Setup vfio_cfg for the device identified by its address. * It discovers the configured I/O MMU groups or sets a new one for the device. @@ -347,7 +349,8 @@ rte_vfio_container_group_unbind(int container_fd, int iommu_group_num); * Perform DMA mapping for devices in a container. * * @param container_fd - * the specified container fd + * the specified container fd. Use RTE_VFIO_DEFAULT_CONTAINER_FD to + * use the default container. * * @param vaddr * Starting virtual address of memory to be mapped. @@ -370,7 +373,8 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, * Perform DMA unmapping for devices in a container. * * @param container_fd - * the specified container fd + * the specified container fd. 
Use RTE_VFIO_DEFAULT_CONTAINER_FD to + * use the default container. * * @param vaddr * Starting virtual address of memory to be unmapped. diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c index c821e83826..9adbda8bb7 100644 --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c @@ -1897,7 +1897,10 @@ rte_vfio_container_dma_map(int container_fd, uint64_t vaddr, uint64_t iova, return -1; } - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd); + if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD) + vfio_cfg = default_vfio_cfg; + else + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd); if (vfio_cfg == NULL) { RTE_LOG(ERR, EAL, "Invalid container fd\n"); return -1; @@ -1917,7 +1920,10 @@ rte_vfio_container_dma_unmap(int container_fd, uint64_t vaddr, uint64_t iova, return -1; } - vfio_cfg = get_vfio_cfg_by_container_fd(container_fd); + if (container_fd == RTE_VFIO_DEFAULT_CONTAINER_FD) + vfio_cfg = default_vfio_cfg; + else + vfio_cfg = get_vfio_cfg_by_container_fd(container_fd); if (vfio_cfg == NULL) { RTE_LOG(ERR, EAL, "Invalid container fd\n"); return -1; From patchwork Tue Mar 5 13:59:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50819 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 8D0255323; Tue, 5 Mar 2019 15:00:09 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id D54434CA7 for ; Tue, 5 Mar 2019 15:00:05 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:59 +0200 Received: from unicorn01.mtl.labs.mlnx. (unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtOA012489; Tue, 5 Mar 2019 15:59:55 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:42 +0200 Message-Id: <1e8400f68a2fb1ceb07127c72f0874bb881e5d80.1551793527.git.shahafs@mellanox.com> X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 2/6] vfio: don't fail to DMA map if memory is already mapped X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently the vfio DMA map function fails when the same memory segment is mapped twice. This is too strict: mapping the same memory twice is not an error. Instead, use the kernel return value to detect this state and have the DMA map function return success. For type1 mappings the kernel driver returns EEXIST; for spapr mappings, EBUSY is returned since kernel 4.10.
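For illustration, a minimal usage sketch of the resulting behavior (the helper is hypothetical and not part of this patch; it assumes addr/iova/len describe memory already registered via rte_extmem_register, and combines this change with the default-container option from patch 1/6):

    #include <rte_vfio.h>

    static int
    map_same_segment_twice(void *addr, uint64_t iova, uint64_t len)
    {
        /* First call sets up the IOMMU mapping via VFIO_IOMMU_MAP_DMA. */
        if (rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
                (uint64_t)(uintptr_t)addr, iova, len) < 0)
            return -1;
        /*
         * Second call on the same segment: the kernel reports EEXIST
         * (type1) or EBUSY (spapr), which is now treated as success
         * rather than a failure.
         */
        return rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
                (uint64_t)(uintptr_t)addr, iova, len);
    }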
Signed-off-by: Shahaf Shuler Acked-by: Anatoly Burakov --- lib/librte_eal/linuxapp/eal/eal_vfio.c | 32 +++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c index 9adbda8bb7..d0a0f9c16f 100644 --- a/lib/librte_eal/linuxapp/eal/eal_vfio.c +++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c @@ -1264,9 +1264,21 @@ vfio_type1_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map); if (ret) { - RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n", - errno, strerror(errno)); + /* + * In case the mapping was already done, EEXIST will be + * returned from the kernel. + */ + if (errno == EEXIST) { + RTE_LOG(DEBUG, EAL, + " Memory segment is already mapped," + " skipping\n"); + } else { + RTE_LOG(ERR, EAL, + " cannot set up DMA remapping," + " error %i (%s)\n", + errno, strerror(errno)); return -1; + } } } else { memset(&dma_unmap, 0, sizeof(dma_unmap)); @@ -1325,9 +1337,21 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova, ret = ioctl(vfio_container_fd, VFIO_IOMMU_MAP_DMA, &dma_map); if (ret) { - RTE_LOG(ERR, EAL, " cannot set up DMA remapping, error %i (%s)\n", - errno, strerror(errno)); + /* + * In case the mapping was already done, EBUSY will be + * returned from the kernel. + */ + if (errno == EBUSY) { + RTE_LOG(DEBUG, EAL, + " Memory segment is already mapped," + " skipping\n"); + } else { + RTE_LOG(ERR, EAL, + " cannot set up DMA remapping," + " error %i (%s)\n", errno, + strerror(errno)); return -1; + } } } else { From patchwork Tue Mar 5 13:59:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50817 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id F122F4CAF; Tue, 5 Mar 2019 15:00:06 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 4404B2B9E for ; Tue, 5 Mar 2019 15:00:00 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:56 +0200 Received: from unicorn01.mtl.labs.mlnx. (unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtOB012489; Tue, 5 Mar 2019 15:59:55 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:43 +0200 Message-Id: <6b96596f4bf57f419f0cc9222c347f89cf15e194.1551793527.git.shahafs@mellanox.com> X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 3/6] bus: introduce device level DMA memory mapping X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The DPDK APIs expose 3 different modes to work with memory used for DMA: 1. Use the DPDK owned memory (backed by the DPDK provided hugepages). This memory is allocated by the DPDK libraries, included in the DPDK memory system (memseg lists) and automatically DMA mapped by the DPDK layers. 2. 
Use memory allocated by the user and registered to the DPDK memory system. Upon registration, the DPDK layers will DMA map it to all needed devices. After registration, allocation of this memory will be done with the rte_*malloc APIs. 3. Use memory allocated by the user and not registered to the DPDK memory system. This is for users who want tight control over this memory (e.g. to avoid the rte_malloc header). The user should allocate the memory, register it through the rte_extmem_register API, and call a DMA map function in order to register such memory to the different devices. This patch focuses on mode #3 above. Currently the only way to map external memory is through VFIO (rte_vfio_dma_map). While VFIO is common, other vendors use different ways to map memory (e.g. Mellanox and NXP). The work in this patch moves the DMA mapping to vendor-agnostic APIs: device-level DMA map and unmap APIs were added, currently implemented only for PCI devices. For PCI bus devices, the PCI driver can expose its own map and unmap functions to be used for the mapping. In case the driver doesn't provide any, the memory will be mapped, if possible, to the IOMMU through the VFIO APIs. Application usage with those APIs is quite simple (a sketch follows this list): * allocate memory * call rte_extmem_register on the memory chunk. * take a device, and query its rte_device. * call the device-specific mapping function for this device. Future work will deprecate the rte_vfio_dma_map and rte_vfio_dma_unmap APIs, leaving the rte device APIs as the preferred option for the user.
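A minimal sketch of that flow (illustrative only: the helper name, the 4K page size, and the use of the virtual address as IOVA are assumptions, and error handling is condensed):

    #include <sys/mman.h>
    #include <rte_memory.h>
    #include <rte_ethdev.h>
    #include <rte_dev.h>

    static int
    extmem_dma_map_example(uint16_t port_id, size_t len) /* len: 4K multiple */
    {
        struct rte_eth_dev_info info;
        void *addr;

        /* 1. Allocate memory outside of DPDK. */
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED)
            return -1;
        /* 2. Register the chunk with the DPDK memory system
         *    (no IOVA table supplied here). */
        if (rte_extmem_register(addr, len, NULL, 0, 4096) < 0)
            return -1;
        /* 3. Take a device and query its rte_device. */
        rte_eth_dev_info_get(port_id, &info);
        /* 4. Map the memory for DMA to this specific device. */
        return rte_dev_dma_map(info.device, addr,
                (uint64_t)(uintptr_t)addr, len);
    }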
Signed-off-by: Shahaf Shuler --- drivers/bus/pci/pci_common.c | 48 ++++++++++++++++++++++++++++ drivers/bus/pci/rte_bus_pci.h | 40 +++++++++++++++++++++++ lib/librte_eal/common/eal_common_dev.c | 34 ++++++++++++++++++++ lib/librte_eal/common/include/rte_bus.h | 44 +++++++++++++++++++++++++ lib/librte_eal/common/include/rte_dev.h | 47 +++++++++++++++++++++++++++ lib/librte_eal/rte_eal_version.map | 2 ++ 6 files changed, 215 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index 6276e5d695..704b9d71af 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -528,6 +528,52 @@ pci_unplug(struct rte_device *dev) return ret; } +static int +pci_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len) +{ + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev); + + if (!pdev || !pdev->driver) { + rte_errno = EINVAL; + return -1; + } + if (pdev->driver->dma_map) + return pdev->driver->dma_map(pdev, addr, iova, len); + /* + * In case the driver doesn't provide any specific mapping, + * try to fall back to VFIO. + */ + if (pdev->kdrv == RTE_KDRV_VFIO) + return rte_vfio_container_dma_map + (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr, + iova, len); + rte_errno = ENOTSUP; + return -1; +} + +static int +pci_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, size_t len) +{ + struct rte_pci_device *pdev = RTE_DEV_TO_PCI(dev); + + if (!pdev || !pdev->driver) { + rte_errno = EINVAL; + return -1; + } + if (pdev->driver->dma_unmap) + return pdev->driver->dma_unmap(pdev, addr, iova, len); + /* + * In case the driver doesn't provide any specific mapping, + * try to fall back to VFIO. + */ + if (pdev->kdrv == RTE_KDRV_VFIO) + return rte_vfio_container_dma_unmap + (RTE_VFIO_DEFAULT_CONTAINER_FD, (uintptr_t)addr, + iova, len); + rte_errno = ENOTSUP; + return -1; +} + struct rte_pci_bus rte_pci_bus = { .bus = { .scan = rte_pci_scan, @@ -536,6 +582,8 @@ struct rte_pci_bus rte_pci_bus = { .plug = pci_plug, .unplug = pci_unplug, .parse = pci_parse, + .dma_map = pci_dma_map, + .dma_unmap = pci_dma_unmap, .get_iommu_class = rte_pci_get_iommu_class, .dev_iterate = rte_pci_dev_iterate, .hot_unplug_handler = pci_hot_unplug_handler, diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h index f0d6d81c00..06e004cd3f 100644 --- a/drivers/bus/pci/rte_bus_pci.h +++ b/drivers/bus/pci/rte_bus_pci.h @@ -114,6 +114,44 @@ typedef int (pci_probe_t)(struct rte_pci_driver *, struct rte_pci_device *); typedef int (pci_remove_t)(struct rte_pci_device *); /** + * Driver-specific DMA mapping. After a successful call the device + * will be able to read/write from/to this segment. + * + * @param dev + * Pointer to the PCI device. + * @param addr + * Starting virtual address of memory to be mapped. + * @param iova + * Starting IOVA address of memory to be mapped. + * @param len + * Length of memory segment being mapped. + * @return + * - 0 On success. + * - Negative value and rte_errno is set otherwise. + */ +typedef int (pci_dma_map_t)(struct rte_pci_device *dev, void *addr, + uint64_t iova, size_t len); + +/** + * Driver-specific DMA un-mapping. After a successful call the device + * will not be able to read/write from/to this segment. + * + * @param dev + * Pointer to the PCI device. + * @param addr + * Starting virtual address of memory to be unmapped. + * @param iova + * Starting IOVA address of memory to be unmapped. + * @param len + * Length of memory segment being unmapped. + * @return + * - 0 On success. + * - Negative value and rte_errno is set otherwise. + */ +typedef int (pci_dma_unmap_t)(struct rte_pci_device *dev, void *addr, + uint64_t iova, size_t len); + +/** + * A structure describing a PCI driver. */ struct rte_pci_driver { @@ -122,6 +160,8 @@ struct rte_pci_driver { struct rte_pci_bus *bus; /**< PCI bus reference. */ pci_probe_t *probe; /**< Device Probe function. */ pci_remove_t *remove; /**< Device Remove function. */ + pci_dma_map_t *dma_map; /**< device dma map function. */ + pci_dma_unmap_t *dma_unmap; /**< device dma unmap function. */ const struct rte_pci_id *id_table; /**< ID table, NULL terminated. */ uint32_t drv_flags; /**< Flags RTE_PCI_DRV_*. 
*/ }; diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c index fd7f5ca7d5..08303b2f53 100644 --- a/lib/librte_eal/common/eal_common_dev.c +++ b/lib/librte_eal/common/eal_common_dev.c @@ -756,3 +756,37 @@ rte_dev_iterator_next(struct rte_dev_iterator *it) free(cls_str); return it->device; } + +int +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, + size_t len) +{ + if (dev->bus->dma_map == NULL || len == 0) { + rte_errno = EINVAL; + return -1; + } + /* Memory must be registered through rte_extmem_* APIs */ + if (rte_mem_virt2memseg_list(addr) == NULL) { + rte_errno = EINVAL; + return -1; + } + + return dev->bus->dma_map(dev, addr, iova, len); +} + +int +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, + size_t len) +{ + if (dev->bus->dma_unmap == NULL || len == 0) { + rte_errno = EINVAL; + return -1; + } + /* Memory must be registered through rte_extmem_* APIs */ + if (rte_mem_virt2memseg_list(addr) == NULL) { + rte_errno = EINVAL; + return -1; + } + + return dev->bus->dma_unmap(dev, addr, iova, len); +} diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index 6be4b5cabe..4faf2d20a0 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -168,6 +168,48 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev); typedef int (*rte_bus_parse_t)(const char *name, void *addr); /** + * Device level DMA map function. + * After a successful call, the memory segment will be mapped to the + * given device. + * + * @param dev + * Device pointer. + * @param addr + * Virtual address to map. + * @param iova + * IOVA address to map. + * @param len + * Length of the memory segment being mapped. + * + * @return + * 0 if mapping was successful. + * Negative value and rte_errno is set otherwise. + */ +typedef int (*rte_dev_dma_map_t)(struct rte_device *dev, void *addr, + uint64_t iova, size_t len); + +/** + * Device level DMA unmap function. + * After a successful call, the memory segment will no longer be + * accessible by the given device. + * + * @param dev + * Device pointer. + * @param addr + * Virtual address to unmap. + * @param iova + * IOVA address to unmap. + * @param len + * Length of the memory segment being mapped. + * + * @return + * 0 if un-mapping was successful. + * Negative value and rte_errno is set otherwise. + */ +typedef int (*rte_dev_dma_unmap_t)(struct rte_device *dev, void *addr, + uint64_t iova, size_t len); + +/** * Implement a specific hot-unplug handler, which is responsible for * handle the failure when device be hot-unplugged. When the event of * hot-unplug be detected, it could call this function to handle @@ -238,6 +280,8 @@ struct rte_bus { rte_bus_plug_t plug; /**< Probe single device for drivers */ rte_bus_unplug_t unplug; /**< Remove single device from driver */ rte_bus_parse_t parse; /**< Parse a device name */ + rte_dev_dma_map_t dma_map; /**< DMA map for device in the bus */ + rte_dev_dma_unmap_t dma_unmap; /**< DMA unmap for device in the bus */ struct rte_bus_conf conf; /**< Bus configuration */ rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */ rte_dev_iterate_t dev_iterate; /**< Device iterator. 
*/ diff --git a/lib/librte_eal/common/include/rte_dev.h b/lib/librte_eal/common/include/rte_dev.h index 3cad4bce57..0d5e25b500 100644 --- a/lib/librte_eal/common/include/rte_dev.h +++ b/lib/librte_eal/common/include/rte_dev.h @@ -463,4 +463,51 @@ rte_dev_hotplug_handle_enable(void); int __rte_experimental rte_dev_hotplug_handle_disable(void); +/** + * Device level DMA map function. + * After a successful call, the memory segment will be mapped to the + * given device. + * + * @note: Memory must be registered in advance using rte_extmem_* APIs. + * + * @param dev + * Device pointer. + * @param addr + * Virtual address to map. + * @param iova + * IOVA address to map. + * @param len + * Length of the memory segment being mapped. + * + * @return + * 0 if mapping was successful. + * Negative value and rte_errno is set otherwise. + */ +int __rte_experimental +rte_dev_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len); + +/** + * Device level DMA unmap function. + * After a successful call, the memory segment will no longer be + * accessible by the given device. + * + * @note: Memory must be registered in advance using rte_extmem_* APIs. + * + * @param dev + * Device pointer. + * @param addr + * Virtual address to unmap. + * @param iova + * IOVA address to unmap. + * @param len + * Length of the memory segment being mapped. + * + * @return + * 0 if un-mapping was successful. + * Negative value and rte_errno is set otherwise. + */ +int __rte_experimental +rte_dev_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova, + size_t len); + #endif /* _RTE_DEV_H_ */ diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index eb5f7b9cbd..264aa050fa 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -277,6 +277,8 @@ EXPERIMENTAL { rte_class_unregister; rte_ctrl_thread_create; rte_delay_us_sleep; + rte_dev_dma_map; + rte_dev_dma_unmap; rte_dev_event_callback_process; rte_dev_event_callback_register; rte_dev_event_callback_unregister; From patchwork Tue Mar 5 13:59:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50820 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A616D5587; Tue, 5 Mar 2019 15:00:10 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id D63804CA9 for ; Tue, 5 Mar 2019 15:00:05 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:59 +0200 Received: from unicorn01.mtl.labs.mlnx. 
(unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtOC012489; Tue, 5 Mar 2019 15:59:55 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:44 +0200 Message-Id: <47b74a157f1ab6227c4992162fb2af90beed3f30.1551793527.git.shahafs@mellanox.com> X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 4/6] net/mlx5: refactor external memory registration X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Move the memory region creation to a separate function, to prepare the ground for reusing it in the PCI driver map and unmap functions. Signed-off-by: Shahaf Shuler --- drivers/net/mlx5/mlx5_mr.c | 86 +++++++++++++++++++++++++++-------------- 1 file changed, 57 insertions(+), 29 deletions(-) diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index 700d83d1bc..43ee9c961b 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -1109,6 +1109,58 @@ mlx5_mr_flush_local_cache(struct mlx5_mr_ctrl *mr_ctrl) } /** + * Creates a memory region for external memory, i.e. memory that is not + * part of the DPDK memory segments. + * + * @param dev + * Pointer to the ethernet device. + * @param addr + * Starting virtual address of memory. + * @param len + * Length of memory segment being mapped. + * @param socket_id + * Socket to allocate heap memory for the control structures. + * + * @return + * Pointer to MR structure on success, NULL otherwise. + */ +static struct mlx5_mr * +mlx5_create_mr_ext(struct rte_eth_dev *dev, uintptr_t addr, size_t len, + int socket_id) +{ + struct mlx5_priv *priv = dev->data->dev_private; + struct mlx5_mr *mr = NULL; + + mr = rte_zmalloc_socket(NULL, + RTE_ALIGN_CEIL(sizeof(*mr), + RTE_CACHE_LINE_SIZE), + RTE_CACHE_LINE_SIZE, socket_id); + if (mr == NULL) + return NULL; + mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len, + IBV_ACCESS_LOCAL_WRITE); + if (mr->ibv_mr == NULL) { + DRV_LOG(WARNING, + "port %u fail to create a verbs MR for address (%p)", + dev->data->port_id, (void *)addr); + rte_free(mr); + return NULL; + } + mr->msl = NULL; /* Mark it is external memory. */ + mr->ms_bmp = NULL; + mr->ms_n = 1; + mr->ms_bmp_n = 1; + DRV_LOG(DEBUG, + "port %u MR CREATED (%p) for external memory %p:\n" + " [0x%" PRIxPTR ", 0x%" PRIxPTR ")," + " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u", + dev->data->port_id, (void *)mr, (void *)addr, + addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey), + mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n); + return mr; +} + +/** + * Called during rte_mempool_mem_iter() by mlx5_mr_update_ext_mp(). + * + * Externally allocated chunk is registered and a MR is created for the chunk. 
@@ -1142,43 +1194,19 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque, rte_rwlock_read_unlock(&priv->mr.rwlock); if (lkey != UINT32_MAX) return; - mr = rte_zmalloc_socket(NULL, - RTE_ALIGN_CEIL(sizeof(*mr), - RTE_CACHE_LINE_SIZE), - RTE_CACHE_LINE_SIZE, mp->socket_id); - if (mr == NULL) { - DRV_LOG(WARNING, - "port %u unable to allocate memory for a new MR of" - " mempool (%s).", - dev->data->port_id, mp->name); - data->ret = -1; - return; - } DRV_LOG(DEBUG, "port %u register MR for chunk #%d of mempool (%s)", dev->data->port_id, mem_idx, mp->name); - mr->ibv_mr = mlx5_glue->reg_mr(priv->pd, (void *)addr, len, - IBV_ACCESS_LOCAL_WRITE); - if (mr->ibv_mr == NULL) { + mr = mlx5_create_mr_ext(dev, addr, len, mp->socket_id); + if (!mr) { DRV_LOG(WARNING, - "port %u fail to create a verbs MR for address (%p)", - dev->data->port_id, (void *)addr); - rte_free(mr); + "port %u unable to allocate a new MR of" + " mempool (%s).", + dev->data->port_id, mp->name); data->ret = -1; return; } - mr->msl = NULL; /* Mark it is external memory. */ - mr->ms_bmp = NULL; - mr->ms_n = 1; - mr->ms_bmp_n = 1; rte_rwlock_write_lock(&priv->mr.rwlock); LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr); - DRV_LOG(DEBUG, - "port %u MR CREATED (%p) for external memory %p:\n" - " [0x%" PRIxPTR ", 0x%" PRIxPTR ")," - " lkey=0x%x base_idx=%u ms_n=%u, ms_bmp_n=%u", - dev->data->port_id, (void *)mr, (void *)addr, - addr, addr + len, rte_cpu_to_be_32(mr->ibv_mr->lkey), - mr->ms_base_idx, mr->ms_n, mr->ms_bmp_n); /* Insert to the global cache table. */ mr_insert_dev_cache(dev, mr); rte_rwlock_write_unlock(&priv->mr.rwlock); From patchwork Tue Mar 5 13:59:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50818 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 2FC664CC7; Tue, 5 Mar 2019 15:00:08 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 46C582BF2 for ; Tue, 5 Mar 2019 15:00:00 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:56 +0200 Received: from unicorn01.mtl.labs.mlnx. (unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtOD012489; Tue, 5 Mar 2019 15:59:55 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:45 +0200 Message-Id: <03843cc0576556bc6959fbba2ed6b859a3ec0de8.1551793527.git.shahafs@mellanox.com> X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 5/6] net/mlx5: support PCI device DMA map and unmap X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" The implementation reuses the external memory registration work done by commit [1]. Note about representors: The current representor design will not work with those map and unmap functions. The reason is that for representors we have multiple IB devices that share the same PCI function, so mapping will happen only on one of the representors and not on all of them. While it is possible to implement such support, the IB representor design is going to be changed during DPDK 19.05. The new design will have a single IB device for all representors, hence sharing a single memory region between all representors will be possible. [1] commit 7e43a32ee060 ("net/mlx5: support externally allocated static memory")
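As a hypothetical illustration of the unmap path added here (the helper name is an assumption and error handling is condensed), the teardown counterpart of the mapping flow sketched in patch 3/6 could look like:

    #include <sys/mman.h>
    #include <rte_memory.h>
    #include <rte_ethdev.h>
    #include <rte_dev.h>

    static int
    extmem_dma_unmap_example(uint16_t port_id, void *addr, size_t len)
    {
        struct rte_eth_dev_info info;

        rte_eth_dev_info_get(port_id, &info);
        /*
         * Unmap from the device first; for mlx5 this moves the MR to
         * the free list and broadcasts a cache flush to other cores.
         */
        if (rte_dev_dma_unmap(info.device, addr,
                (uint64_t)(uintptr_t)addr, len) < 0)
            return -1;
        /* Only then unregister the chunk from DPDK and free it. */
        if (rte_extmem_unregister(addr, len) < 0)
            return -1;
        return munmap(addr, len);
    }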
Signed-off-by: Shahaf Shuler --- drivers/net/mlx5/mlx5.c | 2 + drivers/net/mlx5/mlx5_mr.c | 139 ++++++++++++++++++++++++++++++++++++++ drivers/net/mlx5/mlx5_rxtx.h | 5 ++ 3 files changed, 146 insertions(+) diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 9706e351aa..ab98aec8a2 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -1630,6 +1630,8 @@ static struct rte_pci_driver mlx5_driver = { .id_table = mlx5_pci_id_map, .probe = mlx5_pci_probe, .remove = mlx5_pci_remove, + .dma_map = mlx5_dma_map, + .dma_unmap = mlx5_dma_unmap, .drv_flags = (RTE_PCI_DRV_INTR_LSC | RTE_PCI_DRV_INTR_RMV | RTE_PCI_DRV_PROBE_AGAIN), }; diff --git a/drivers/net/mlx5/mlx5_mr.c b/drivers/net/mlx5/mlx5_mr.c index 43ee9c961b..21f8b5e045 100644 --- a/drivers/net/mlx5/mlx5_mr.c +++ b/drivers/net/mlx5/mlx5_mr.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "mlx5.h" #include "mlx5_mr.h" @@ -1215,6 +1216,144 @@ mlx5_mr_update_ext_mp_cb(struct rte_mempool *mp, void *opaque, } /** + * Finds the first ethdev that matches the PCI device. + * The existence of multiple ethdevs per PCI device is only with representors. + * In such a case, it is enough to get only one of the ports as they all share + * the same ibv context. + * + * @param pdev + * Pointer to the PCI device. + * + * @return + * Pointer to the ethdev if found, NULL otherwise. + */ +static struct rte_eth_dev * +pci_dev_to_eth_dev(struct rte_pci_device *pdev) +{ + struct rte_dev_iterator it; + struct rte_device *dev; + + /* + * We really need to iterate all devices regardless of + * their owner. + */ + RTE_DEV_FOREACH(dev, "class=eth", &it) + if (dev == &pdev->device) + return it.class_device; + return NULL; +} + +/** + * DPDK callback to DMA map external memory to a PCI device. + * + * @param pdev + * Pointer to the PCI device. + * @param addr + * Starting virtual address of memory to be mapped. + * @param iova + * Starting IOVA address of memory to be mapped. + * @param len + * Length of memory segment being mapped. + * + * @return + * 0 on success, negative value on error. + */ +int +mlx5_dma_map(struct rte_pci_device *pdev, void *addr, + uint64_t iova __rte_unused, size_t len) +{ + struct rte_eth_dev *dev; + struct mlx5_mr *mr; + struct mlx5_priv *priv; + + dev = pci_dev_to_eth_dev(pdev); + if (!dev) { + DRV_LOG(WARNING, "unable to find matching ethdev " + "to PCI device %p", (void *)pdev); + rte_errno = ENODEV; + return -1; + } + priv = dev->data->dev_private; + mr = mlx5_create_mr_ext(dev, (uintptr_t)addr, len, SOCKET_ID_ANY); + if (!mr) { + DRV_LOG(WARNING, + "port %u unable to dma map", dev->data->port_id); + rte_errno = EINVAL; + return -1; + } + rte_rwlock_write_lock(&priv->mr.rwlock); + LIST_INSERT_HEAD(&priv->mr.mr_list, mr, mr); + /* Insert to the global cache table. */ + mr_insert_dev_cache(dev, mr); + rte_rwlock_write_unlock(&priv->mr.rwlock); + return 0; +} + +/** + * DPDK callback to DMA unmap external memory from a PCI device. + * + * @param pdev + * Pointer to the PCI device. 
+ * @param addr + * Starting virtual address of memory to be unmapped. + * @param iova + * Starting IOVA address of memory to be unmapped. + * @param len + * Length of memory segment being unmapped. + * + * @return + * 0 on success, negative value on error. + */ +int +mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, + uint64_t iova __rte_unused, size_t len __rte_unused) +{ + struct rte_eth_dev *dev; + struct mlx5_priv *priv; + struct mlx5_mr *mr; + struct mlx5_mr_cache entry; + + dev = pci_dev_to_eth_dev(pdev); + if (!dev) { + DRV_LOG(WARNING, "unable to find matching ethdev " + "to PCI device %p", (void *)pdev); + rte_errno = ENODEV; + return -1; + } + priv = dev->data->dev_private; + rte_rwlock_read_lock(&priv->mr.rwlock); + mr = mr_lookup_dev_list(dev, &entry, (uintptr_t)addr); + if (!mr) { + rte_rwlock_read_unlock(&priv->mr.rwlock); + DRV_LOG(WARNING, "address 0x%" PRIxPTR " wasn't registered " + "to PCI device %p", (uintptr_t)addr, + (void *)pdev); + rte_errno = EINVAL; + return -1; + } + LIST_REMOVE(mr, mr); + LIST_INSERT_HEAD(&priv->mr.mr_free_list, mr, mr); + DEBUG("port %u remove MR(%p) from list", dev->data->port_id, + (void *)mr); + mr_rebuild_dev_cache(dev); + /* + * Flush local caches by propagating invalidation across cores. + * rte_smp_wmb() is enough to synchronize this event. If one of + * freed memsegs is seen by other core, that means the memseg + * has been allocated by allocator, which will come after this + * free call. Therefore, this store instruction (incrementing + * generation below) will be guaranteed to be seen by other core + * before the core sees the newly allocated memory. + */ + ++priv->mr.dev_gen; + DEBUG("broadcasting local cache flush, gen=%d", + priv->mr.dev_gen); + rte_smp_wmb(); + rte_rwlock_read_unlock(&priv->mr.rwlock); + return 0; +} + +/** * Register MR for entire memory chunks in a Mempool having externally allocated * memory and fill in local cache. * diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h index be464e8705..dcf044488e 100644 --- a/drivers/net/mlx5/mlx5_rxtx.h +++ b/drivers/net/mlx5/mlx5_rxtx.h @@ -28,6 +28,7 @@ #include #include #include +#include #include "mlx5_utils.h" #include "mlx5.h" @@ -367,6 +368,10 @@ uint32_t mlx5_rx_addr2mr_bh(struct mlx5_rxq_data *rxq, uintptr_t addr); uint32_t mlx5_tx_mb2mr_bh(struct mlx5_txq_data *txq, struct rte_mbuf *mb); uint32_t mlx5_tx_update_ext_mp(struct mlx5_txq_data *txq, uintptr_t addr, struct rte_mempool *mp); +int mlx5_dma_map(struct rte_pci_device *pdev, void *addr, uint64_t iova, + size_t len); +int mlx5_dma_unmap(struct rte_pci_device *pdev, void *addr, uint64_t iova, + size_t len); /** * Provide safe 64bit store operation to mlx5 UAR region for both 32bit and From patchwork Tue Mar 5 13:59:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shahaf Shuler X-Patchwork-Id: 50815 Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 59B1F47CE; Tue, 5 Mar 2019 15:00:04 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id B74D52BA7 for ; Tue, 5 Mar 2019 15:00:00 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from shahafs@mellanox.com) with ESMTPS (AES256-SHA encrypted); 5 Mar 2019 15:59:59 +0200 Received: from unicorn01.mtl.labs.mlnx. 
(unicorn01.mtl.labs.mlnx [10.7.12.62]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id x25DxtOE012489; Tue, 5 Mar 2019 15:59:56 +0200 From: Shahaf Shuler To: anatoly.burakov@intel.com, yskoh@mellanox.com, thomas@monjalon.net, ferruh.yigit@intel.com, nhorman@tuxdriver.com, gaetan.rivet@6wind.com Cc: dev@dpdk.org Date: Tue, 5 Mar 2019 15:59:46 +0200 Message-Id: X-Mailer: git-send-email 2.12.0 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v3 6/6] doc: deprecation notice for VFIO DMA map APIs X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Those should be replaced by the rte_dev_dma_map and rte_dev_dma_unmap APIs. Signed-off-by: Shahaf Shuler --- doc/guides/prog_guide/env_abstraction_layer.rst | 2 +- doc/guides/rel_notes/deprecation.rst | 4 ++++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 929d76dba7..ec2fe65523 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -282,7 +282,7 @@ The expected workflow is as follows: - If IOVA table is not specified, IOVA addresses will be assumed to be unavailable - Other processes must attach to the memory area before they can use it -* Perform DMA mapping with ``rte_vfio_dma_map`` if needed +* Perform DMA mapping with ``rte_dev_dma_map`` if needed * Use the memory area in your application * If memory area is no longer needed, it can be unregistered - If the area was mapped for DMA, unmapping must be performed before diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst index 1b4fcb7e64..48ec4fee88 100644 --- a/doc/guides/rel_notes/deprecation.rst +++ b/doc/guides/rel_notes/deprecation.rst @@ -35,6 +35,10 @@ Deprecation Notices + ``rte_eal_devargs_type_count`` +* vfio: removal of ``rte_vfio_dma_map`` and ``rte_vfio_dma_unmap`` APIs which + have been replaced with ``rte_dev_dma_map`` and ``rte_dev_dma_unmap`` + functions. The due date for the removal targets DPDK 20.02. + * pci: Several exposed functions are misnamed. The following functions are deprecated starting from v17.11 and are replaced:
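For applications migrating ahead of the removal, a hypothetical before/after sketch (the helper is an assumption; 'dev' is the rte_device of the port doing DMA, e.g. obtained through rte_eth_dev_info_get() as in patch 3/6):

    #include <rte_dev.h>

    static int
    dma_map_migrated(struct rte_device *dev, void *addr, uint64_t iova,
            size_t len)
    {
        /*
         * Deprecated form:
         *   rte_vfio_dma_map((uint64_t)(uintptr_t)addr, iova, len);
         * It is VFIO-specific and always targets the default container.
         */
        /* Replacement: bus/vendor agnostic, scoped to the given device. */
        return rte_dev_dma_map(dev, addr, iova, len);
    }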