From patchwork Fri Jun 14 09:39:15 2019
X-Patchwork-Submitter: David Marchand
X-Patchwork-Id: 54800
X-Patchwork-Delegate: thomas@monjalon.net
From: David Marchand
To: dev@dpdk.org
Cc: benjamin.walker@intel.com, jerinj@marvell.com, anatoly.burakov@intel.com,
 maxime.coquelin@redhat.com, thomas@monjalon.net, stable@dpdk.org, Ferruh Yigit
Date: Fri, 14 Jun 2019 11:39:15 +0200
Message-Id: <1560505157-9769-2-git-send-email-david.marchand@redhat.com>
In-Reply-To: <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
References: <20190530174819.1160221-1-benjamin.walker@intel.com>
 <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
Subject: [dpdk-dev] [PATCH v2 1/3] kni: refuse to initialise when IOVA is not PA

If a forced IOVA mode other than PA has been requested at init, KNI is not
supposed to work.

Fixes: 075b182b54ce ("eal: force IOVA to a particular mode")
Cc: stable@dpdk.org

Signed-off-by: David Marchand
---
 lib/librte_kni/rte_kni.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index a0f1e37..a6bf323 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -97,6 +97,11 @@ enum kni_ops_status {
 int
 rte_kni_init(unsigned int max_kni_ifaces __rte_unused)
 {
+	if (rte_eal_iova_mode() != RTE_IOVA_PA) {
+		RTE_LOG(ERR, KNI, "KNI requires IOVA as PA\n");
+		return -1;
+	}
+
 	/* Check FD and open */
 	if (kni_fd < 0) {
 		kni_fd = open("/dev/" KNI_DEVICE, O_RDWR);

From patchwork Fri Jun 14 09:39:16 2019
X-Patchwork-Submitter: David Marchand
X-Patchwork-Id: 54801
X-Patchwork-Delegate: thomas@monjalon.net
From: David Marchand
To: dev@dpdk.org
Cc: benjamin.walker@intel.com, jerinj@marvell.com, anatoly.burakov@intel.com,
 maxime.coquelin@redhat.com, thomas@monjalon.net, Bruce Richardson
Date: Fri, 14 Jun 2019 11:39:16 +0200
Message-Id: <1560505157-9769-3-git-send-email-david.marchand@redhat.com>
In-Reply-To: <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
References: <20190530174819.1160221-1-benjamin.walker@intel.com>
 <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
Subject: [dpdk-dev] [PATCH v2 2/3] eal: compute IOVA mode based on PA availability

From: Ben Walker

Currently, if the bus selects IOVA as PA, memory initialization can fail
when access to physical addresses is not available. Since this is the
default behavior, it can be quite hard for normal users to understand
what is wrong.

Catch this situation earlier in EAL init by validating physical address
availability, or select the IOVA mode based on it when no clear
preference has been expressed.

The bus code is changed so that it reports when it does not care about
the IOVA mode and lets the EAL init decide.

In the Linux implementation, rework rte_eal_using_phys_addrs() so that
it can be called earlier while still avoiding a circular dependency with
rte_mem_virt2phy().

In the FreeBSD implementation, rte_eal_using_phys_addrs() always returns
false, so the detection part is left as is.

If librte_kni is compiled in and the KNI kmod is loaded:
- if the buses requested VA, force PA if physical addresses are
  available, as was done before,
- else, keep IOVA as VA; KNI init will fail later.
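The selection order described above (bus preference first, then physical
address availability, then the KNI workaround) can be sketched as a small
standalone model; the enum and function names below are illustrative stand-ins,
not the actual EAL API:

```c
#include <stdbool.h>

/* Illustrative stand-ins for the rte_iova_mode values. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Simplified model of the decision made in rte_eal_init():
 * honor the bus preference first, fall back on PA availability,
 * then apply the KNI override only when PA is actually usable.
 */
enum iova_mode
select_iova_mode(enum iova_mode bus_pref, bool phys_addrs, bool kni_loaded)
{
	enum iova_mode mode = bus_pref;

	/* No bus preference: pick based on PA availability. */
	if (mode == IOVA_DC)
		mode = phys_addrs ? IOVA_PA : IOVA_VA;
	/* KNI needs physical addresses; force PA only when possible. */
	if (mode == IOVA_VA && kni_loaded && phys_addrs)
		mode = IOVA_PA;
	return mode;
}
```

With this ordering, an EAL started without PA access no longer fails up front:
it runs in VA mode, and only KNI init fails later, per the check added in
patch 1.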
Signed-off-by: Ben Walker
Signed-off-by: David Marchand
Acked-by: Anatoly Burakov
---
 lib/librte_eal/common/eal_common_bus.c  |  4 ---
 lib/librte_eal/common/include/rte_bus.h |  2 +-
 lib/librte_eal/freebsd/eal/eal.c        | 10 +++++--
 lib/librte_eal/linux/eal/eal.c          | 38 +++++++++++++++++++++------
 lib/librte_eal/linux/eal/eal_memory.c   | 46 +++++++++------------------------
 5 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
index c8f1901..77f1be1 100644
--- a/lib/librte_eal/common/eal_common_bus.c
+++ b/lib/librte_eal/common/eal_common_bus.c
@@ -237,10 +237,6 @@ enum rte_iova_mode
 		mode |= bus->get_iommu_class();
 	}
 
-	if (mode != RTE_IOVA_VA) {
-		/* Use default IOVA mode */
-		mode = RTE_IOVA_PA;
-	}
 	return mode;
 }
 
diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h
index 4faf2d2..90fe4e9 100644
--- a/lib/librte_eal/common/include/rte_bus.h
+++ b/lib/librte_eal/common/include/rte_bus.h
@@ -392,7 +392,7 @@ struct rte_bus *rte_bus_find(const struct rte_bus *start, rte_bus_cmp_t cmp,
 /**
  * Get the common iommu class of devices bound on to buses available in the
- * system. The default mode is PA.
+ * system. RTE_IOVA_DC means that no preference has been expressed.
  *
  * @return
  *     enum rte_iova_mode value.
diff --git a/lib/librte_eal/freebsd/eal/eal.c b/lib/librte_eal/freebsd/eal/eal.c
index 4eaa531..231f1dc 100644
--- a/lib/librte_eal/freebsd/eal/eal.c
+++ b/lib/librte_eal/freebsd/eal/eal.c
@@ -689,13 +689,19 @@ static void rte_eal_init_alert(const char *msg)
 	/* if no EAL option "--iova-mode=", use bus IOVA scheme */
 	if (internal_config.iova_mode == RTE_IOVA_DC) {
 		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
-		rte_eal_get_configuration()->iova_mode =
-			rte_bus_get_iommu_class();
+		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
+
+		if (iova_mode == RTE_IOVA_DC)
+			iova_mode = RTE_IOVA_PA;
+		rte_eal_get_configuration()->iova_mode = iova_mode;
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
 	}
 
+	RTE_LOG(INFO, EAL, "Selected IOVA mode '%s'\n",
+		rte_eal_iova_mode() == RTE_IOVA_PA ? "PA" : "VA");
+
 	if (internal_config.no_hugetlbfs == 0) {
 		/* rte_config isn't initialized yet */
 		ret = internal_config.process_type == RTE_PROC_PRIMARY ?

diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
index 3e1d6eb..785ed2b 100644
--- a/lib/librte_eal/linux/eal/eal.c
+++ b/lib/librte_eal/linux/eal/eal.c
@@ -948,6 +948,7 @@ static void rte_eal_init_alert(const char *msg)
 	static char logid[PATH_MAX];
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+	bool phys_addrs;
 
 	/* checks if the machine is adequate */
 	if (!rte_cpu_is_supported()) {
@@ -1035,25 +1036,46 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
+	phys_addrs = rte_eal_using_phys_addrs() != 0;
+
 	/* if no EAL option "--iova-mode=", use bus IOVA scheme */
 	if (internal_config.iova_mode == RTE_IOVA_DC) {
-		/* autodetect the IOVA mapping mode (default is RTE_IOVA_PA) */
-		rte_eal_get_configuration()->iova_mode =
-			rte_bus_get_iommu_class();
+		/* autodetect the IOVA mapping mode */
+		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
+
+		if (iova_mode == RTE_IOVA_DC) {
+			iova_mode = phys_addrs ? RTE_IOVA_PA : RTE_IOVA_VA;
+			RTE_LOG(DEBUG, EAL,
+				"Buses did not request a specific IOVA mode, using '%s' based on physical addresses availability.\n",
+				phys_addrs ? "PA" : "VA");
+		}
+#ifdef RTE_LIBRTE_KNI
 		/* Workaround for KNI which requires physical address to work */
-		if (rte_eal_get_configuration()->iova_mode == RTE_IOVA_VA &&
+		if (iova_mode == RTE_IOVA_VA &&
 				rte_eal_check_module("rte_kni") == 1) {
-			rte_eal_get_configuration()->iova_mode = RTE_IOVA_PA;
-			RTE_LOG(WARNING, EAL,
-				"Some devices want IOVA as VA but PA will be used because.. "
-				"KNI module inserted\n");
+			if (phys_addrs) {
+				iova_mode = RTE_IOVA_PA;
+				RTE_LOG(WARNING, EAL, "Forcing IOVA as 'PA' because KNI module is loaded\n");
+			} else {
+				RTE_LOG(DEBUG, EAL, "KNI can not work since physical addresses are unavailable\n");
+			}
 		}
+#endif
+		rte_eal_get_configuration()->iova_mode = iova_mode;
 	} else {
 		rte_eal_get_configuration()->iova_mode =
 			internal_config.iova_mode;
 	}
 
+	if (rte_eal_iova_mode() == RTE_IOVA_PA && !phys_addrs) {
+		rte_eal_init_alert("Cannot use IOVA as 'PA' since physical addresses are not available");
+		rte_errno = EINVAL;
+		return -1;
+	}
+
+	RTE_LOG(INFO, EAL, "Selected IOVA mode '%s'\n",
+		rte_eal_iova_mode() == RTE_IOVA_PA ? "PA" : "VA");
+
 	if (internal_config.no_hugetlbfs == 0) {
 		/* rte_config isn't initialized yet */
 		ret = internal_config.process_type == RTE_PROC_PRIMARY ?

diff --git a/lib/librte_eal/linux/eal/eal_memory.c b/lib/librte_eal/linux/eal/eal_memory.c
index 1853ace..25c4145 100644
--- a/lib/librte_eal/linux/eal/eal_memory.c
+++ b/lib/librte_eal/linux/eal/eal_memory.c
@@ -65,34 +65,10 @@
  * zone as well as a physical contiguous zone.
  */
 
-static bool phys_addrs_available = true;
+static int phys_addrs_available = -1;
 
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
-static void
-test_phys_addrs_available(void)
-{
-	uint64_t tmp = 0;
-	phys_addr_t physaddr;
-
-	if (!rte_eal_has_hugepages()) {
-		RTE_LOG(ERR, EAL,
-			"Started without hugepages support, physical addresses not available\n");
-		phys_addrs_available = false;
-		return;
-	}
-
-	physaddr = rte_mem_virt2phy(&tmp);
-	if (physaddr == RTE_BAD_PHYS_ADDR) {
-		if (rte_eal_iova_mode() == RTE_IOVA_PA)
-			RTE_LOG(ERR, EAL,
-				"Cannot obtain physical addresses: %s. "
-				"Only vfio will function.\n",
-				strerror(errno));
-		phys_addrs_available = false;
-	}
-}
-
 /*
  * Get physical address of any mapped virtual address in the current process.
  */
@@ -105,8 +81,7 @@
 	int page_size;
 	off_t offset;
 
-	/* Cannot parse /proc/self/pagemap, no need to log errors everywhere */
-	if (!phys_addrs_available)
+	if (phys_addrs_available == 0)
 		return RTE_BAD_IOVA;
 
 	/* standard page size */
@@ -1336,8 +1311,6 @@ void numa_error(char *where)
 	int nr_hugefiles, nr_hugepages = 0;
 	void *addr;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	/* get pointer to global configuration */
@@ -1516,7 +1489,7 @@ void numa_error(char *where)
 			continue;
 		}
 
-		if (phys_addrs_available &&
+		if (rte_eal_using_phys_addrs() &&
 				rte_eal_iova_mode() != RTE_IOVA_VA) {
 			/* find physical addresses for each hugepage */
 			if (find_physaddrs(&tmp_hp[hp_offset], hpi) < 0) {
@@ -1735,8 +1708,6 @@ void numa_error(char *where)
 	uint64_t memory[RTE_MAX_NUMA_NODES];
 	int hp_sz_idx, socket_id;
 
-	test_phys_addrs_available();
-
 	memset(used_hp, 0, sizeof(used_hp));
 
 	for (hp_sz_idx = 0;
@@ -1879,8 +1850,6 @@ void numa_error(char *where)
 			"into secondary processes\n");
 	}
 
-	test_phys_addrs_available();
-
 	fd_hugepage = open(eal_hugepage_data_path(), O_RDONLY);
 	if (fd_hugepage < 0) {
 		RTE_LOG(ERR, EAL, "Could not open %s\n",
@@ -2020,6 +1989,15 @@ void numa_error(char *where)
 int
 rte_eal_using_phys_addrs(void)
 {
+	if (phys_addrs_available == -1) {
+		uint64_t tmp = 0;
+
+		if (rte_eal_has_hugepages() != 0 &&
+				rte_mem_virt2phy(&tmp) != RTE_BAD_PHYS_ADDR)
+			phys_addrs_available = 1;
+		else
+			phys_addrs_available = 0;
+	}
 	return phys_addrs_available;
 }

From patchwork Fri Jun 14 09:39:17 2019
X-Patchwork-Submitter: David Marchand
X-Patchwork-Id: 54802
X-Patchwork-Delegate: thomas@monjalon.net
From: David Marchand
To: dev@dpdk.org
Cc: benjamin.walker@intel.com, jerinj@marvell.com, anatoly.burakov@intel.com,
 maxime.coquelin@redhat.com, thomas@monjalon.net
Date: Fri, 14 Jun 2019 11:39:17 +0200
Message-Id: <1560505157-9769-4-git-send-email-david.marchand@redhat.com>
In-Reply-To: <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
References: <20190530174819.1160221-1-benjamin.walker@intel.com>
 <1560505157-9769-1-git-send-email-david.marchand@redhat.com>
Subject:
 [dpdk-dev] [PATCH v2 3/3] bus/pci: only consider usable devices to select IOVA mode

From: Ben Walker

When selecting the preferred IOVA mode of the PCI bus, the current
heuristics ("are devices bound?", "are devices bound to UIO?", "do PMD
drivers support IOVA as VA?", etc.) should honor the device
whitelist/blacklist so that an unwanted device does not impact the
decision. There is no reason to consider a device which has no driver
available.

This applies to all OSes, so implement it in common code, then call an
OS specific callback.

On the Linux side:
- the VFIO special considerations should be evaluated only if VFIO
  support is built,
- there is no strong requirement for using VA rather than PA if a
  driver supports VA, so default to DC in that case.

Signed-off-by: Ben Walker
Signed-off-by: David Marchand
Reviewed-by: Anatoly Burakov
---
 drivers/bus/pci/bsd/pci.c    |   9 +-
 drivers/bus/pci/linux/pci.c  | 191 ++++++++++++-------------------------------
 drivers/bus/pci/pci_common.c |  65 +++++++++++++++
 drivers/bus/pci/private.h    |   8 ++
 4 files changed, 131 insertions(+), 142 deletions(-)

diff --git a/drivers/bus/pci/bsd/pci.c b/drivers/bus/pci/bsd/pci.c
index c7b90cb..a2de709 100644
--- a/drivers/bus/pci/bsd/pci.c
+++ b/drivers/bus/pci/bsd/pci.c
@@ -376,13 +376,14 @@
 	return -1;
 }
 
-/*
- * Get iommu class of PCI devices on the bus.
- */
 enum rte_iova_mode
-rte_pci_get_iommu_class(void)
+pci_device_iova_mode(const struct rte_pci_driver *pdrv __rte_unused,
+		     const struct rte_pci_device *pdev)
 {
 	/* Supports only RTE_KDRV_NIC_UIO */
+	if (pdev->kdrv != RTE_KDRV_NIC_UIO)
+		RTE_LOG(DEBUG, EAL, "Unsupported kernel driver? Defaulting to IOVA as 'PA'\n");
+
 	return RTE_IOVA_PA;
 }
 
diff --git a/drivers/bus/pci/linux/pci.c b/drivers/bus/pci/linux/pci.c
index b931cf9..33c8ea7 100644
--- a/drivers/bus/pci/linux/pci.c
+++ b/drivers/bus/pci/linux/pci.c
@@ -500,95 +500,14 @@
 	return -1;
 }
 
-/*
- * Is pci device bound to any kdrv
- */
-static inline int
-pci_one_device_is_bound(void)
-{
-	struct rte_pci_device *dev = NULL;
-	int ret = 0;
-
-	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
-		    dev->kdrv == RTE_KDRV_NONE) {
-			continue;
-		} else {
-			ret = 1;
-			break;
-		}
-	}
-	return ret;
-}
-
-/*
- * Any one of the device bound to uio
- */
-static inline int
-pci_one_device_bound_uio(void)
-{
-	struct rte_pci_device *dev = NULL;
-	struct rte_devargs *devargs;
-	int need_check;
-
-	FOREACH_DEVICE_ON_PCIBUS(dev) {
-		devargs = dev->device.devargs;
-
-		need_check = 0;
-		switch (rte_pci_bus.bus.conf.scan_mode) {
-		case RTE_BUS_SCAN_WHITELIST:
-			if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
-				need_check = 1;
-			break;
-		case RTE_BUS_SCAN_UNDEFINED:
-		case RTE_BUS_SCAN_BLACKLIST:
-			if (devargs == NULL ||
-			    devargs->policy != RTE_DEV_BLACKLISTED)
-				need_check = 1;
-			break;
-		}
-
-		if (!need_check)
-			continue;
-
-		if (dev->kdrv == RTE_KDRV_IGB_UIO ||
-		    dev->kdrv == RTE_KDRV_UIO_GENERIC) {
-			return 1;
-		}
-	}
-	return 0;
-}
-
-/*
- * Any one of the device has iova as va
- */
-static inline int
-pci_one_device_has_iova_va(void)
-{
-	struct rte_pci_device *dev = NULL;
-	struct rte_pci_driver *drv = NULL;
-
-	FOREACH_DRIVER_ON_PCIBUS(drv) {
-		if (drv && drv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) {
-			FOREACH_DEVICE_ON_PCIBUS(dev) {
-				if ((dev->kdrv == RTE_KDRV_VFIO ||
-				     dev->kdrv == RTE_KDRV_NIC_MLX) &&
-				    rte_pci_match(drv, dev))
-					return 1;
-			}
-		}
-	}
-	return 0;
-}
-
 #if defined(RTE_ARCH_X86)
 static bool
-pci_one_device_iommu_support_va(struct rte_pci_device *dev)
+pci_one_device_iommu_support_va(const struct rte_pci_device *dev)
 {
 #define VTD_CAP_MGAW_SHIFT	16
 #define VTD_CAP_MGAW_MASK	(0x3fULL << VTD_CAP_MGAW_SHIFT)
 #define X86_VA_WIDTH 47 /* From Documentation/x86/x86_64/mm.txt */
-	struct rte_pci_addr *addr = &dev->addr;
+	const struct rte_pci_addr *addr = &dev->addr;
 	char filename[PATH_MAX];
 	FILE *fp;
 	uint64_t mgaw, vtd_cap_reg = 0;
@@ -632,80 +551,76 @@
 }
 #elif defined(RTE_ARCH_PPC_64)
 static bool
-pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
+pci_one_device_iommu_support_va(__rte_unused const struct rte_pci_device *dev)
 {
 	return false;
 }
 #else
 static bool
-pci_one_device_iommu_support_va(__rte_unused struct rte_pci_device *dev)
+pci_one_device_iommu_support_va(__rte_unused const struct rte_pci_device *dev)
 {
 	return true;
 }
 #endif
 
-/*
- * All devices IOMMUs support VA as IOVA
- */
-static bool
-pci_devices_iommu_support_va(void)
+enum rte_iova_mode
+pci_device_iova_mode(const struct rte_pci_driver *pdrv,
+		     const struct rte_pci_device *pdev)
 {
-	struct rte_pci_device *dev = NULL;
-	struct rte_pci_driver *drv = NULL;
+	enum rte_iova_mode iova_mode = RTE_IOVA_DC;
+	static int iommu_no_va = -1;
 
-	FOREACH_DRIVER_ON_PCIBUS(drv) {
-		FOREACH_DEVICE_ON_PCIBUS(dev) {
-			if (!rte_pci_match(drv, dev))
-				continue;
-			/*
-			 * just one PCI device needs to be checked out because
-			 * the IOMMU hardware is the same for all of them.
-			 */
-			return pci_one_device_iommu_support_va(dev);
+	switch (pdev->kdrv) {
+	case RTE_KDRV_VFIO: {
+#ifdef VFIO_PRESENT
+		static int is_vfio_noiommu_enabled = -1;
+
+		if (is_vfio_noiommu_enabled == -1) {
+			if (rte_vfio_noiommu_is_enabled() == 1)
+				is_vfio_noiommu_enabled = 1;
+			else
+				is_vfio_noiommu_enabled = 0;
+		}
+		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0) {
+			iova_mode = RTE_IOVA_PA;
+		} else if (is_vfio_noiommu_enabled != 0) {
+			RTE_LOG(DEBUG, EAL, "Forcing to 'PA', vfio-noiommu mode configured\n");
+			iova_mode = RTE_IOVA_PA;
 		}
+#endif
+		break;
 	}
-	return true;
-}
 
-/*
- * Get iommu class of PCI devices on the bus.
- */
-enum rte_iova_mode
-rte_pci_get_iommu_class(void)
-{
-	bool is_bound;
-	bool is_vfio_noiommu_enabled = true;
-	bool has_iova_va;
-	bool is_bound_uio;
-	bool iommu_no_va;
-
-	is_bound = pci_one_device_is_bound();
-	if (!is_bound)
-		return RTE_IOVA_DC;
-
-	has_iova_va = pci_one_device_has_iova_va();
-	is_bound_uio = pci_one_device_bound_uio();
-	iommu_no_va = !pci_devices_iommu_support_va();
-#ifdef VFIO_PRESENT
-	is_vfio_noiommu_enabled = rte_vfio_noiommu_is_enabled() == true ?
-					true : false;
-#endif
+	case RTE_KDRV_NIC_MLX:
+		if ((pdrv->drv_flags & RTE_PCI_DRV_IOVA_AS_VA) == 0)
+			iova_mode = RTE_IOVA_PA;
+		break;
 
-	if (has_iova_va && !is_bound_uio && !is_vfio_noiommu_enabled &&
-			!iommu_no_va)
-		return RTE_IOVA_VA;
+	case RTE_KDRV_IGB_UIO:
+	case RTE_KDRV_UIO_GENERIC:
+		iova_mode = RTE_IOVA_PA;
+		break;
 
-	if (has_iova_va) {
-		RTE_LOG(WARNING, EAL, "Some devices want iova as va but pa will be used because.. ");
-		if (is_vfio_noiommu_enabled)
-			RTE_LOG(WARNING, EAL, "vfio-noiommu mode configured\n");
-		if (is_bound_uio)
-			RTE_LOG(WARNING, EAL, "few device bound to UIO\n");
-		if (iommu_no_va)
-			RTE_LOG(WARNING, EAL, "IOMMU does not support IOVA as VA\n");
+	default:
+		RTE_LOG(DEBUG, EAL, "Unsupported kernel driver? Defaulting to IOVA as 'PA'\n");
+		iova_mode = RTE_IOVA_PA;
+		break;
 	}
 
-	return RTE_IOVA_PA;
+	if (iova_mode != RTE_IOVA_PA) {
+		/*
+		 * We can check this only once, because the IOMMU hardware is
+		 * the same for all of them.
+		 */
+		if (iommu_no_va == -1)
+			iommu_no_va = pci_one_device_iommu_support_va(pdev)
+					? 0 : 1;
+		if (iommu_no_va != 0) {
+			RTE_LOG(DEBUG, EAL, "Forcing to 'PA', IOMMU does not support IOVA as 'VA'\n");
+			iova_mode = RTE_IOVA_PA;
+		}
+	}
+	return iova_mode;
 }
 
 /* Read PCI config space. */
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index 704b9d7..d2af472 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -574,6 +574,71 @@ static struct rte_devargs *pci_devargs_lookup(struct rte_pci_device *dev)
 	return -1;
 }
 
+static bool
+pci_ignore_device(const struct rte_pci_device *dev)
+{
+	struct rte_devargs *devargs = dev->device.devargs;
+
+	switch (rte_pci_bus.bus.conf.scan_mode) {
+	case RTE_BUS_SCAN_WHITELIST:
+		if (devargs && devargs->policy == RTE_DEV_WHITELISTED)
+			return false;
+		break;
+	case RTE_BUS_SCAN_UNDEFINED:
+	case RTE_BUS_SCAN_BLACKLIST:
+		if (devargs == NULL ||
+		    devargs->policy != RTE_DEV_BLACKLISTED)
+			return false;
+		break;
+	}
+	return true;
+}
+
+enum rte_iova_mode
+rte_pci_get_iommu_class(void)
+{
+	enum rte_iova_mode iova_mode = RTE_IOVA_DC;
+	const struct rte_pci_device *dev;
+	const struct rte_pci_driver *drv;
+	bool devices_want_va = false;
+	bool devices_want_pa = false;
+
+	FOREACH_DEVICE_ON_PCIBUS(dev) {
+		if (pci_ignore_device(dev))
+			continue;
+		if (dev->kdrv == RTE_KDRV_UNKNOWN ||
+		    dev->kdrv == RTE_KDRV_NONE)
+			continue;
+		FOREACH_DRIVER_ON_PCIBUS(drv) {
+			enum rte_iova_mode dev_iova_mode;
+
+			if (!rte_pci_match(drv, dev))
+				continue;
+
+			dev_iova_mode = pci_device_iova_mode(drv, dev);
+			RTE_LOG(DEBUG, EAL, "PCI driver %s for device "
+				PCI_PRI_FMT " wants IOVA as '%s'\n",
+				drv->driver.name,
+				dev->addr.domain, dev->addr.bus,
+				dev->addr.devid, dev->addr.function,
+				dev_iova_mode == RTE_IOVA_DC ? "DC" :
+				(dev_iova_mode == RTE_IOVA_PA ? "PA" : "VA"));
+			if (dev_iova_mode == RTE_IOVA_PA)
+				devices_want_pa = true;
+			else if (dev_iova_mode == RTE_IOVA_VA)
+				devices_want_va = true;
+		}
+	}
+	if (devices_want_pa) {
+		iova_mode = RTE_IOVA_PA;
+		if (devices_want_va)
+			RTE_LOG(WARNING, EAL, "Some devices want 'VA' but forcing 'PA' because other devices want it\n");
+	} else if (devices_want_va) {
+		iova_mode = RTE_IOVA_VA;
+	}
+	return iova_mode;
+}
+
 struct rte_pci_bus rte_pci_bus = {
 	.bus = {
 		.scan = rte_pci_scan,
diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h
index 13c3324..8a55240 100644
--- a/drivers/bus/pci/private.h
+++ b/drivers/bus/pci/private.h
@@ -173,6 +173,14 @@ int pci_uio_map_resource_by_index(struct rte_pci_device *dev, int res_idx,
 		const struct rte_pci_device *pci_dev);
 
 /**
+ * OS specific callback for rte_pci_get_iommu_class.
+ */
+enum rte_iova_mode
+pci_device_iova_mode(const struct rte_pci_driver *pci_drv,
+		     const struct rte_pci_device *pci_dev);
+
+/**
  * Get iommu class of PCI devices on the bus.
  * And return their preferred iova mapping mode.
  *