From patchwork Thu Jul 5 07:38:48 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42294 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 69ED51BE42; Thu, 5 Jul 2018 09:41:09 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id BC6811BE58 for ; Thu, 5 Jul 2018 09:41:07 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144036" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:04 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:48 +0800 Message-Id: <1530776333-30318-2-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 1/7] bus: add hotplug failure handler X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When device be hotplug out, if app still continue to access device by mmio, it will cause of memory failure and result the system crash. This patch introduces a bus ops to handle device hotplug failure, it is a bus specific behavior,so that each kind of bus can implement its own logic case by case. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: change ops name to be more clear refine doc and commit log --- lib/librte_eal/common/include/rte_bus.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index eb9eded..8a993cf 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -168,6 +168,19 @@ typedef int (*rte_bus_unplug_t)(struct rte_device *dev); typedef int (*rte_bus_parse_t)(const char *name, void *addr); /** + * Implementation a specific hotplug failure handler, which is responsible + * for handle the failure when hot remove the device, guaranty the system + * would not crash in the case. + * @param dev + * Pointer of the device structure. + * + * @return + * 0 on success. + * !0 on error. + */ +typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev); + +/** * Bus scan policies */ enum rte_bus_scan_mode { @@ -211,6 +224,8 @@ struct rte_bus { rte_bus_parse_t parse; /**< Parse a device name */ struct rte_bus_conf conf; /**< Bus configuration */ rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */ + rte_bus_hotplug_failure_handler_t hotplug_failure_handler; + /**< handle hotplug failure on bus */ }; /** From patchwork Thu Jul 5 07:38:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42295 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 64D9B1BEA0; Thu, 5 Jul 2018 09:41:13 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 8DC281BE8A for ; Thu, 5 Jul 2018 09:41:10 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144046" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:07 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:49 +0800 Message-Id: <1530776333-30318-3-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 2/7] bus/pci: implement hotplug failure handler ops X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch implements the ops of hotplug failure handler for PCI bus, it is functional to remap a new dummy memory which overlap to the failure memory to avoid MMIO read/write error. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: refine log and commit log --- drivers/bus/pci/pci_common.c | 28 ++++++++++++++++++++++++++++ drivers/bus/pci/pci_common_uio.c | 33 +++++++++++++++++++++++++++++++++ drivers/bus/pci/private.h | 12 ++++++++++++ 3 files changed, 73 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index 94b0f41..bc3bcac 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -408,6 +408,33 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, } static int +pci_hotplug_failure_handler(struct rte_device *dev) +{ + struct rte_pci_device *pdev = NULL; + int ret = 0; + + pdev = RTE_DEV_TO_PCI(dev); + if (!pdev) + return -1; + + switch (pdev->kdrv) { + case RTE_KDRV_IGB_UIO: + case RTE_KDRV_UIO_GENERIC: + case RTE_KDRV_NIC_UIO: + /* mmio resources is invalid, remap it to be safe. */ + ret = pci_uio_remap_resource(pdev); + break; + default: + RTE_LOG(DEBUG, EAL, + "Not managed by a supported kernel driver, skipped\n"); + ret = -1; + break; + } + + return ret; +} + +static int pci_plug(struct rte_device *dev) { return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev)); @@ -437,6 +464,7 @@ struct rte_pci_bus rte_pci_bus = { .unplug = pci_unplug, .parse = pci_parse, .get_iommu_class = rte_pci_get_iommu_class, + .hotplug_failure_handler = pci_hotplug_failure_handler, }, .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list), diff --git a/drivers/bus/pci/pci_common_uio.c b/drivers/bus/pci/pci_common_uio.c index 54bc20b..7ea73db 100644 --- a/drivers/bus/pci/pci_common_uio.c +++ b/drivers/bus/pci/pci_common_uio.c @@ -146,6 +146,39 @@ pci_uio_unmap(struct mapped_pci_resource *uio_res) } } +/* remap the PCI resource of a PCI device in anonymous virtual memory */ +int +pci_uio_remap_resource(struct rte_pci_device *dev) +{ + int i; + void *map_address; + + if (dev == NULL) + return -1; + + /* Remap all BARs */ + for (i = 0; i != PCI_MAX_RESOURCE; i++) { + /* skip empty BAR */ + if (dev->mem_resource[i].phys_addr == 0) + continue; + map_address = mmap(dev->mem_resource[i].addr, + (size_t)dev->mem_resource[i].len, + PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); + if (map_address == MAP_FAILED) { + RTE_LOG(ERR, EAL, + "Cannot remap resource for device %s\n", + dev->name); + return -1; + } + RTE_LOG(INFO, EAL, + "Successful remap resource for device %s\n", + dev->name); + } + + return 0; +} + static struct mapped_pci_resource * pci_uio_find_resource(struct rte_pci_device *dev) { diff --git a/drivers/bus/pci/private.h b/drivers/bus/pci/private.h index 8ddd03e..6b312e5 100644 --- a/drivers/bus/pci/private.h +++ b/drivers/bus/pci/private.h @@ -123,6 +123,18 @@ void pci_uio_free_resource(struct rte_pci_device *dev, struct mapped_pci_resource *uio_res); /** + * Remap the PCI resource of a PCI device in anonymous virtual memory. + * + * @param dev + * Point to the struct rte pci device. + * @return + * - On success, zero. + * - On failure, a negative value. + */ +int +pci_uio_remap_resource(struct rte_pci_device *dev); + +/** * Map device memory to uio resource * * This function is private to EAL. From patchwork Thu Jul 5 07:38:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42296 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DA85E1BEAE; Thu, 5 Jul 2018 09:41:15 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 758E41BEA1 for ; Thu, 5 Jul 2018 09:41:13 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:13 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144056" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:10 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:50 +0800 Message-Id: <1530776333-30318-4-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 3/7] bus: add sigbus handler X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" When device be hotplug out, if data path still read/write device, the sigbus error will occur, this error need to be handled. So a handler need to be here to capture the signal and handle it correspondingly. This patch introduces a bus ops to handle sigbus error, it is a bus specific behavior,so that each kind of bus can implement its own logic case by case. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: refine log and commit log --- lib/librte_eal/common/include/rte_bus.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/lib/librte_eal/common/include/rte_bus.h b/lib/librte_eal/common/include/rte_bus.h index 8a993cf..d753575 100644 --- a/lib/librte_eal/common/include/rte_bus.h +++ b/lib/librte_eal/common/include/rte_bus.h @@ -181,6 +181,20 @@ typedef int (*rte_bus_parse_t)(const char *name, void *addr); typedef int (*rte_bus_hotplug_failure_handler_t)(struct rte_device *dev); /** + * Implementation a specific sigbus handler, which is responsible + * for handle the sigbus error which is original memory error, or specific + * memory error that caused of hot unplug. + * @param failure_addr + * Pointer of the fault address of the sigbus error. + * + * @return + * 0 for success handle the sigbus. + * 1 for no bus handle the sigbus. + * -1 for failed to handle the sigbus + */ +typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr); + +/** * Bus scan policies */ enum rte_bus_scan_mode { @@ -226,6 +240,8 @@ struct rte_bus { rte_bus_get_iommu_class_t get_iommu_class; /**< Get iommu class */ rte_bus_hotplug_failure_handler_t hotplug_failure_handler; /**< handle hotplug failure on bus */ + rte_bus_sigbus_handler_t sigbus_handler; /**< handle sigbus error */ + }; /** From patchwork Thu Jul 5 07:38:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42297 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DCB601BE9D; Thu, 5 Jul 2018 09:41:18 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id B95641BEB3 for ; Thu, 5 Jul 2018 09:41:16 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144064" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:13 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:51 +0800 Message-Id: <1530776333-30318-5-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 4/7] bus/pci: implement sigbus handler operation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch implements the ops of sigbus handler for PCI bus, it is functional to find the corresponding pci device which is be hotplug out. and then handle the hotplug failure for this device. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: no change --- drivers/bus/pci/pci_common.c | 49 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c index bc3bcac..f065271 100644 --- a/drivers/bus/pci/pci_common.c +++ b/drivers/bus/pci/pci_common.c @@ -407,6 +407,32 @@ pci_find_device(const struct rte_device *start, rte_dev_cmp_t cmp, return NULL; } +/* check the failure address belongs to which device. */ +static struct rte_pci_device * +pci_find_device_by_addr(const void *failure_addr) +{ + struct rte_pci_device *pdev = NULL; + int i; + + FOREACH_DEVICE_ON_PCIBUS(pdev) { + for (i = 0; i != RTE_DIM(pdev->mem_resource); i++) { + if ((uint64_t)(uintptr_t)failure_addr >= + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr && + (uint64_t)(uintptr_t)failure_addr < + (uint64_t)(uintptr_t)pdev->mem_resource[i].addr + + pdev->mem_resource[i].len) { + RTE_LOG(INFO, EAL, "Failure address " + "%16.16"PRIx64" belongs to " + "device %s!\n", + (uint64_t)(uintptr_t)failure_addr, + pdev->device.name); + return pdev; + } + } + } + return NULL; +} + static int pci_hotplug_failure_handler(struct rte_device *dev) { @@ -435,6 +461,28 @@ pci_hotplug_failure_handler(struct rte_device *dev) } static int +pci_sigbus_handler(const void *failure_addr) +{ + struct rte_pci_device *pdev = NULL; + int ret = 0; + + pdev = pci_find_device_by_addr(failure_addr); + if (!pdev) { + /* It is a generic sigbus error, no bus would handle it. */ + ret = 1; + } else { + /* The sigbus error is caused of hot removal. */ + ret = pci_hotplug_failure_handler(&pdev->device); + if (ret) { + RTE_LOG(ERR, EAL, "Failed to handle hot plug for " + "device %s", pdev->name); + ret = -1; + } + } + return ret; +} + +static int pci_plug(struct rte_device *dev) { return pci_probe_all_drivers(RTE_DEV_TO_PCI(dev)); @@ -465,6 +513,7 @@ struct rte_pci_bus rte_pci_bus = { .parse = pci_parse, .get_iommu_class = rte_pci_get_iommu_class, .hotplug_failure_handler = pci_hotplug_failure_handler, + .sigbus_handler = pci_sigbus_handler, }, .device_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.device_list), .driver_list = TAILQ_HEAD_INITIALIZER(rte_pci_bus.driver_list), From patchwork Thu Jul 5 07:38:52 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42298 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id A5CF31BEB3; Thu, 5 Jul 2018 09:41:21 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id D30CD1BEAF for ; Thu, 5 Jul 2018 09:41:19 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:19 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144072" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:16 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:52 +0800 Message-Id: <1530776333-30318-6-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 5/7] bus: add helper to handle sigbus X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch aim to add a helper to iterate all buses to find the corresponding bus to handle the sigbus error. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: refine the errno restore logic --- lib/librte_eal/common/eal_common_bus.c | 36 +++++++++++++++++++++++++++++++++- lib/librte_eal/common/eal_private.h | 12 ++++++++++++ 2 files changed, 47 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c index 0943851..c9f3566 100644 --- a/lib/librte_eal/common/eal_common_bus.c +++ b/lib/librte_eal/common/eal_common_bus.c @@ -37,6 +37,7 @@ #include #include #include +#include #include "eal_private.h" @@ -220,7 +221,6 @@ rte_bus_find_by_device_name(const char *str) return rte_bus_find(NULL, bus_can_parse, name); } - /* * Get iommu class of devices on the bus. */ @@ -242,3 +242,37 @@ rte_bus_get_iommu_class(void) } return mode; } + +static int +bus_handle_sigbus(const struct rte_bus *bus, + const void *failure_addr) +{ + int ret; + + ret = bus->sigbus_handler(failure_addr); + rte_errno = ret; + + return !(bus->sigbus_handler && ret <= 0); +} + +int +rte_bus_sigbus_handler(const void *failure_addr) +{ + struct rte_bus *bus; + + int ret = 0; + int old_errno = rte_errno; + rte_errno = 0; + + bus = rte_bus_find(NULL, bus_handle_sigbus, failure_addr); + /* failed to handle the sigbus, pass the new errno. */ + if (bus && rte_errno == -1) + return -1; + else if (!bus) + ret = 1; + + /* otherwise restore the old errno. */ + rte_errno = old_errno; + + return ret; +} diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h index bdadc4d..a91c4b5 100644 --- a/lib/librte_eal/common/eal_private.h +++ b/lib/librte_eal/common/eal_private.h @@ -258,4 +258,16 @@ int rte_mp_channel_init(void); */ void dev_callback_process(char *device_name, enum rte_dev_event_type event); + +/** + * Iterate all buses to find the corresponding bus, to handle the sigbus error. + * @param failure_addr + * Pointer of the fault address of the sigbus error. + * + * @return + * 0 success to handle the sigbus. + * -1 failed to handle the sigbus + * 1 no bus can handler the sigbus + */ +int rte_bus_sigbus_handler(const void *failure_addr); #endif /* _EAL_PRIVATE_H_ */ From patchwork Thu Jul 5 07:38:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Guo, Jia" X-Patchwork-Id: 42299 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id EFC081BEC5; Thu, 5 Jul 2018 09:41:24 +0200 (CEST) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id 4A2A51BEC5 for ; Thu, 5 Jul 2018 09:41:23 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Jul 2018 00:41:22 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.51,311,1526367600"; d="scan'208";a="52144080" Received: from jeffguo-z170x-ud5.sh.intel.com (HELO localhost.localdomain) ([10.67.104.10]) by fmsmga007.fm.intel.com with ESMTP; 05 Jul 2018 00:41:19 -0700 From: Jeff Guo To: stephen@networkplumber.org, bruce.richardson@intel.com, ferruh.yigit@intel.com, konstantin.ananyev@intel.com, gaetan.rivet@6wind.com, jingjing.wu@intel.com, thomas@monjalon.net, motih@mellanox.com, matan@mellanox.com, harry.van.haaren@intel.com, qi.z.zhang@intel.com, shaopeng.he@intel.com, bernard.iremonger@intel.com Cc: jblunck@infradead.org, shreyansh.jain@nxp.com, dev@dpdk.org, jia.guo@intel.com, helin.zhang@intel.com Date: Thu, 5 Jul 2018 15:38:53 +0800 Message-Id: <1530776333-30318-7-git-send-email-jia.guo@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1530776333-30318-1-git-send-email-jia.guo@intel.com> References: <1498711073-42917-1-git-send-email-jia.guo@intel.com> <1530776333-30318-1-git-send-email-jia.guo@intel.com> Subject: [dpdk-dev] [PATCH V5 6/7] eal: add failure handle mechanism for hotplug X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" This patch introduces a failure handler mechanism to handle device hot plug removal event. First register sigbus handler, once sigbus error be captured, will check the failure address and accordingly remap the invalid memory for the corresponding device. Bese on this mechanism, it could guaranty the application not to be crash when hotplug out device. Signed-off-by: Jeff Guo Acked-by: Shaopeng He --- v5->v4: add sigbus old handler recover. --- lib/librte_eal/linuxapp/eal/eal_dev.c | 111 +++++++++++++++++++++++++++++++++- 1 file changed, 110 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/linuxapp/eal/eal_dev.c b/lib/librte_eal/linuxapp/eal/eal_dev.c index 1cf6aeb..a22cb9a 100644 --- a/lib/librte_eal/linuxapp/eal/eal_dev.c +++ b/lib/librte_eal/linuxapp/eal/eal_dev.c @@ -4,6 +4,8 @@ #include #include +#include +#include #include #include @@ -14,15 +16,28 @@ #include #include #include +#include +#include +#include +#include #include "eal_private.h" static struct rte_intr_handle intr_handle = {.fd = -1 }; static bool monitor_started; +extern struct rte_bus_list rte_bus_list; + #define EAL_UEV_MSG_LEN 4096 #define EAL_UEV_MSG_ELEM_LEN 128 +/* spinlock for device failure process */ +static rte_spinlock_t dev_failure_lock = RTE_SPINLOCK_INITIALIZER; + +static struct sigaction sigbus_action_old; + +static int sigbus_need_recover; + static void dev_uev_handler(__rte_unused void *param); /* identify the system layer which reports this event. */ @@ -33,6 +48,49 @@ enum eal_dev_event_subsystem { EAL_DEV_EVENT_SUBSYSTEM_MAX }; +static void +sigbus_action_recover(void) +{ + if (sigbus_need_recover) { + sigaction(SIGBUS, &sigbus_action_old, NULL); + sigbus_need_recover = 0; + } +} + +static void sigbus_handler(int signum, siginfo_t *info, + void *ctx __rte_unused) +{ + int ret; + + RTE_LOG(INFO, EAL, "Thread[%d] catch SIGBUS, fault address:%p\n", + (int)pthread_self(), info->si_addr); + + rte_spinlock_lock(&dev_failure_lock); + ret = rte_bus_sigbus_handler(info->si_addr); + rte_spinlock_unlock(&dev_failure_lock); + if (ret == -1) { + rte_exit(EXIT_FAILURE, + "Failed to handle SIGBUS for hotplug, " + "(rte_errno: %s)!", strerror(rte_errno)); + } else if (ret == 1) { + if (sigbus_action_old.sa_handler) + (*(sigbus_action_old.sa_handler))(signum); + else + rte_exit(EXIT_FAILURE, + "Failed to handle generic SIGBUS!"); + } + + RTE_LOG(INFO, EAL, "Success to handle SIGBUS for hotplug!\n"); +} + +static int cmp_dev_name(const struct rte_device *dev, + const void *_name) +{ + const char *name = _name; + + return strcmp(dev->name, name); +} + static int dev_uev_socket_fd_create(void) { @@ -147,6 +205,9 @@ dev_uev_handler(__rte_unused void *param) struct rte_dev_event uevent; int ret; char buf[EAL_UEV_MSG_LEN]; + struct rte_bus *bus; + struct rte_device *dev; + const char *busname; memset(&uevent, 0, sizeof(struct rte_dev_event)); memset(buf, 0, EAL_UEV_MSG_LEN); @@ -171,13 +232,50 @@ dev_uev_handler(__rte_unused void *param) RTE_LOG(DEBUG, EAL, "receive uevent(name:%s, type:%d, subsystem:%d)\n", uevent.devname, uevent.type, uevent.subsystem); - if (uevent.devname) + switch (uevent.subsystem) { + case EAL_DEV_EVENT_SUBSYSTEM_PCI: + case EAL_DEV_EVENT_SUBSYSTEM_UIO: + busname = "pci"; + break; + default: + break; + } + + if (uevent.devname) { + if (uevent.type == RTE_DEV_EVENT_REMOVE) { + rte_spinlock_lock(&dev_failure_lock); + bus = rte_bus_find_by_name(busname); + if (bus == NULL) { + RTE_LOG(ERR, EAL, "Cannot find bus (%s)\n", + busname); + return; + } + + dev = bus->find_device(NULL, cmp_dev_name, + uevent.devname); + if (dev == NULL) { + RTE_LOG(ERR, EAL, "Cannot find device (%s) on " + "bus (%s)\n", uevent.devname, busname); + return; + } + + ret = bus->hotplug_failure_handler(dev); + rte_spinlock_unlock(&dev_failure_lock); + if (ret) { + RTE_LOG(ERR, EAL, "Can not handle hotplug for " + "device (%s)\n", dev->name); + return; + } + } dev_callback_process(uevent.devname, uevent.type); + } } int __rte_experimental rte_dev_event_monitor_start(void) { + sigset_t mask; + struct sigaction action; int ret; if (monitor_started) @@ -197,6 +295,14 @@ rte_dev_event_monitor_start(void) return -1; } + /* register sigbus handler */ + sigemptyset(&mask); + sigaddset(&mask, SIGBUS); + action.sa_flags = SA_SIGINFO; + action.sa_mask = mask; + action.sa_sigaction = sigbus_handler; + sigbus_need_recover = !sigaction(SIGBUS, &action, &sigbus_action_old); + monitor_started = true; return 0; @@ -217,8 +323,11 @@ rte_dev_event_monitor_stop(void) return ret; } + sigbus_action_recover(); + close(intr_handle.fd); intr_handle.fd = -1; monitor_started = false; + return 0; }