Message ID | 20200122101654.20824-1-kalesh-anakkur.purayil@broadcom.com (mailing list archive) |
---|---|
Headers |
From: Kalesh A P <kalesh-anakkur.purayil@broadcom.com>
To: dev@dpdk.org
Cc: thomas@monjalon.net, ferruh.yigit@intel.com, declan.doherty@intel.com
Date: Wed, 22 Jan 2020 15:46:51 +0530
Message-Id: <20200122101654.20824-1-kalesh-anakkur.purayil@broadcom.com>
X-Mailer: git-send-email 2.10.1
Subject: [dpdk-dev] [RFC PATCH 0/3] librte_ethdev: error recovery support
List-Id: DPDK patches and discussions <dev.dpdk.org> |
Series |
librte_ethdev: error recovery support
|
|
Message
Kalesh A P
Jan. 22, 2020, 10:16 a.m. UTC
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
This patch adds support for recovery event in rte_eth_event framework.
FW error and FW reset conditions would be managed by PMD. Driver uses
RTE_ETH_EVENT_INTR_RESET event to notify the applications about the
FW reset or error. In such cases, PMD would need recovery events to
notify application about PMD has recovered from FW reset or FW error.

Kalesh AP (3):
  librte_ethdev: support device recovery event
  net/bnxt: notify applications about device reset
  app/testpmd: handle device recovery event

 app/test-pmd/testpmd.c         |  7 ++++++-
 drivers/net/bnxt/bnxt_cpr.c    |  3 +++
 drivers/net/bnxt/bnxt_ethdev.c | 10 ++++++++++
 lib/librte_ethdev/rte_ethdev.h |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)
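
As a rough illustration of what the cover letter asks of an application, here is a minimal, self-contained C sketch. It deliberately avoids DPDK headers so it can be compiled on its own: `app_eth_event` is a stand-in for `enum rte_eth_event_type`, `APP_EVENT_RECOVERED` is a hypothetical name for the proposed recovery event (the cover letter does not name it), and `app_eth_event_cb` only mirrors the general shape of an ethdev event callback. It is not the code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for enum rte_eth_event_type. Only the reset event is named in
 * the cover letter; APP_EVENT_RECOVERED is a hypothetical name. */
enum app_eth_event {
	APP_EVENT_INTR_RESET, /* models RTE_ETH_EVENT_INTR_RESET */
	APP_EVENT_RECOVERED,  /* hypothetical: PMD finished recovering */
};

#define APP_MAX_PORTS 4 /* assumption: small fixed port table */

struct app_port_state {
	bool usable;              /* false between reset and recovery */
	unsigned int resets_seen; /* number of reset events received */
};

struct app_port_state app_ports[APP_MAX_PORTS];

/* Mark every port usable, as after a successful start. */
void app_ports_init(void)
{
	for (size_t i = 0; i < APP_MAX_PORTS; i++) {
		app_ports[i].usable = true;
		app_ports[i].resets_seen = 0;
	}
}

/* Event callback: stop using the port on reset, resume on recovery.
 * Returns 0 on success, -1 for an unknown port or event. */
int app_eth_event_cb(uint16_t port_id, enum app_eth_event event,
		     void *cb_arg, void *ret_param)
{
	(void)cb_arg;
	(void)ret_param;

	if (port_id >= APP_MAX_PORTS)
		return -1;

	switch (event) {
	case APP_EVENT_INTR_RESET:
		app_ports[port_id].usable = false;
		app_ports[port_id].resets_seen++;
		return 0;
	case APP_EVENT_RECOVERED:
		app_ports[port_id].usable = true;
		return 0;
	default:
		return -1;
	}
}

/* The application checks this before issuing control path operations. */
bool app_port_can_configure(uint16_t port_id)
{
	assert(port_id < APP_MAX_PORTS);
	return app_ports[port_id].usable;
}
```

In a real application the callback would presumably be registered per port with `rte_eth_dev_callback_register()`, once for each event type of interest.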
Comments
22/01/2020 11:16, Kalesh A P:
> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>
> This patch adds support for recovery event in rte_eth_event framework.
> FW error and FW reset conditions would be managed by PMD. Driver uses

"Driver"? THE driver? :)

> RTE_ETH_EVENT_INTR_RESET event to notify the applications about the
> FW reset or error.

Which drivers do that?

> In such cases, PMD would need recovery events to
> notify application about PMD has recovered from FW reset or FW error.

Sorry I don't understand. You said application is notified of any error.
But the PMD can recover from this error? So what is the error at the end?
If the error is recovered why notifying the application?
Hi Thomas,

On Wed, Mar 11, 2020 at 6:49 PM Thomas Monjalon <thomas@monjalon.net> wrote:

> 22/01/2020 11:16, Kalesh A P:
> > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> >
> > This patch adds support for recovery event in rte_eth_event framework.
> > FW error and FW reset conditions would be managed by PMD. Driver uses
>
> "Driver"? THE driver? :)
>
> > RTE_ETH_EVENT_INTR_RESET event to notify the applications about the
> > FW reset or error.
>
> Which drivers do that?

[Kalesh]: Second patch in this series implements this behavior in bnxt PMD.
Error recovery is a new feature added in bnxt PMD in 19.11. This change is
needed to support error recovery functionality.

> > In such cases, PMD would need recovery events to
> > notify application about PMD has recovered from FW reset or FW error.
>
> Sorry I don't understand. You said application is notified of any error.
> But the PMD can recover from this error? So what is the error at the end?
> If the error is recovered why notifying the application?

[Kalesh]: Let me give you some insight on this.

The error recovery solution is a protocol implemented between firmware and
bnxt PMD to recover from the fatal errors without a system reboot. There is
an alarm thread which constantly monitors the health of the firmware and
initiates a recovery when needed.

There are two scenarios here:

1. Hardware or firmware encountered an error which firmware detected.
   Firmware is in operational status here. In this case, firmware can reset
   the chip and notify the driver about the reset.
2. Hardware or firmware encountered an error but firmware is dead/hung.
   Firmware is not in operational status. In this case, the only possible way
   to recover the adapter is through host driver (bnxt PMD).

In both cases, bnxt PMD reinitializes with the FW again after the reset.
During that recovery process, data path will be halted and any control path
operation would fail. So, bnxt PMD has to notify the application about this
reset/error event to prevent any activities from application during this
time.
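
The two scenarios and the recovery window described above can be sketched as a small state machine. This is a simplified, self-contained C model and not the bnxt PMD code: every name (`fw_status`, `drv_model`, `drv_health_poll`, `drv_ctrl_op`) is hypothetical, the periodic call to `drv_health_poll` plays the role of the alarm thread, and recovery is assumed to complete as soon as the firmware reports healthy again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* What the health monitor observes on each poll (hypothetical model). */
enum fw_status {
	FW_HEALTHY,        /* firmware operational, no error */
	FW_RESET_NOTIFIED, /* scenario 1: firmware detected the error, resets
			    * the chip and notifies the driver */
	FW_UNRESPONSIVE,   /* scenario 2: firmware dead/hung, the host driver
			    * must recover the adapter */
};

/* Event the driver would raise towards the application. */
enum drv_event {
	DRV_EV_NONE,
	DRV_EV_RESET,     /* recovery started: stop using the port */
	DRV_EV_RECOVERED, /* recovery finished: port usable again */
};

struct drv_model {
	bool recovering;     /* true while data and control paths are halted */
	bool host_initiated; /* true if the driver, not firmware, drives reset */
};

/* One iteration of the health monitor. Returns the event to deliver. */
enum drv_event drv_health_poll(struct drv_model *d, enum fw_status fw)
{
	assert(d != NULL);

	if (!d->recovering) {
		if (fw == FW_HEALTHY)
			return DRV_EV_NONE;
		/* Either scenario starts a recovery window. */
		d->recovering = true;
		d->host_initiated = (fw == FW_UNRESPONSIVE);
		return DRV_EV_RESET;
	}

	/* Already recovering: wait until the firmware is healthy again,
	 * which in this model also stands for "driver reinitialized". */
	if (fw == FW_HEALTHY) {
		d->recovering = false;
		return DRV_EV_RECOVERED;
	}
	return DRV_EV_NONE;
}

/* Any control path operation fails while recovery is in progress. */
int drv_ctrl_op(const struct drv_model *d)
{
	assert(d != NULL);
	return d->recovering ? -EBUSY : 0;
}
```

The model makes the motivation for a second event visible: without something like `DRV_EV_RECOVERED`, the application sees the start of the window in which `drv_ctrl_op` fails but has no signal for when it ends.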
12/03/2020 04:25, Kalesh Anakkur Purayil:
> [...]
> In both cases, bnxt PMD reinitializes with the FW again after the reset.
> During that recovery process, data path will be halted and any control path
> operation would fail. So, bnxt PMD has to notify the application about this
> reset/error event to prevent any activities from application during this
> time.

I think you are changing the meaning of the reset event.
It was described like this:
	RTE_ETH_EVENT_INTR_RESET,
		/**< reset interrupt event, sent to VF on PF reset */

Please update this description as well.

Of course, we'll need approval from other PMD maintainers
to accept the new recovery API.
On 3/12/2020 7:34 AM, Thomas Monjalon wrote:
> [...]
> I think you are changing the meaning of the reset event.
> It was described like this:
> 	RTE_ETH_EVENT_INTR_RESET,
> 		/**< reset interrupt event, sent to VF on PF reset */
>
> Please update this description as well.
>
> Of course, we'll need approval from other PMD maintainers
> to accept the new recovery API.

Hi Kalesh,

Is this RFC still relevant/valid?