[v6,0/3] librte_ethdev: error recovery support
Message ID | 20201009034832.10302-1-kalesh-anakkur.purayil@broadcom.com (mailing list archive) |
Message
Kalesh A P
Oct. 9, 2020, 3:48 a.m. UTC
From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
The error recovery solution is a protocol implemented between the firmware
and the bnxt PMD to recover from fatal errors without a system reboot.
An alarm thread constantly monitors the health of the firmware and
initiates a recovery when needed.
There are two scenarios:
1. Hardware or firmware encountered an error that the firmware detected.
The firmware is still operational. In this case, the firmware can
reset the chip and notify the driver about the reset.
2. Hardware or firmware encountered an error but the firmware is dead/hung.
The firmware is not operational. In this case, the only way to recover
the adapter is through the host driver (bnxt PMD).
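
To make the two scenarios concrete, here is a minimal sketch of how a periodic health check could tell them apart. It is not the bnxt PMD code: the heartbeat counter, the reset_notified flag and every type and function name below are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the bnxt PMD implementation. */
enum fw_recovery_action {
	FW_HEALTHY,           /* firmware is alive, nothing to do */
	FW_RESET_BY_FIRMWARE, /* scenario 1: firmware is operational and resets the chip */
	FW_RESET_BY_DRIVER    /* scenario 2: firmware is dead/hung, host driver must reset */
};

/* One reading taken by the monitor on each poll. */
struct fw_health_sample {
	uint32_t heartbeat;   /* assumed: a counter the firmware bumps while alive */
	bool reset_notified;  /* assumed: firmware announced that it is resetting */
};

/* Monitor state; the caller seeds last_heartbeat with an initial reading. */
struct fw_monitor {
	uint32_t last_heartbeat;
};

/* Classify one sample and remember its heartbeat for the next poll. */
static enum fw_recovery_action
fw_monitor_poll(struct fw_monitor *mon, const struct fw_health_sample *s)
{
	enum fw_recovery_action action = FW_HEALTHY;

	if (s->reset_notified)
		action = FW_RESET_BY_FIRMWARE;   /* firmware detected the error itself */
	else if (s->heartbeat == mon->last_heartbeat)
		action = FW_RESET_BY_DRIVER;     /* no progress since the last poll */

	mon->last_heartbeat = s->heartbeat;
	return action;
}
```

In this sketch the alarm callback would call fw_monitor_poll() on every tick and start the matching recovery path when the result is not FW_HEALTHY.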
In both cases, the bnxt PMD reinitializes with the FW after the reset.
During the recovery process, the data path is halted and any control path
operation fails. So, the PMD has to notify the application about the
reset/error event so that the application does not attempt any activity
while the PMD is recovering from the error.
This patch set adds support for the reset and recovery events in
the rte_eth_event framework. FW error and FW reset conditions are
managed by the PMD. The driver uses the RTE_ETH_EVENT_RESET event to notify
applications about the FW reset or error. In such cases, the
PMD uses the RTE_ETH_EVENT_RECOVERED event to notify applications that
it has recovered from the FW reset or FW error.
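
As an illustration of how an application might react to these two events, the sketch below keeps a per-port "recovering" flag and refuses control path operations while it is set. It is self-contained and does not include DPDK headers: the PORT_EVENT_* values are stand-ins for the events named above, and port_state, port_event_handler() and port_control_op() are hypothetical names. In a real application the handler would presumably be registered with rte_eth_dev_callback_register().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the reset and recovered events described in this series. */
enum port_event {
	PORT_EVENT_RESET,     /* PMD started recovering from a FW reset or error */
	PORT_EVENT_RECOVERED  /* PMD finished recovering */
};

/* Hypothetical per-port application state. The flag is atomic because
 * event callbacks may run on a different thread than the control path. */
struct port_state {
	atomic_bool recovering;
};

static void
port_state_init(struct port_state *st)
{
	atomic_init(&st->recovering, false);
}

/* Event handler: record whether the port is currently recovering. */
static void
port_event_handler(struct port_state *st, enum port_event ev)
{
	switch (ev) {
	case PORT_EVENT_RESET:
		atomic_store(&st->recovering, true);
		break;
	case PORT_EVENT_RECOVERED:
		atomic_store(&st->recovering, false);
		break;
	}
}

/* Gate for any control path operation: refuse while the PMD recovers. */
static int
port_control_op(struct port_state *st)
{
	if (atomic_load(&st->recovering))
		return -EBUSY;
	/* ... the actual control path call would go here ... */
	return 0;
}
```

The same flag could also be checked by the data path loop to skip Rx/Tx bursts on the port until the recovered event arrives.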
v6: Addressed comments from Asaf Penso.
1. Updated 20.11 release notes with the new events added.
2. Updated testpmd parse_event_printing_config function.
v5: Addressed comments from Ophir Munk.
1. Renamed the new event name to RTE_ETH_EVENT_ERR_RECOVERING.
2. Fixed testpmd logs.
3. Documented the new recovery events.
v4: Addressed comments from Thomas Monjalon.
1. Added doxygen comments about new events.
v3: Fixed a typo in commit log.
v2: Added a new event RTE_ETH_EVENT_RESET instead of using the
RTE_ETH_EVENT_INTR_RESET to notify applications about device reset.

Kalesh AP (3):
  ethdev: support device reset and recovery events
  net/bnxt: notify applications about device reset/recovery
  app/testpmd: handle device recovery event

 app/test-pmd/parameters.c               |  8 ++++++--
 app/test-pmd/testpmd.c                  |  6 +++++-
 doc/guides/prog_guide/poll_mode_drv.rst | 18 ++++++++++++++++++
 doc/guides/rel_notes/release_20_11.rst  | 10 ++++++++++
 drivers/net/bnxt/bnxt_cpr.c             |  3 +++
 drivers/net/bnxt/bnxt_ethdev.c          |  9 +++++++++
 lib/librte_ethdev/rte_ethdev.h          | 17 +++++++++++++++++
 7 files changed, 68 insertions(+), 3 deletions(-)