Message ID | 20201009034832.10302-2-kalesh-anakkur.purayil@broadcom.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Ferruh Yigit |
Headers | show |
Series | librte_ethdev: error recovery support | expand |
Context | Check | Description |
---|---|---|
ci/checkpatch | success | coding style OK |
09/10/2020 05:48, Kalesh A P: > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > > Adding support for device reset and recovery events in the > rte_eth_event framework. FW error and FW reset conditions would be > managed internally by PMD without needing application intervention. > In such cases, PMD would need reset/recovery events to notify application > that PMD is undergoing a reset. > > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> > Reviewed-by: Asaf Penso <asafp@nvidia.com> The ethdev maintainers are not Cc'ed. Please use the option --cc-cmd devtools/get-maintainer.sh > +Error recovery support > +~~~~~~~~~~~~~~~~~~~~~~ > + > +When the PMD detects a FW reset or error condition, it will try to recover > +from the error without needing the application intervention. In such cases, > +PMD would need events to notify the application that it is undergoing > +an error recovery. > + > +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the > +application that PMD detected a FW reset or FW error condition. PMD will > +try to recover from the error by itself. Data path will be halted and > +control path operations would fail during the recovery period. > + > +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application > +that the it has recovered from the error condition. Control path and data path > +are up now. Since the device undergone a reset, flow rules offloaded prior to > +the reset will be lost and the application has to recreate the rules again. > diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h > index 9759f13..9b4b015 100644 > --- a/lib/librte_ethdev/rte_ethdev.h > +++ b/lib/librte_ethdev/rte_ethdev.h > @@ -3207,6 +3207,23 @@ enum rte_eth_event_type { > RTE_ETH_EVENT_DESTROY, /**< port is released */ > RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */ > RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */ > + RTE_ETH_EVENT_ERR_RECOVERING, > + /**< port recovering from an error > + * > + * PMD detected a FW reset or error condition. > + * PMD will try to recover from the error. > + * Data path will be halted and Control path operations > + * would fail at this time. > + */ Does it mean the application has nothing to do when receiving this event? I think the app should stop polling at least. > + RTE_ETH_EVENT_RECOVERED, > + /**< port recovered from an error > + * > + * PMD has recovered from the error condition. > + * Control path and Data path are up now. > + * Since the device undergone a reset, flow rules > + * offloaded prior to the reset will be lost and > + * the application has to recreate the rules again. > + */ Please be more precise. Should the app re-configure the port, setup the queues, start the port?
On 10/12/20 12:29 AM, Thomas Monjalon wrote: > 09/10/2020 05:48, Kalesh A P: >> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> >> >> Adding support for device reset and recovery events in the >> rte_eth_event framework. FW error and FW reset conditions would be >> managed internally by PMD without needing application intervention. >> In such cases, PMD would need reset/recovery events to notify application >> that PMD is undergoing a reset. >> >> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> >> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> >> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> >> Reviewed-by: Asaf Penso <asafp@nvidia.com> > > The ethdev maintainers are not Cc'ed. > Please use the option --cc-cmd devtools/get-maintainer.sh > > >> +Error recovery support >> +~~~~~~~~~~~~~~~~~~~~~~ >> + >> +When the PMD detects a FW reset or error condition, it will try to recover >> +from the error without needing the application intervention. In such cases, >> +PMD would need events to notify the application that it is undergoing >> +an error recovery. >> + >> +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the >> +application that PMD detected a FW reset or FW error condition. PMD will >> +try to recover from the error by itself. Data path will be halted and >> +control path operations would fail during the recovery period. >> + >> +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application >> +that the it has recovered from the error condition. Control path and data path >> +are up now. Since the device undergone a reset, flow rules offloaded prior to >> +the reset will be lost and the application has to recreate the rules again. What should be done if the state is not recoverable? >> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h >> index 9759f13..9b4b015 100644 >> --- a/lib/librte_ethdev/rte_ethdev.h >> +++ b/lib/librte_ethdev/rte_ethdev.h >> @@ -3207,6 +3207,23 @@ enum rte_eth_event_type { >> RTE_ETH_EVENT_DESTROY, /**< port is released */ >> RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */ >> RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */ >> + RTE_ETH_EVENT_ERR_RECOVERING, >> + /**< port recovering from an error >> + * >> + * PMD detected a FW reset or error condition. >> + * PMD will try to recover from the error. >> + * Data path will be halted and Control path operations >> + * would fail at this time. >> + */ > > Does it mean the application has nothing to do when receiving this event? > I think the app should stop polling at least. > >> + RTE_ETH_EVENT_RECOVERED, >> + /**< port recovered from an error >> + * >> + * PMD has recovered from the error condition. >> + * Control path and Data path are up now. >> + * Since the device undergone a reset, flow rules >> + * offloaded prior to the reset will be lost and >> + * the application has to recreate the rules again. >> + */ > > Please be more precise. > Should the app re-configure the port, setup the queues, start the port? > >
On 10/12/2020 9:09 AM, Andrew Rybchenko wrote: > On 10/12/20 12:29 AM, Thomas Monjalon wrote: >> 09/10/2020 05:48, Kalesh A P: >>> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> >>> >>> Adding support for device reset and recovery events in the >>> rte_eth_event framework. FW error and FW reset conditions would be >>> managed internally by PMD without needing application intervention. >>> In such cases, PMD would need reset/recovery events to notify application >>> that PMD is undergoing a reset. >>> >>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> >>> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> >>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> >>> Reviewed-by: Asaf Penso <asafp@nvidia.com> >> >> The ethdev maintainers are not Cc'ed. >> Please use the option --cc-cmd devtools/get-maintainer.sh >> >> >>> +Error recovery support >>> +~~~~~~~~~~~~~~~~~~~~~~ >>> + >>> +When the PMD detects a FW reset or error condition, it will try to recover >>> +from the error without needing the application intervention. In such cases, >>> +PMD would need events to notify the application that it is undergoing >>> +an error recovery. >>> + >>> +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the >>> +application that PMD detected a FW reset or FW error condition. PMD will >>> +try to recover from the error by itself. Data path will be halted and >>> +control path operations would fail during the recovery period. >>> + >>> +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application >>> +that the it has recovered from the error condition. Control path and data path >>> +are up now. Since the device undergone a reset, flow rules offloaded prior to >>> +the reset will be lost and the application has to recreate the rules again. > > What should be done if the state is not recoverable? > >>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h >>> index 9759f13..9b4b015 100644 >>> --- a/lib/librte_ethdev/rte_ethdev.h >>> +++ b/lib/librte_ethdev/rte_ethdev.h >>> @@ -3207,6 +3207,23 @@ enum rte_eth_event_type { >>> RTE_ETH_EVENT_DESTROY, /**< port is released */ >>> RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */ >>> RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */ >>> + RTE_ETH_EVENT_ERR_RECOVERING, >>> + /**< port recovering from an error >>> + * >>> + * PMD detected a FW reset or error condition. >>> + * PMD will try to recover from the error. >>> + * Data path will be halted and Control path operations >>> + * would fail at this time. >>> + */ >> >> Does it mean the application has nothing to do when receiving this event? >> I think the app should stop polling at least. >> >>> + RTE_ETH_EVENT_RECOVERED, >>> + /**< port recovered from an error >>> + * >>> + * PMD has recovered from the error condition. >>> + * Control path and Data path are up now. >>> + * Since the device undergone a reset, flow rules >>> + * offloaded prior to the reset will be lost and >>> + * the application has to recreate the rules again. >>> + */ >> >> Please be more precise. >> Should the app re-configure the port, setup the queues, start the port? >> >> > Hi Kalesh Anakkur, The mechanics of notifying the application looks good, but the concerns seems more about what application should do with this information. PMD notifies the application on the FW/HW reset and pushes some tasks/responsibilities to the application, but for this to be useful, these tasks should be clear to application. Think yourself in a situation that you are developing an application and you received these events from a device that you don't know its internals, what will you do? Both Thomas and Andrew put cases that needs more clarification for application. Can you please send a new version with those clarifications? Thanks, ferruh
diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst index 86e0a14..c03f0ef 100644 --- a/doc/guides/prog_guide/poll_mode_drv.rst +++ b/doc/guides/prog_guide/poll_mode_drv.rst @@ -615,3 +615,21 @@ by application. The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger the application to handle reset event. It is duty of application to handle all synchronization before it calls rte_eth_dev_reset(). + +Error recovery support +~~~~~~~~~~~~~~~~~~~~~~ + +When the PMD detects a FW reset or error condition, it will try to recover +from the error without needing the application intervention. In such cases, +PMD would need events to notify the application that it is undergoing +an error recovery. + +The PMD will trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the +application that PMD detected a FW reset or FW error condition. PMD will +try to recover from the error by itself. Data path will be halted and +control path operations would fail during the recovery period. + +The PMD will trigger RTE_ETH_EVENT_RECOVERED event to notify the application +that the it has recovered from the error condition. Control path and data path +are up now. Since the device undergone a reset, flow rules offloaded prior to +the reset will be lost and the application has to recreate the rules again. diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 9759f13..9b4b015 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -3207,6 +3207,23 @@ enum rte_eth_event_type { RTE_ETH_EVENT_DESTROY, /**< port is released */ RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */ RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */ + RTE_ETH_EVENT_ERR_RECOVERING, + /**< port recovering from an error + * + * PMD detected a FW reset or error condition. + * PMD will try to recover from the error. + * Data path will be halted and Control path operations + * would fail at this time. + */ + RTE_ETH_EVENT_RECOVERED, + /**< port recovered from an error + * + * PMD has recovered from the error condition. + * Control path and Data path are up now. + * Since the device undergone a reset, flow rules + * offloaded prior to the reset will be lost and + * the application has to recreate the rules again. + */ RTE_ETH_EVENT_MAX /**< max value of this enum */ };