[v7,1/4] ethdev: support device reset and recovery events

Message ID 20220128124830.427-2-kalesh-anakkur.purayil@broadcom.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series ethdev: error recovery support |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Kalesh A P Jan. 28, 2022, 12:48 p.m. UTC
  From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>

Adding support for the device reset and recovery events in the
rte_eth_event framework. FW error and FW reset conditions would be
managed internally by the PMD without needing application intervention.
In such cases, PMD would need reset/recovery events to notify application
that PMD is undergoing a reset.

While most of the recovery process is transparent to the application since
most of the driver ensures recovery from FW reset or FW error conditions,
the application will have to reprogram any flows which were offloaded to
the underlying hardware.

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
---
 doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
 2 files changed, 42 insertions(+)
  

Comments

Ferruh Yigit Feb. 1, 2022, 12:11 p.m. UTC | #1
On 1/28/2022 12:48 PM, Kalesh A P wrote:
> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> 
> Adding support for the device reset and recovery events in the
> rte_eth_event framework. FW error and FW reset conditions would be
> managed internally by the PMD without needing application intervention.
> In such cases, PMD would need reset/recovery events to notify application
> that PMD is undergoing a reset.
> 
> While most of the recovery process is transparent to the application since
> most of the driver ensures recovery from FW reset or FW error conditions,
> the application will have to reprogram any flows which were offloaded to
> the underlying hardware.
> 
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>

More developer cc'ed.

> ---
>   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>   2 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index 6831289..9ecc0e4 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -623,3 +623,27 @@ by application.
>   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>   the application to handle reset event. It is duty of application to
>   handle all synchronization before it calls rte_eth_dev_reset().
> +
> +Error recovery support
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When the PMD detects a FW reset or error condition, it may try to recover
> +from the error without needing the application intervention. In such cases,
> +PMD would need events to notify the application that it is undergoing
> +an error recovery.
> +
> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> +application that PMD detected a FW reset or FW error condition. PMD may
> +try to recover from the error by itself. Data path may be quiesced and
> +control path operations may fail during the recovery period. The application
> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
> +

Between the time FW error occurred and the application receive the RECOVERING event,
datapath will continue to poll and application may call control APIs, so the event
really is not solving the issue and driver somehow should be sure this won't crash
the application, in that case not sure about the benefit of this event.

> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
> +that the it has recovered from the error condition. PMD re-configures the port
> +to the state prior to the error condition. Control path and data path are up now.
> +Since the device has undergone a reset, flow rules offloaded prior to reset
> +may be lost and the application should recreate the rules again.
> +

I think the most difficult part here is clarify what application should do
when this event received consistent for all devices, "flow rules may be lost"
looks very vague to me.
Unless it is not clear for application what to do when this event is received,
it is not that useful or it will be specific to some PMDs. And I can see it is
hard to clarify this but perhaps we can define a set of common behavior.
Not sure what others are thinking.

> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
> +that it has failed to recover from the error condition. The device may not be
> +usable anymore.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 147cc1c..a46819f 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> +	RTE_ETH_EVENT_ERR_RECOVERING,
> +			/**< port recovering from an error
> +			 *
> +			 * PMD detected a FW reset or error condition.
> +			 * PMD will try to recover from the error.
> +			 * Data path may be quiesced and Control path operations
> +			 * may fail at this time.
> +			 */

Please put multi line comments before enum, Andrew did a set of cleanups for these.

> +	RTE_ETH_EVENT_RECOVERED,
> +			/**< port recovered from an error
> +			 *
> +			 * PMD has recovered from the error condition.
> +			 * Control path and Data path are up now.
> +			 * PMD re-configures the port to the state prior to the error.
> +			 * Since the device has undergone a reset, flow rules
> +			 * offloaded prior to reset may be lost and
> +			 * the application should recreate the rules again.
> +			 */
>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>   };
>
  
Ferruh Yigit Feb. 1, 2022, 12:52 p.m. UTC | #2
On 1/28/2022 12:48 PM, Kalesh A P wrote:
> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> 
> Adding support for the device reset and recovery events in the
> rte_eth_event framework. FW error and FW reset conditions would be
> managed internally by the PMD without needing application intervention.
> In such cases, PMD would need reset/recovery events to notify application
> that PMD is undergoing a reset.
> 
> While most of the recovery process is transparent to the application since
> most of the driver ensures recovery from FW reset or FW error conditions,
> the application will have to reprogram any flows which were offloaded to
> the underlying hardware.
> 
> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
> ---
>   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>   2 files changed, 42 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index 6831289..9ecc0e4 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -623,3 +623,27 @@ by application.
>   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>   the application to handle reset event. It is duty of application to
>   handle all synchronization before it calls rte_eth_dev_reset().
> +
> +Error recovery support
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +When the PMD detects a FW reset or error condition, it may try to recover
> +from the error without needing the application intervention. In such cases,
> +PMD would need events to notify the application that it is undergoing
> +an error recovery.
> +
> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> +application that PMD detected a FW reset or FW error condition. PMD may
> +try to recover from the error by itself. Data path may be quiesced and
> +control path operations may fail during the recovery period. The application
> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
> +
> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
> +that the it has recovered from the error condition. PMD re-configures the port
> +to the state prior to the error condition. Control path and data path are up now.
> +Since the device has undergone a reset, flow rules offloaded prior to reset
> +may be lost and the application should recreate the rules again.
> +
> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
> +that it has failed to recover from the error condition. The device may not be
> +usable anymore.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 147cc1c..a46819f 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> +	RTE_ETH_EVENT_ERR_RECOVERING,
> +			/**< port recovering from an error
> +			 *
> +			 * PMD detected a FW reset or error condition.
> +			 * PMD will try to recover from the error.
> +			 * Data path may be quiesced and Control path operations
> +			 * may fail at this time.
> +			 */
> +	RTE_ETH_EVENT_RECOVERED,
> +			/**< port recovered from an error
> +			 *
> +			 * PMD has recovered from the error condition.
> +			 * Control path and Data path are up now.
> +			 * PMD re-configures the port to the state prior to the error.
> +			 * Since the device has undergone a reset, flow rules
> +			 * offloaded prior to reset may be lost and
> +			 * the application should recreate the rules again.
> +			 */
>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */


Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
to evaluate if it is a false positive:


1 function with some indirect sub-type change:
   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
             type size hasn't changed
             2 enumerator insertions:
               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
             1 enumerator change:
               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
  
Kalesh A P Feb. 1, 2022, 1:09 p.m. UTC | #3
Thank you Ferruh for the review. Please see inline.

On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 1/28/2022 12:48 PM, Kalesh A P wrote:
> > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> >
> > Adding support for the device reset and recovery events in the
> > rte_eth_event framework. FW error and FW reset conditions would be
> > managed internally by the PMD without needing application intervention.
> > In such cases, PMD would need reset/recovery events to notify application
> > that PMD is undergoing a reset.
> >
> > While most of the recovery process is transparent to the application
> since
> > most of the driver ensures recovery from FW reset or FW error conditions,
> > the application will have to reprogram any flows which were offloaded to
> > the underlying hardware.
> >
> > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
> > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>
> More developer cc'ed.
>
> > ---
> >   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
> >   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
> >   2 files changed, 42 insertions(+)
> >
> > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> b/doc/guides/prog_guide/poll_mode_drv.rst
> > index 6831289..9ecc0e4 100644
> > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > @@ -623,3 +623,27 @@ by application.
> >   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
> >   the application to handle reset event. It is duty of application to
> >   handle all synchronization before it calls rte_eth_dev_reset().
> > +
> > +Error recovery support
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +When the PMD detects a FW reset or error condition, it may try to
> recover
> > +from the error without needing the application intervention. In such
> cases,
> > +PMD would need events to notify the application that it is undergoing
> > +an error recovery.
> > +
> > +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> > +application that PMD detected a FW reset or FW error condition. PMD may
> > +try to recover from the error by itself. Data path may be quiesced and
> > +control path operations may fail during the recovery period. The
> application
> > +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from
> the PMD.
> > +
>
> Between the time FW error occurred and the application receive the
> RECOVERING event,
> datapath will continue to poll and application may call control APIs, so
> the event
> really is not solving the issue and driver somehow should be sure this
> won't crash
> the application, in that case not sure about the benefit of this event.
>
[Kalesh]: As soon as the driver detects a FW dead or reset condition, it
sets the fastpath pointers to dummy functions. This will prevent the crash.
All control path operations would fail with -EBUSY. This change is already
there in bnxt PMD. This event is a notification to the application that the
PMD is recovering from a FW error condition so that it can stop polling and
issue control path operations.

>
> > +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the
> application
> > +that the it has recovered from the error condition. PMD re-configures
> the port
> > +to the state prior to the error condition. Control path and data path
> are up now.
> > +Since the device has undergone a reset, flow rules offloaded prior to
> reset
> > +may be lost and the application should recreate the rules again.
> > +
>
> I think the most difficult part here is clarify what application should do
> when this event received consistent for all devices, "flow rules may be
> lost"
> looks very vague to me.
> Unless it is not clear for application what to do when this event is
> received,
> it is not that useful or it will be specific to some PMDs. And I can see
> it is
> hard to clarify this but perhaps we can define a set of common behavior.
> Not sure what others are thinking.
>
[Kalesh]: Sure, let's wait for others' opinions as well.

>
> > +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the
> application
> > +that it has failed to recover from the error condition. The device may
> not be
> > +usable anymore.
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 147cc1c..a46819f 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> > +     RTE_ETH_EVENT_ERR_RECOVERING,
> > +                     /**< port recovering from an error
> > +                      *
> > +                      * PMD detected a FW reset or error condition.
> > +                      * PMD will try to recover from the error.
> > +                      * Data path may be quiesced and Control path
> operations
> > +                      * may fail at this time.
> > +                      */
>
> Please put multi line comments before enum, Andrew did a set of cleanups
> for these.
>
[Kalesh]: Sure, will do.

>
> > +     RTE_ETH_EVENT_RECOVERED,
> > +                     /**< port recovered from an error
> > +                      *
> > +                      * PMD has recovered from the error condition.
> > +                      * Control path and Data path are up now.
> > +                      * PMD re-configures the port to the state prior
> to the error.
> > +                      * Since the device has undergone a reset, flow
> rules
> > +                      * offloaded prior to reset may be lost and
> > +                      * the application should recreate the rules again.
> > +                      */
> >       RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >   };
> >
>
>
  
Ferruh Yigit Feb. 1, 2022, 1:19 p.m. UTC | #4
On 2/1/2022 1:09 PM, Kalesh Anakkur Purayil wrote:
> Thank you Ferruh for the review. Please see inline.
> 
> On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yigit@intel.com <mailto:ferruh.yigit@intel.com>> wrote:
> 
>     On 1/28/2022 12:48 PM, Kalesh A P wrote:
>      > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
>      >
>      > Adding support for the device reset and recovery events in the
>      > rte_eth_event framework. FW error and FW reset conditions would be
>      > managed internally by the PMD without needing application intervention.
>      > In such cases, PMD would need reset/recovery events to notify application
>      > that PMD is undergoing a reset.
>      >
>      > While most of the recovery process is transparent to the application since
>      > most of the driver ensures recovery from FW reset or FW error conditions,
>      > the application will have to reprogram any flows which were offloaded to
>      > the underlying hardware.
>      >
>      > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
>      > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com <mailto:somnath.kotur@broadcom.com>>
>      > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com <mailto:ajit.khaparde@broadcom.com>>
> 
>     More developer cc'ed.
> 
>      > ---
>      >   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>      >   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>      >   2 files changed, 42 insertions(+)
>      >
>      > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>      > index 6831289..9ecc0e4 100644
>      > --- a/doc/guides/prog_guide/poll_mode_drv.rst
>      > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>      > @@ -623,3 +623,27 @@ by application.
>      >   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>      >   the application to handle reset event. It is duty of application to
>      >   handle all synchronization before it calls rte_eth_dev_reset().
>      > +
>      > +Error recovery support
>      > +~~~~~~~~~~~~~~~~~~~~~~
>      > +
>      > +When the PMD detects a FW reset or error condition, it may try to recover
>      > +from the error without needing the application intervention. In such cases,
>      > +PMD would need events to notify the application that it is undergoing
>      > +an error recovery.
>      > +
>      > +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>      > +application that PMD detected a FW reset or FW error condition. PMD may
>      > +try to recover from the error by itself. Data path may be quiesced and
>      > +control path operations may fail during the recovery period. The application
>      > +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
>      > +
> 
>     Between the time FW error occurred and the application receive the RECOVERING event,
>     datapath will continue to poll and application may call control APIs, so the event
>     really is not solving the issue and driver somehow should be sure this won't crash
>     the application, in that case not sure about the benefit of this event.
> 
> [Kalesh]: As soon as the driver detects a FW dead or reset condition, it sets the fastpath pointers to dummy functions. This will prevent the crash. All control path operations would fail with -EBUSY. This change is already there in bnxt PMD. This event is a notification to the application that the PMD is recovering from a FW error condition so that it can stop polling and issue control path operations.
> 

This was my point indeed. PMD can't rely on this event and should take actions (for both
datapath and control path) to prevent possible crash/error. If that is the case event
doesn't have that much benefit since PMD already has to protect itself.

> 
>      > +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>      > +that the it has recovered from the error condition. PMD re-configures the port
>      > +to the state prior to the error condition. Control path and data path are up now.
>      > +Since the device has undergone a reset, flow rules offloaded prior to reset
>      > +may be lost and the application should recreate the rules again.
>      > +
> 
>     I think the most difficult part here is clarify what application should do
>     when this event received consistent for all devices, "flow rules may be lost"
>     looks very vague to me.
>     Unless it is not clear for application what to do when this event is received,
>     it is not that useful or it will be specific to some PMDs. And I can see it is
>     hard to clarify this but perhaps we can define a set of common behavior.
>     Not sure what others are thinking.
> 
> [Kalesh]: Sure, let's wait for others' opinions as well.
> 
> 
>      > +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
>      > +that it has failed to recover from the error condition. The device may not be
>      > +usable anymore.
>      > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>      > index 147cc1c..a46819f 100644
>      > --- a/lib/ethdev/rte_ethdev.h
>      > +++ b/lib/ethdev/rte_ethdev.h
>      > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>      >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
>      >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>      >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>      > +     RTE_ETH_EVENT_ERR_RECOVERING,
>      > +                     /**< port recovering from an error
>      > +                      *
>      > +                      * PMD detected a FW reset or error condition.
>      > +                      * PMD will try to recover from the error.
>      > +                      * Data path may be quiesced and Control path operations
>      > +                      * may fail at this time.
>      > +                      */
> 
>     Please put multi line comments before enum, Andrew did a set of cleanups for these.
> 
> [Kalesh]: Sure, will do.
> 
> 
>      > +     RTE_ETH_EVENT_RECOVERED,
>      > +                     /**< port recovered from an error
>      > +                      *
>      > +                      * PMD has recovered from the error condition.
>      > +                      * Control path and Data path are up now.
>      > +                      * PMD re-configures the port to the state prior to the error.
>      > +                      * Since the device has undergone a reset, flow rules
>      > +                      * offloaded prior to reset may be lost and
>      > +                      * the application should recreate the rules again.
>      > +                      */
>      >       RTE_ETH_EVENT_MAX       /**< max value of this enum */
>      >   };
>      >
> 
> 
> 
> -- 
> Regards,
> Kalesh A P
  
Ray Kinsella Feb. 2, 2022, 11:44 a.m. UTC | #5
Ferruh Yigit <ferruh.yigit@intel.com> writes:

> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>> Adding support for the device reset and recovery events in the
>> rte_eth_event framework. FW error and FW reset conditions would be
>> managed internally by the PMD without needing application intervention.
>> In such cases, PMD would need reset/recovery events to notify application
>> that PMD is undergoing a reset.
>> While most of the recovery process is transparent to the application since
>> most of the driver ensures recovery from FW reset or FW error conditions,
>> the application will have to reprogram any flows which were offloaded to
>> the underlying hardware.
>> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>> ---
>>   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
>>   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
>>   2 files changed, 42 insertions(+)
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>> b/doc/guides/prog_guide/poll_mode_drv.rst
>> index 6831289..9ecc0e4 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -623,3 +623,27 @@ by application.
>>   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
>>   the application to handle reset event. It is duty of application to
>>   handle all synchronization before it calls rte_eth_dev_reset().
>> +
>> +Error recovery support
>> +~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +When the PMD detects a FW reset or error condition, it may try to recover
>> +from the error without needing the application intervention. In such cases,
>> +PMD would need events to notify the application that it is undergoing
>> +an error recovery.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>> +application that PMD detected a FW reset or FW error condition. PMD may
>> +try to recover from the error by itself. Data path may be quiesced and
>> +control path operations may fail during the recovery period. The application
>> +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
>> +that the it has recovered from the error condition. PMD re-configures the port
>> +to the state prior to the error condition. Control path and data path are up now.
>> +Since the device has undergone a reset, flow rules offloaded prior to reset
>> +may be lost and the application should recreate the rules again.
>> +
>> +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
>> +that it has failed to recover from the error condition. The device may not be
>> +usable anymore.
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 147cc1c..a46819f 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>> +			/**< port recovering from an error
>> +			 *
>> +			 * PMD detected a FW reset or error condition.
>> +			 * PMD will try to recover from the error.
>> +			 * Data path may be quiesced and Control path operations
>> +			 * may fail at this time.
>> +			 */
>> +	RTE_ETH_EVENT_RECOVERED,
>> +			/**< port recovered from an error
>> +			 *
>> +			 * PMD has recovered from the error condition.
>> +			 * Control path and Data path are up now.
>> +			 * PMD re-configures the port to the state prior to the error.
>> +			 * Since the device has undergone a reset, flow rules
>> +			 * offloaded prior to reset may be lost and
>> +			 * the application should recreate the rules again.
>> +			 */
>>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>
>
> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
> to evaluate if it is a false positive:
>
>
> 1 function with some indirect sub-type change:
>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>             type size hasn't changed
>             2 enumerator insertions:
>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>             1 enumerator change:
>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1

I don't immediately see the problem that this would cause.
There are no array sizes etc dependent on the value of MAX for instance.

Looks safe?
  
Ajit Khaparde Feb. 3, 2022, 8:28 p.m. UTC | #6
On Tue, Feb 1, 2022 at 5:20 AM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>
> On 2/1/2022 1:09 PM, Kalesh Anakkur Purayil wrote:
> > Thank you Ferruh for the review. Please see inline.
> >
> > On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yigit@intel.com <mailto:ferruh.yigit@intel.com>> wrote:
> >
> >     On 1/28/2022 12:48 PM, Kalesh A P wrote:
> >      > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
> >      >
> >      > Adding support for the device reset and recovery events in the
> >      > rte_eth_event framework. FW error and FW reset conditions would be
> >      > managed internally by the PMD without needing application intervention.
> >      > In such cases, PMD would need reset/recovery events to notify application
> >      > that PMD is undergoing a reset.
> >      >
> >      > While most of the recovery process is transparent to the application since
> >      > most of the driver ensures recovery from FW reset or FW error conditions,
> >      > the application will have to reprogram any flows which were offloaded to
> >      > the underlying hardware.
> >      >
> >      > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
> >      > Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com <mailto:somnath.kotur@broadcom.com>>
> >      > Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com <mailto:ajit.khaparde@broadcom.com>>
> >
> >     More developer cc'ed.
> >
> >      > ---
> >      >   doc/guides/prog_guide/poll_mode_drv.rst | 24 ++++++++++++++++++++++++
> >      >   lib/ethdev/rte_ethdev.h                 | 18 ++++++++++++++++++
> >      >   2 files changed, 42 insertions(+)
> >      >
> >      > diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> >      > index 6831289..9ecc0e4 100644
> >      > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >      > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >      > @@ -623,3 +623,27 @@ by application.
> >      >   The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
> >      >   the application to handle reset event. It is duty of application to
> >      >   handle all synchronization before it calls rte_eth_dev_reset().
> >      > +
> >      > +Error recovery support
> >      > +~~~~~~~~~~~~~~~~~~~~~~
> >      > +
> >      > +When the PMD detects a FW reset or error condition, it may try to recover
> >      > +from the error without needing the application intervention. In such cases,
> >      > +PMD would need events to notify the application that it is undergoing
> >      > +an error recovery.
> >      > +
> >      > +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> >      > +application that PMD detected a FW reset or FW error condition. PMD may
> >      > +try to recover from the error by itself. Data path may be quiesced and
> >      > +control path operations may fail during the recovery period. The application
> >      > +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
> >      > +
> >
> >     Between the time FW error occurred and the application receive the RECOVERING event,
> >     datapath will continue to poll and application may call control APIs, so the event
> >     really is not solving the issue and driver somehow should be sure this won't crash
> >     the application, in that case not sure about the benefit of this event.
> >
> > [Kalesh]: As soon as the driver detects a FW dead or reset condition, it sets the fastpath pointers to dummy functions. This will prevent the crash. All control path operations would fail with -EBUSY. This change is already there in bnxt PMD. This event is a notification to the application that the PMD is recovering from a FW error condition so that it can stop polling and issue control path operations.
> >
>
> This was my point indeed. PMD can't rely on this event and should take actions (for both
> datapath and control path) to prevent possible crash/error. If that is the case event
> doesn't have that much benefit since PMD already has to protect itself.

This is an attempt to prevent PMD and applications from encountering
unexpected situations because of an error in hardware or firmware.
The unexpected scenarios can include lockups to segfaults apart from
other situations.

When hardware or firmware encounters and error, propagating it to the
application for it to react can be slow, time consuming and could be
vendor specific.

Instead in the case of Broadcom devices we are trying to contain the
detection and recovery within the hardware, firmware mostly and to a
small extent driver framework.
In the case of kernel drivers, this allows avoiding a system reset as
much as possible while in the case of DPDK PMD, it can prevent
application reloads or ungraceful exits.

The PMD generated notification to the application will be more like a
"For Your Information" for most of the applications.
But applications can decide to act on the notification and take
specific steps - like flushing or deleting all the existing flow
rules, meters etc.. without waiting for success or failure indication.
Applications can also re-offload the flow rules, recreate meters based
on the notification allowing them to sync with the fresh hardware,
firmware state.

We think other PMD can also take advantage of these notifications to
provide better reliability, accessibility and serviceability in
general.

Thanks

Ajit
>
> >
> >      > +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
> >      > +that the it has recovered from the error condition. PMD re-configures the port
> >      > +to the state prior to the error condition. Control path and data path are up now.
> >      > +Since the device has undergone a reset, flow rules offloaded prior to reset
> >      > +may be lost and the application should recreate the rules again.
> >      > +
> >
> >     I think the most difficult part here is clarify what application should do
> >     when this event received consistent for all devices, "flow rules may be lost"
> >     looks very vague to me.
> >     Unless it is not clear for application what to do when this event is received,
> >     it is not that useful or it will be specific to some PMDs. And I can see it is
> >     hard to clarify this but perhaps we can define a set of common behavior.
> >     Not sure what others are thinking.
> >
> > [Kalesh]: Sure, let's wait for others' opinions as well.
> >
> >
> >      > +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
> >      > +that it has failed to recover from the error condition. The device may not be
> >      > +usable anymore.
> >      > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >      > index 147cc1c..a46819f 100644
> >      > --- a/lib/ethdev/rte_ethdev.h
> >      > +++ b/lib/ethdev/rte_ethdev.h
> >      > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >      >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >      >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >      >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> >      > +     RTE_ETH_EVENT_ERR_RECOVERING,
> >      > +                     /**< port recovering from an error
> >      > +                      *
> >      > +                      * PMD detected a FW reset or error condition.
> >      > +                      * PMD will try to recover from the error.
> >      > +                      * Data path may be quiesced and Control path operations
> >      > +                      * may fail at this time.
> >      > +                      */
> >
> >     Please put multi line comments before enum, Andrew did a set of cleanups for these.
> >
> > [Kalesh]: Sure, will do.
> >
> >
> >      > +     RTE_ETH_EVENT_RECOVERED,
> >      > +                     /**< port recovered from an error
> >      > +                      *
> >      > +                      * PMD has recovered from the error condition.
> >      > +                      * Control path and Data path are up now.
> >      > +                      * PMD re-configures the port to the state prior to the error.
> >      > +                      * Since the device has undergone a reset, flow rules
> >      > +                      * offloaded prior to reset may be lost and
> >      > +                      * the application should recreate the rules again.
> >      > +                      */
> >      >       RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >      >   };
> >      >
> >
> >
> >
> > --
> > Regards,
> > Kalesh A P
>
  
Thomas Monjalon Feb. 10, 2022, 10:16 p.m. UTC | #7
02/02/2022 12:44, Ray Kinsella:
> Ferruh Yigit <ferruh.yigit@intel.com> writes:
> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
> >> --- a/lib/ethdev/rte_ethdev.h
> >> +++ b/lib/ethdev/rte_ethdev.h
> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
> >> +			/**< port recovering from an error
> >> +			 *
> >> +			 * PMD detected a FW reset or error condition.
> >> +			 * PMD will try to recover from the error.
> >> +			 * Data path may be quiesced and Control path operations
> >> +			 * may fail at this time.
> >> +			 */
> >> +	RTE_ETH_EVENT_RECOVERED,
> >> +			/**< port recovered from an error
> >> +			 *
> >> +			 * PMD has recovered from the error condition.
> >> +			 * Control path and Data path are up now.
> >> +			 * PMD re-configures the port to the state prior to the error.
> >> +			 * Since the device has undergone a reset, flow rules
> >> +			 * offloaded prior to reset may be lost and
> >> +			 * the application should recreate the rules again.
> >> +			 */
> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >
> >
> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
> > to evaluate if it is a false positive:
> >
> >
> > 1 function with some indirect sub-type change:
> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
> >             type size hasn't changed
> >             2 enumerator insertions:
> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
> >             1 enumerator change:
> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
> 
> I don't immediately see the problem that this would cause.
> There are no array sizes etc dependent on the value of MAX for instance.
> 
> Looks safe?

We never know how this enum will be used by the application.
The max value may be used for the size of an event array.
It looks a real ABI issue unfortunately.

PS: I am not Cc'ed in this patchset,
so copying what I said on v6 (more than a year ago):
Please use the option --cc-cmd devtools/get-maintainer.sh
  
Thomas Monjalon Feb. 10, 2022, 10:42 p.m. UTC | #8
03/02/2022 21:28, Ajit Khaparde:
> On Tue, Feb 1, 2022 at 5:20 AM Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > On 2/1/2022 1:09 PM, Kalesh Anakkur Purayil wrote:
> > > On Tue, Feb 1, 2022 at 5:41 PM Ferruh Yigit <ferruh.yigit@intel.com <mailto:ferruh.yigit@intel.com>> wrote:
> > >     On 1/28/2022 12:48 PM, Kalesh A P wrote:
> > >      > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com <mailto:kalesh-anakkur.purayil@broadcom.com>>
> > >      > --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >      > +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >      > +Error recovery support
> > >      > +~~~~~~~~~~~~~~~~~~~~~~
> > >      > +
> > >      > +When the PMD detects a FW reset or error condition, it may try to recover
> > >      > +from the error without needing the application intervention. In such cases,

Wrong, the application needs to react on the event.

> > >      > +PMD would need events to notify the application that it is undergoing
> > >      > +an error recovery.
> > >      > +
> > >      > +The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
> > >      > +application that PMD detected a FW reset or FW error condition. PMD may
> > >      > +try to recover from the error by itself. Data path may be quiesced and

PMD may try to recover by itself but the application must take action.

> > >      > +control path operations may fail during the recovery period. The application
> > >      > +should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.

The application should stop polling and doing anything else with the device,
otherwise it will receive an error code during the recovery,
meaning until the RECOVERED event is received.
Said like that, it is less vague, you see.

> > >     Between the time FW error occurred and the application receive the RECOVERING event,
> > >     datapath will continue to poll and application may call control APIs, so the event
> > >     really is not solving the issue and driver somehow should be sure this won't crash
> > >     the application, in that case not sure about the benefit of this event.
> > >
> > > [Kalesh]: As soon as the driver detects a FW dead or reset condition, it sets the fastpath pointers to dummy functions. This will prevent the crash. All control path operations would fail with -EBUSY. This change is already there in bnxt PMD. This event is a notification to the application that the PMD is recovering from a FW error condition so that it can stop polling and issue control path operations.
> >
> > This was my point indeed. PMD can't rely on this event and should take actions (for both
> > datapath and control path) to prevent possible crash/error. If that is the case event
> > doesn't have that much benefit since PMD already has to protect itself.
> 
> This is an attempt to prevent PMD and applications from encountering
> unexpected situations because of an error in hardware or firmware.
> The unexpected scenarios can include lockups to segfaults apart from
> other situations.
> 
> When hardware or firmware encounters and error, propagating it to the
> application for it to react can be slow, time consuming and could be
> vendor specific.
> 
> Instead in the case of Broadcom devices we are trying to contain the
> detection and recovery within the hardware, firmware mostly and to a
> small extent driver framework.
> In the case of kernel drivers, this allows avoiding a system reset as
> much as possible while in the case of DPDK PMD, it can prevent
> application reloads or ungraceful exits.
> 
> The PMD generated notification to the application will be more like a
> "For Your Information" for most of the applications.

The most important for the application is to be tolerant to the errors
returned in the datapath and in control API.

> But applications can decide to act on the notification and take
> specific steps - like flushing or deleting all the existing flow
> rules, meters etc.. without waiting for success or failure indication.
> Applications can also re-offload the flow rules, recreate meters based
> on the notification allowing them to sync with the fresh hardware,
> firmware state.

We need to know what has to be recreated.
I think we should keep the same rule as for a restart.

> We think other PMD can also take advantage of these notifications to
> provide better reliability, accessibility and serviceability in
> general.

Indeed others can benefit of some mechanisms if they are common
*and well explained*.

[...]
> > >      > +The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application

s/should/must/

> > >      > +that the it has recovered from the error condition. PMD re-configures the port
> > >      > +to the state prior to the error condition. Control path and data path are up now.
> > >      > +Since the device has undergone a reset, flow rules offloaded prior to reset
> > >      > +may be lost and the application should recreate the rules again.

Please refer to the restart situation, including these capabilities:

/** Device supports keeping flow rules across restart. */
#define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
/** Device supports keeping shared flow objects across restart. */
#define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)

> > >     I think the most difficult part here is clarify what application should do
> > >     when this event received consistent for all devices, "flow rules may be lost"
> > >     looks very vague to me.

This is the rule for a restart:

 * Please note that some configuration is not stored between calls to
 * rte_eth_dev_stop()/rte_eth_dev_start(). The following configuration will
 * be retained:
 *
 *     - MTU
 *     - flow control settings
 *     - receive mode configuration (promiscuous mode, all-multicast mode,
 *       hardware checksum mode, RSS/VMDq settings etc.)
 *     - VLAN filtering configuration
 *     - default MAC address
 *     - MAC addresses supplied to MAC address array
 *     - flow director filtering mode (but not filtering rules)
 *     - NIC queue statistics mappings
 *
 * The following configuration may be retained or not
 * depending on the device capabilities:
 *
 *     - flow rules
 *     - flow-related shared objects, e.g. indirect actions
 *
 * Any other configuration will not be stored and will need to be re-entered
 * before a call to rte_eth_dev_start().

> > >     Unless it is not clear for application what to do when this event is received,
> > >     it is not that useful or it will be specific to some PMDs. And I can see it is
> > >     hard to clarify this but perhaps we can define a set of common behavior.
> > >     Not sure what others are thinking.

+1

> > > [Kalesh]: Sure, let's wait for others' opinions as well.

Nothing new, we already did such comment in 2020.

> > >      > +The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
> > >      > +that it has failed to recover from the error condition. The device may not be
> > >      > +usable anymore.

Yes good idea to use RMV event in case of failure.
You should update the comment of RTE_ETH_EVENT_INTR_RMV
to list cases when it is fired:
	- device unplugged
	- recovery failure
	- reset failure?

> > >      > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >      > index 147cc1c..a46819f 100644
> > >      > --- a/lib/ethdev/rte_ethdev.h
> > >      > +++ b/lib/ethdev/rte_ethdev.h
> > >      > @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> > >      >       RTE_ETH_EVENT_DESTROY,  /**< port is released */
> > >      >       RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> > >      >       RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> > >      > +     RTE_ETH_EVENT_ERR_RECOVERING,
> > >      > +                     /**< port recovering from an error
> > >      > +                      *
> > >      > +                      * PMD detected a FW reset or error condition.
> > >      > +                      * PMD will try to recover from the error.
> > >      > +                      * Data path may be quiesced and Control path operations
> > >      > +                      * may fail at this time.
> > >      > +                      */
> > >
> > >     Please put multi line comments before enum, Andrew did a set of cleanups for these.
> > >
> > > [Kalesh]: Sure, will do.
> > >
> > >
> > >      > +     RTE_ETH_EVENT_RECOVERED,
> > >      > +                     /**< port recovered from an error
> > >      > +                      *
> > >      > +                      * PMD has recovered from the error condition.
> > >      > +                      * Control path and Data path are up now.
> > >      > +                      * PMD re-configures the port to the state prior to the error.
> > >      > +                      * Since the device has undergone a reset, flow rules
> > >      > +                      * offloaded prior to reset may be lost and
> > >      > +                      * the application should recreate the rules again.
> > >      > +                      */

Same as for the RST, the comments need to be very clear,
while being concise here.

I think there are a lot of tips in this thread to make this mechanism
work in a generic way. Please work on a precise new version, thanks.
  
Ray Kinsella Feb. 11, 2022, 10:09 a.m. UTC | #9
Thomas Monjalon <thomas@monjalon.net> writes:

> 02/02/2022 12:44, Ray Kinsella:
>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
>> >> --- a/lib/ethdev/rte_ethdev.h
>> >> +++ b/lib/ethdev/rte_ethdev.h
>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
>> >> +			/**< port recovering from an error
>> >> +			 *
>> >> +			 * PMD detected a FW reset or error condition.
>> >> +			 * PMD will try to recover from the error.
>> >> +			 * Data path may be quiesced and Control path operations
>> >> +			 * may fail at this time.
>> >> +			 */
>> >> +	RTE_ETH_EVENT_RECOVERED,
>> >> +			/**< port recovered from an error
>> >> +			 *
>> >> +			 * PMD has recovered from the error condition.
>> >> +			 * Control path and Data path are up now.
>> >> +			 * PMD re-configures the port to the state prior to the error.
>> >> +			 * Since the device has undergone a reset, flow rules
>> >> +			 * offloaded prior to reset may be lost and
>> >> +			 * the application should recreate the rules again.
>> >> +			 */
>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>> >
>> >
>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>> > to evaluate if it is a false positive:
>> >
>> >
>> > 1 function with some indirect sub-type change:
>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>> >             type size hasn't changed
>> >             2 enumerator insertions:
>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>> >             1 enumerator change:
>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>> 
>> I don't immediately see the problem that this would cause.
>> There are no array sizes etc dependent on the value of MAX for instance.
>> 
>> Looks safe?
>
> We never know how this enum will be used by the application.
> The max value may be used for the size of an event array.
> It looks a real ABI issue unfortunately.

Right - but we only really care about it when an array size based on MAX
is likely to be passed to DPDK, which doesn't apply in this case.

I noted that some Linux folks explicitly mark similar MAX values as not
part of the ABI.

/usr/include/linux/perf_event.h
37:     PERF_TYPE_MAX,                          /* non-ABI */
60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
non-ABI; internal use */
189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
1067:   PERF_RECORD_MAX,                        /* non-ABI */
1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */

>
> PS: I am not Cc'ed in this patchset,
> so copying what I said on v6 (more than a year ago):
> Please use the option --cc-cmd devtools/get-maintainer.sh
  
Ray Kinsella Feb. 14, 2022, 10:16 a.m. UTC | #10
Ray Kinsella <mdr@ashroe.eu> writes:

> Thomas Monjalon <thomas@monjalon.net> writes:
>
>> 02/02/2022 12:44, Ray Kinsella:
>>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>> >> --- a/lib/ethdev/rte_ethdev.h
>>> >> +++ b/lib/ethdev/rte_ethdev.h
>>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>> >> +			/**< port recovering from an error
>>> >> +			 *
>>> >> +			 * PMD detected a FW reset or error condition.
>>> >> +			 * PMD will try to recover from the error.
>>> >> +			 * Data path may be quiesced and Control path operations
>>> >> +			 * may fail at this time.
>>> >> +			 */
>>> >> +	RTE_ETH_EVENT_RECOVERED,
>>> >> +			/**< port recovered from an error
>>> >> +			 *
>>> >> +			 * PMD has recovered from the error condition.
>>> >> +			 * Control path and Data path are up now.
>>> >> +			 * PMD re-configures the port to the state prior to the error.
>>> >> +			 * Since the device has undergone a reset, flow rules
>>> >> +			 * offloaded prior to reset may be lost and
>>> >> +			 * the application should recreate the rules again.
>>> >> +			 */
>>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>> >
>>> >
>>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>> > to evaluate if it is a false positive:
>>> >
>>> >
>>> > 1 function with some indirect sub-type change:
>>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>> >             type size hasn't changed
>>> >             2 enumerator insertions:
>>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>> >             1 enumerator change:
>>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>> 
>>> I don't immediately see the problem that this would cause.
>>> There are no array sizes etc dependent on the value of MAX for instance.
>>> 
>>> Looks safe?
>>
>> We never know how this enum will be used by the application.
>> The max value may be used for the size of an event array.
>> It looks a real ABI issue unfortunately.
>
> Right - but we only really care about it when an array size based on MAX
> is likely to be passed to DPDK, which doesn't apply in this case.
>
> I noted that some Linux folks explicitly mark similar MAX values as not
> part of the ABI.
>
> /usr/include/linux/perf_event.h
> 37:     PERF_TYPE_MAX,                          /* non-ABI */
> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
> non-ABI; internal use */
> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
> 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
> 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>
>>
>> PS: I am not Cc'ed in this patchset,
>> so copying what I said on v6 (more than a year ago):
>> Please use the option --cc-cmd devtools/get-maintainer.sh

Any thoughts on similarly annotating all our _MAX enums in the same way?
We could also add a section in the ABI Policy to make it explicit _MAX
enum values are not part of the ABI - what do folks think?
  
Thomas Monjalon Feb. 14, 2022, 11:15 a.m. UTC | #11
14/02/2022 11:16, Ray Kinsella:
> Ray Kinsella <mdr@ashroe.eu> writes:
> > Thomas Monjalon <thomas@monjalon.net> writes:
> >> 02/02/2022 12:44, Ray Kinsella:
> >>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
> >>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
> >>> >> --- a/lib/ethdev/rte_ethdev.h
> >>> >> +++ b/lib/ethdev/rte_ethdev.h
> >>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> >>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
> >>> >> +			/**< port recovering from an error
> >>> >> +			 *
> >>> >> +			 * PMD detected a FW reset or error condition.
> >>> >> +			 * PMD will try to recover from the error.
> >>> >> +			 * Data path may be quiesced and Control path operations
> >>> >> +			 * may fail at this time.
> >>> >> +			 */
> >>> >> +	RTE_ETH_EVENT_RECOVERED,
> >>> >> +			/**< port recovered from an error
> >>> >> +			 *
> >>> >> +			 * PMD has recovered from the error condition.
> >>> >> +			 * Control path and Data path are up now.
> >>> >> +			 * PMD re-configures the port to the state prior to the error.
> >>> >> +			 * Since the device has undergone a reset, flow rules
> >>> >> +			 * offloaded prior to reset may be lost and
> >>> >> +			 * the application should recreate the rules again.
> >>> >> +			 */
> >>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >>> >
> >>> >
> >>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
> >>> > to evaluate if it is a false positive:
> >>> >
> >>> >
> >>> > 1 function with some indirect sub-type change:
> >>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
> >>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
> >>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
> >>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
> >>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
> >>> >             type size hasn't changed
> >>> >             2 enumerator insertions:
> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
> >>> >             1 enumerator change:
> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
> >>> 
> >>> I don't immediately see the problem that this would cause.
> >>> There are no array sizes etc dependent on the value of MAX for instance.
> >>> 
> >>> Looks safe?
> >>
> >> We never know how this enum will be used by the application.
> >> The max value may be used for the size of an event array.
> >> It looks a real ABI issue unfortunately.
> >
> > Right - but we only really care about it when an array size based on MAX
> > is likely to be passed to DPDK, which doesn't apply in this case.

I don't completely agree.
A developer may assume an event will never exceed MAX value.
However, after an upgrade of DPDK without app rebuild,
a higher event value may be received in the app,
breaking the assumption.
Should we consider this case as an ABI breakage?

> > I noted that some Linux folks explicitly mark similar MAX values as not
> > part of the ABI.
> >
> > /usr/include/linux/perf_event.h
> > 37:     PERF_TYPE_MAX,                          /* non-ABI */
> > 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
> > 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
> > 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
> > 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
> > 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
> > 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
> > 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
> > non-ABI; internal use */
> > 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
> > 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
> > 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
> > 1067:   PERF_RECORD_MAX,                        /* non-ABI */
> > 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
> > 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
> 
> Any thoughts on similarly annotating all our _MAX enums in the same way?
> We could also add a section in the ABI Policy to make it explicit _MAX
> enum values are not part of the ABI - what do folks think?

Interesting. I am not sure it is always ABI-safe though.
  
Ray Kinsella Feb. 14, 2022, 4:06 p.m. UTC | #12
Thomas Monjalon <thomas@monjalon.net> writes:

> 14/02/2022 11:16, Ray Kinsella:
>> Ray Kinsella <mdr@ashroe.eu> writes:
>> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >> 02/02/2022 12:44, Ray Kinsella:
>> >>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>> >>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
>> >>> >> --- a/lib/ethdev/rte_ethdev.h
>> >>> >> +++ b/lib/ethdev/rte_ethdev.h
>> >>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>> >>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>> >>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>> >>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>> >>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
>> >>> >> +			/**< port recovering from an error
>> >>> >> +			 *
>> >>> >> +			 * PMD detected a FW reset or error condition.
>> >>> >> +			 * PMD will try to recover from the error.
>> >>> >> +			 * Data path may be quiesced and Control path operations
>> >>> >> +			 * may fail at this time.
>> >>> >> +			 */
>> >>> >> +	RTE_ETH_EVENT_RECOVERED,
>> >>> >> +			/**< port recovered from an error
>> >>> >> +			 *
>> >>> >> +			 * PMD has recovered from the error condition.
>> >>> >> +			 * Control path and Data path are up now.
>> >>> >> +			 * PMD re-configures the port to the state prior to the error.
>> >>> >> +			 * Since the device has undergone a reset, flow rules
>> >>> >> +			 * offloaded prior to reset may be lost and
>> >>> >> +			 * the application should recreate the rules again.
>> >>> >> +			 */
>> >>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>> >>> >
>> >>> >
>> >>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>> >>> > to evaluate if it is a false positive:
>> >>> >
>> >>> >
>> >>> > 1 function with some indirect sub-type change:
>> >>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>> >>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>> >>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>> >>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>> >>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>> >>> >             type size hasn't changed
>> >>> >             2 enumerator insertions:
>> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>> >>> >             1 enumerator change:
>> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>> >>> 
>> >>> I don't immediately see the problem that this would cause.
>> >>> There are no array sizes etc dependent on the value of MAX for instance.
>> >>> 
>> >>> Looks safe?
>> >>
>> >> We never know how this enum will be used by the application.
>> >> The max value may be used for the size of an event array.
>> >> It looks a real ABI issue unfortunately.
>> >
>> > Right - but we only really care about it when an array size based on MAX
>> > is likely to be passed to DPDK, which doesn't apply in this case.
>
> I don't completely agree.
> A developer may assume an event will never exceed MAX value.
> However, after an upgrade of DPDK without app rebuild,
> a higher event value may be received in the app,
> breaking the assumption.
> Should we consider this case as an ABI breakage?

Nope - I think we should explicitly exclude MAX values from any
ABI guarantee, as being able to change them is key to our be able to
evolve DPDK while maintaining ABI stability. 

Consider what it means applying the ABI policy to a MAX value, you are
in effect saying that that no value can be added to this enumeration
until the next ABI version, for me this is very restrictive without a
solid reason. 

>
>> > I noted that some Linux folks explicitly mark similar MAX values as not
>> > part of the ABI.
>> >
>> > /usr/include/linux/perf_event.h
>> > 37:     PERF_TYPE_MAX,                          /* non-ABI */
>> > 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>> > 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>> > 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>> > 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>> > 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>> > 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>> > 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>> > non-ABI; internal use */
>> > 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>> > 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>> > 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>> > 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>> > 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>> > 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>> 
>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>> We could also add a section in the ABI Policy to make it explicit _MAX
>> enum values are not part of the ABI - what do folks think?
>
> Interesting. I am not sure it is always ABI-safe though.
  
Thomas Monjalon Feb. 14, 2022, 4:25 p.m. UTC | #13
14/02/2022 17:06, Ray Kinsella:
> Thomas Monjalon <thomas@monjalon.net> writes:
> > 14/02/2022 11:16, Ray Kinsella:
> >> Ray Kinsella <mdr@ashroe.eu> writes:
> >> > Thomas Monjalon <thomas@monjalon.net> writes:
> >> >> 02/02/2022 12:44, Ray Kinsella:
> >> >>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
> >> >>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
> >> >>> >> --- a/lib/ethdev/rte_ethdev.h
> >> >>> >> +++ b/lib/ethdev/rte_ethdev.h
> >> >>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
> >> >>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
> >> >>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
> >> >>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
> >> >>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
> >> >>> >> +			/**< port recovering from an error
> >> >>> >> +			 *
> >> >>> >> +			 * PMD detected a FW reset or error condition.
> >> >>> >> +			 * PMD will try to recover from the error.
> >> >>> >> +			 * Data path may be quiesced and Control path operations
> >> >>> >> +			 * may fail at this time.
> >> >>> >> +			 */
> >> >>> >> +	RTE_ETH_EVENT_RECOVERED,
> >> >>> >> +			/**< port recovered from an error
> >> >>> >> +			 *
> >> >>> >> +			 * PMD has recovered from the error condition.
> >> >>> >> +			 * Control path and Data path are up now.
> >> >>> >> +			 * PMD re-configures the port to the state prior to the error.
> >> >>> >> +			 * Since the device has undergone a reset, flow rules
> >> >>> >> +			 * offloaded prior to reset may be lost and
> >> >>> >> +			 * the application should recreate the rules again.
> >> >>> >> +			 */
> >> >>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
> >> >>> >
> >> >>> >
> >> >>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
> >> >>> > to evaluate if it is a false positive:
> >> >>> >
> >> >>> >
> >> >>> > 1 function with some indirect sub-type change:
> >> >>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
> >> >>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
> >> >>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
> >> >>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
> >> >>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
> >> >>> >             type size hasn't changed
> >> >>> >             2 enumerator insertions:
> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
> >> >>> >             1 enumerator change:
> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
> >> >>> 
> >> >>> I don't immediately see the problem that this would cause.
> >> >>> There are no array sizes etc dependent on the value of MAX for instance.
> >> >>> 
> >> >>> Looks safe?
> >> >>
> >> >> We never know how this enum will be used by the application.
> >> >> The max value may be used for the size of an event array.
> >> >> It looks a real ABI issue unfortunately.
> >> >
> >> > Right - but we only really care about it when an array size based on MAX
> >> > is likely to be passed to DPDK, which doesn't apply in this case.
> >
> > I don't completely agree.
> > A developer may assume an event will never exceed MAX value.
> > However, after an upgrade of DPDK without app rebuild,
> > a higher event value may be received in the app,
> > breaking the assumption.
> > Should we consider this case as an ABI breakage?
> 
> Nope - I think we should explicitly exclude MAX values from any
> ABI guarantee, as being able to change them is key to our be able to
> evolve DPDK while maintaining ABI stability.

Or we can simply remove the MAX values so there is no confusion.

> Consider what it means applying the ABI policy to a MAX value, you are
> in effect saying that that no value can be added to this enumeration
> until the next ABI version, for me this is very restrictive without a
> solid reason. 

I agree it is too much restrictive, that's why I am advocating
for their removal.

> >> > I noted that some Linux folks explicitly mark similar MAX values as not
> >> > part of the ABI.
> >> >
> >> > /usr/include/linux/perf_event.h
> >> > 37:     PERF_TYPE_MAX,                          /* non-ABI */
> >> > 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
> >> > 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
> >> > 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
> >> > 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
> >> > 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
> >> > 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
> >> > 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
> >> > non-ABI; internal use */
> >> > 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
> >> > 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
> >> > 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
> >> > 1067:   PERF_RECORD_MAX,                        /* non-ABI */
> >> > 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
> >> > 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
> >> 
> >> Any thoughts on similarly annotating all our _MAX enums in the same way?
> >> We could also add a section in the ABI Policy to make it explicit _MAX
> >> enum values are not part of the ABI - what do folks think?
> >
> > Interesting. I am not sure it is always ABI-safe though.
  
Ray Kinsella Feb. 14, 2022, 6:27 p.m. UTC | #14
Thomas Monjalon <thomas@monjalon.net> writes:

> 14/02/2022 17:06, Ray Kinsella:
>> Thomas Monjalon <thomas@monjalon.net> writes:
>> > 14/02/2022 11:16, Ray Kinsella:
>> >> Ray Kinsella <mdr@ashroe.eu> writes:
>> >> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >> >> 02/02/2022 12:44, Ray Kinsella:
>> >> >>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>> >> >>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
>> >> >>> >> --- a/lib/ethdev/rte_ethdev.h
>> >> >>> >> +++ b/lib/ethdev/rte_ethdev.h
>> >> >>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>> >> >>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>> >> >>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>> >> >>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>> >> >>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
>> >> >>> >> +			/**< port recovering from an error
>> >> >>> >> +			 *
>> >> >>> >> +			 * PMD detected a FW reset or error condition.
>> >> >>> >> +			 * PMD will try to recover from the error.
>> >> >>> >> +			 * Data path may be quiesced and Control path operations
>> >> >>> >> +			 * may fail at this time.
>> >> >>> >> +			 */
>> >> >>> >> +	RTE_ETH_EVENT_RECOVERED,
>> >> >>> >> +			/**< port recovered from an error
>> >> >>> >> +			 *
>> >> >>> >> +			 * PMD has recovered from the error condition.
>> >> >>> >> +			 * Control path and Data path are up now.
>> >> >>> >> +			 * PMD re-configures the port to the state prior to the error.
>> >> >>> >> +			 * Since the device has undergone a reset, flow rules
>> >> >>> >> +			 * offloaded prior to reset may be lost and
>> >> >>> >> +			 * the application should recreate the rules again.
>> >> >>> >> +			 */
>> >> >>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>> >> >>> >
>> >> >>> >
>> >> >>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>> >> >>> > to evaluate if it is a false positive:
>> >> >>> >
>> >> >>> >
>> >> >>> > 1 function with some indirect sub-type change:
>> >> >>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>> >> >>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>> >> >>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>> >> >>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>> >> >>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>> >> >>> >             type size hasn't changed
>> >> >>> >             2 enumerator insertions:
>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>> >> >>> >             1 enumerator change:
>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>> >> >>> 
>> >> >>> I don't immediately see the problem that this would cause.
>> >> >>> There are no array sizes etc dependent on the value of MAX for instance.
>> >> >>> 
>> >> >>> Looks safe?
>> >> >>
>> >> >> We never know how this enum will be used by the application.
>> >> >> The max value may be used for the size of an event array.
>> >> >> It looks a real ABI issue unfortunately.
>> >> >
>> >> > Right - but we only really care about it when an array size based on MAX
>> >> > is likely to be passed to DPDK, which doesn't apply in this case.
>> >
>> > I don't completely agree.
>> > A developer may assume an event will never exceed MAX value.
>> > However, after an upgrade of DPDK without app rebuild,
>> > a higher event value may be received in the app,
>> > breaking the assumption.
>> > Should we consider this case as an ABI breakage?
>> 
>> Nope - I think we should explicitly exclude MAX values from any
>> ABI guarantee, as being able to change them is key to our be able to
>> evolve DPDK while maintaining ABI stability.
>
> Or we can simply remove the MAX values so there is no confusion.
>
>> Consider what it means applying the ABI policy to a MAX value, you are
>> in effect saying that that no value can be added to this enumeration
>> until the next ABI version, for me this is very restrictive without a
>> solid reason. 
>
> I agree it is too much restrictive, that's why I am advocating
> for their removal.

I think that would be simplest yes - may require some rework of the
sample code though. I will take a look at it.

>
>> >> > I noted that some Linux folks explicitly mark similar MAX values as not
>> >> > part of the ABI.
>> >> >
>> >> > /usr/include/linux/perf_event.h
>> >> > 37:     PERF_TYPE_MAX,                          /* non-ABI */
>> >> > 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>> >> > 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>> >> > 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>> >> > 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>> >> > 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>> >> > 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>> >> > 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>> >> > non-ABI; internal use */
>> >> > 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>> >> > 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>> >> > 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>> >> > 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>> >> > 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>> >> > 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>> >> 
>> >> Any thoughts on similarly annotating all our _MAX enums in the same way?
>> >> We could also add a section in the ABI Policy to make it explicit _MAX
>> >> enum values are not part of the ABI - what do folks think?
>> >
>> > Interesting. I am not sure it is always ABI-safe though.
  
Ray Kinsella Feb. 15, 2022, 1:55 p.m. UTC | #15
Ray Kinsella <mdr@ashroe.eu> writes:

> Thomas Monjalon <thomas@monjalon.net> writes:
>
>> 14/02/2022 17:06, Ray Kinsella:
>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>> > 14/02/2022 11:16, Ray Kinsella:
>>> >> Ray Kinsella <mdr@ashroe.eu> writes:
>>> >> > Thomas Monjalon <thomas@monjalon.net> writes:
>>> >> >> 02/02/2022 12:44, Ray Kinsella:
>>> >> >>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>>> >> >>> > On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>> >> >>> >> --- a/lib/ethdev/rte_ethdev.h
>>> >> >>> >> +++ b/lib/ethdev/rte_ethdev.h
>>> >> >>> >> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>> >> >>> >>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>> >> >>> >>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>> >> >>> >>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>> >> >>> >> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>> >> >>> >> +			/**< port recovering from an error
>>> >> >>> >> +			 *
>>> >> >>> >> +			 * PMD detected a FW reset or error condition.
>>> >> >>> >> +			 * PMD will try to recover from the error.
>>> >> >>> >> +			 * Data path may be quiesced and Control path operations
>>> >> >>> >> +			 * may fail at this time.
>>> >> >>> >> +			 */
>>> >> >>> >> +	RTE_ETH_EVENT_RECOVERED,
>>> >> >>> >> +			/**< port recovered from an error
>>> >> >>> >> +			 *
>>> >> >>> >> +			 * PMD has recovered from the error condition.
>>> >> >>> >> +			 * Control path and Data path are up now.
>>> >> >>> >> +			 * PMD re-configures the port to the state prior to the error.
>>> >> >>> >> +			 * Since the device has undergone a reset, flow rules
>>> >> >>> >> +			 * offloaded prior to reset may be lost and
>>> >> >>> >> +			 * the application should recreate the rules again.
>>> >> >>> >> +			 */
>>> >> >>> >>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>> >> >>> > to evaluate if it is a false positive:
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > 1 function with some indirect sub-type change:
>>> >> >>> >   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>> >> >>> >     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>> >> >>> >       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>> >> >>> >         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>> >> >>> >           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>> >> >>> >             type size hasn't changed
>>> >> >>> >             2 enumerator insertions:
>>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>> >> >>> >             1 enumerator change:
>>> >> >>> >               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>> >> >>> 
>>> >> >>> I don't immediately see the problem that this would cause.
>>> >> >>> There are no array sizes etc dependent on the value of MAX for instance.
>>> >> >>> 
>>> >> >>> Looks safe?
>>> >> >>
>>> >> >> We never know how this enum will be used by the application.
>>> >> >> The max value may be used for the size of an event array.
>>> >> >> It looks a real ABI issue unfortunately.
>>> >> >
>>> >> > Right - but we only really care about it when an array size based on MAX
>>> >> > is likely to be passed to DPDK, which doesn't apply in this case.
>>> >
>>> > I don't completely agree.
>>> > A developer may assume an event will never exceed MAX value.
>>> > However, after an upgrade of DPDK without app rebuild,
>>> > a higher event value may be received in the app,
>>> > breaking the assumption.
>>> > Should we consider this case as an ABI breakage?
>>> 
>>> Nope - I think we should explicitly exclude MAX values from any
>>> ABI guarantee, as being able to change them is key to our be able to
>>> evolve DPDK while maintaining ABI stability.
>>
>> Or we can simply remove the MAX values so there is no confusion.
>>
>>> Consider what it means applying the ABI policy to a MAX value, you are
>>> in effect saying that that no value can be added to this enumeration
>>> until the next ABI version, for me this is very restrictive without a
>>> solid reason. 
>>
>> I agree it is too much restrictive, that's why I am advocating
>> for their removal.
>
> I think that would be simplest yes - may require some rework of the
> sample code though. I will take a look at it.

Thinking about this some more - we can't remove the MAX values between
now the next stable ABI. So we may need a short term plan, and long term
plan.

Long term, I agree we look at every _MAX enumeration value and ask do we
need it.

Short term (until the next ABI), we still need to answer the question do
we allow people to change _MAX values?

>>
>>> >> > I noted that some Linux folks explicitly mark similar MAX values as not
>>> >> > part of the ABI.
>>> >> >
>>> >> > /usr/include/linux/perf_event.h
>>> >> > 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>> >> > 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>> >> > 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>> >> > 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>> >> > 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>> >> > 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>> >> > 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>> >> > 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>>> >> > non-ABI; internal use */
>>> >> > 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>> >> > 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>>> >> > 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>> >> > 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>> >> > 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>> >> > 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>>> >> 
>>> >> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>> >> We could also add a section in the ABI Policy to make it explicit _MAX
>>> >> enum values are not part of the ABI - what do folks think?
>>> >
>>> > Interesting. I am not sure it is always ABI-safe though.
  
Thomas Monjalon Feb. 15, 2022, 3:12 p.m. UTC | #16
15/02/2022 14:55, Ray Kinsella:
> Ray Kinsella <mdr@ashroe.eu> writes:
> > Thomas Monjalon <thomas@monjalon.net> writes:
> >> 14/02/2022 17:06, Ray Kinsella:
> >>> Thomas Monjalon <thomas@monjalon.net> writes:
> >>> > 14/02/2022 11:16, Ray Kinsella:
> >>> >> Ray Kinsella <mdr@ashroe.eu> writes:
> >>> >> > Thomas Monjalon <thomas@monjalon.net> writes:
> >>> >> >> We never know how this enum will be used by the application.
> >>> >> >> The max value may be used for the size of an event array.
> >>> >> >> It looks a real ABI issue unfortunately.
> >>> >> >
> >>> >> > Right - but we only really care about it when an array size based on MAX
> >>> >> > is likely to be passed to DPDK, which doesn't apply in this case.
> >>> >
> >>> > I don't completely agree.
> >>> > A developer may assume an event will never exceed MAX value.
> >>> > However, after an upgrade of DPDK without app rebuild,
> >>> > a higher event value may be received in the app,
> >>> > breaking the assumption.
> >>> > Should we consider this case as an ABI breakage?
> >>> 
> >>> Nope - I think we should explicitly exclude MAX values from any
> >>> ABI guarantee, as being able to change them is key to our be able to
> >>> evolve DPDK while maintaining ABI stability.
> >>
> >> Or we can simply remove the MAX values so there is no confusion.
> >>
> >>> Consider what it means applying the ABI policy to a MAX value, you are
> >>> in effect saying that that no value can be added to this enumeration
> >>> until the next ABI version, for me this is very restrictive without a
> >>> solid reason. 
> >>
> >> I agree it is too much restrictive, that's why I am advocating
> >> for their removal.
> >
> > I think that would be simplest yes - may require some rework of the
> > sample code though. I will take a look at it.
> 
> Thinking about this some more - we can't remove the MAX values between
> now the next stable ABI. So we may need a short term plan, and long term
> plan.
> 
> Long term, I agree we look at every _MAX enumeration value and ask do we
> need it.
> 
> Short term (until the next ABI), we still need to answer the question do
> we allow people to change _MAX values?

There's a problem of incentive.
We already said in the past that we should remove the _MAX values,
but it doesn't happen because our time on this planet is limited.
I think it would work if the developers have a special interest in such work.
And guess what? Here there is a special interest to remove this one.
If we are at the negotiation table, we could imagine a short-term exception
if the long-term patch is available and reviewed.
  
Ray Kinsella Feb. 15, 2022, 4:12 p.m. UTC | #17
Thomas Monjalon <thomas@monjalon.net> writes:

> 15/02/2022 14:55, Ray Kinsella:
>> Ray Kinsella <mdr@ashroe.eu> writes:
>> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >> 14/02/2022 17:06, Ray Kinsella:
>> >>> Thomas Monjalon <thomas@monjalon.net> writes:
>> >>> > 14/02/2022 11:16, Ray Kinsella:
>> >>> >> Ray Kinsella <mdr@ashroe.eu> writes:
>> >>> >> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >>> >> >> We never know how this enum will be used by the application.
>> >>> >> >> The max value may be used for the size of an event array.
>> >>> >> >> It looks a real ABI issue unfortunately.
>> >>> >> >
>> >>> >> > Right - but we only really care about it when an array size based on MAX
>> >>> >> > is likely to be passed to DPDK, which doesn't apply in this case.
>> >>> >
>> >>> > I don't completely agree.
>> >>> > A developer may assume an event will never exceed MAX value.
>> >>> > However, after an upgrade of DPDK without app rebuild,
>> >>> > a higher event value may be received in the app,
>> >>> > breaking the assumption.
>> >>> > Should we consider this case as an ABI breakage?
>> >>> 
>> >>> Nope - I think we should explicitly exclude MAX values from any
>> >>> ABI guarantee, as being able to change them is key to our be able to
>> >>> evolve DPDK while maintaining ABI stability.
>> >>
>> >> Or we can simply remove the MAX values so there is no confusion.
>> >>
>> >>> Consider what it means applying the ABI policy to a MAX value, you are
>> >>> in effect saying that that no value can be added to this enumeration
>> >>> until the next ABI version, for me this is very restrictive without a
>> >>> solid reason. 
>> >>
>> >> I agree it is too much restrictive, that's why I am advocating
>> >> for their removal.
>> >
>> > I think that would be simplest yes - may require some rework of the
>> > sample code though. I will take a look at it.
>> 
>> Thinking about this some more - we can't remove the MAX values between
>> now the next stable ABI. So we may need a short term plan, and long term
>> plan.
>> 
>> Long term, I agree we look at every _MAX enumeration value and ask do we
>> need it.
>> 
>> Short term (until the next ABI), we still need to answer the question do
>> we allow people to change _MAX values?
>
> There's a problem of incentive.
> We already said in the past that we should remove the _MAX values,
> but it doesn't happen because our time on this planet is limited.
> I think it would work if the developers have a special interest in such work.
> And guess what? Here there is a special interest to remove this one.
> If we are at the negotiation table, we could imagine a short-term exception
> if the long-term patch is available and reviewed.

:-) ... I like that plan.
  
Chengwen Feng May 21, 2022, 10:33 a.m. UTC | #18
Hi all,

  This patch lasts for a long time. Are we waiting for 22.11 to deal with it?

  We have the same requirements for the reset or recovery mechanism, but there are differences:

    APP                                    PMD
     |                                      |
     |                                  detect error
     |     <---report error event---        |
     |                                      |
do error stats                              |
and report                                  |
     |      ---start recover-->             |
     |                                  do recover
     |     <---report recover result        |
     |                                      |
if succ just log
else may migrate
service

Can we generalize these processes(means that the implementation is at the framework layer)? or only at PMD API?


On 2022/2/15 0:06, Ray Kinsella wrote:
> 
> Thomas Monjalon <thomas@monjalon.net> writes:
> 
>> 14/02/2022 11:16, Ray Kinsella:
>>> Ray Kinsella <mdr@ashroe.eu> writes:
>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>>>>>>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>>>>>>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>> +			/**< port recovering from an error
>>>>>>>> +			 *
>>>>>>>> +			 * PMD detected a FW reset or error condition.
>>>>>>>> +			 * PMD will try to recover from the error.
>>>>>>>> +			 * Data path may be quiesced and Control path operations
>>>>>>>> +			 * may fail at this time.
>>>>>>>> +			 */
>>>>>>>> +	RTE_ETH_EVENT_RECOVERED,
>>>>>>>> +			/**< port recovered from an error
>>>>>>>> +			 *
>>>>>>>> +			 * PMD has recovered from the error condition.
>>>>>>>> +			 * Control path and Data path are up now.
>>>>>>>> +			 * PMD re-configures the port to the state prior to the error.
>>>>>>>> +			 * Since the device has undergone a reset, flow rules
>>>>>>>> +			 * offloaded prior to reset may be lost and
>>>>>>>> +			 * the application should recreate the rules again.
>>>>>>>> +			 */
>>>>>>>>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>>>>>>
>>>>>>>
>>>>>>> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>>>>>> to evaluate if it is a false positive:
>>>>>>>
>>>>>>>
>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>>>>>>             type size hasn't changed
>>>>>>>             2 enumerator insertions:
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>             1 enumerator change:
>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>>>>>
>>>>>> I don't immediately see the problem that this would cause.
>>>>>> There are no array sizes etc dependent on the value of MAX for instance.
>>>>>>
>>>>>> Looks safe?
>>>>>
>>>>> We never know how this enum will be used by the application.
>>>>> The max value may be used for the size of an event array.
>>>>> It looks a real ABI issue unfortunately.
>>>>
>>>> Right - but we only really care about it when an array size based on MAX
>>>> is likely to be passed to DPDK, which doesn't apply in this case.
>>
>> I don't completely agree.
>> A developer may assume an event will never exceed MAX value.
>> However, after an upgrade of DPDK without app rebuild,
>> a higher event value may be received in the app,
>> breaking the assumption.
>> Should we consider this case as an ABI breakage?
> 
> Nope - I think we should explicitly exclude MAX values from any
> ABI guarantee, as being able to change them is key to our be able to
> evolve DPDK while maintaining ABI stability. 
> 
> Consider what it means applying the ABI policy to a MAX value, you are
> in effect saying that that no value can be added to this enumeration
> until the next ABI version, for me this is very restrictive without a
> solid reason. 
> 
>>
>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>> part of the ABI.
>>>>
>>>> /usr/include/linux/perf_event.h
>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>>>> non-ABI; internal use */
>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>> 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>> 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>>>
>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>> We could also add a section in the ABI Policy to make it explicit _MAX
>>> enum values are not part of the ABI - what do folks think?
>>
>> Interesting. I am not sure it is always ABI-safe though.
> 
>
  
Ray Kinsella May 24, 2022, 3:11 p.m. UTC | #19
fengchengwen <fengchengwen@huawei.com> writes:

> Hi all,
>
>   This patch lasts for a long time. Are we waiting for 22.11 to deal with it?

That was my read, as can't reliably change the value of _MAX at this
stage without it having impact elsewhere. 


>   We have the same requirements for the reset or recovery mechanism, but there are differences:
>
>     APP                                    PMD
>      |                                      |
>      |                                  detect error
>      |     <---report error event---        |
>      |                                      |
> do error stats                              |
> and report                                  |
>      |      ---start recover-->             |
>      |                                  do recover
>      |     <---report recover result        |
>      |                                      |
> if succ just log
> else may migrate
> service
>
> Can we generalize these processes(means that the implementation is at the framework layer)? or only at PMD API?
>
>
> On 2022/2/15 0:06, Ray Kinsella wrote:
>> 
>> Thomas Monjalon <thomas@monjalon.net> writes:
>> 
>>> 14/02/2022 11:16, Ray Kinsella:
>>>> Ray Kinsella <mdr@ashroe.eu> writes:
>>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>>>>>>>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>>>>>>>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>> +			/**< port recovering from an error
>>>>>>>>> +			 *
>>>>>>>>> +			 * PMD detected a FW reset or error condition.
>>>>>>>>> +			 * PMD will try to recover from the error.
>>>>>>>>> +			 * Data path may be quiesced and Control path operations
>>>>>>>>> +			 * may fail at this time.
>>>>>>>>> +			 */
>>>>>>>>> +	RTE_ETH_EVENT_RECOVERED,
>>>>>>>>> +			/**< port recovered from an error
>>>>>>>>> +			 *
>>>>>>>>> +			 * PMD has recovered from the error condition.
>>>>>>>>> +			 * Control path and Data path are up now.
>>>>>>>>> +			 * PMD re-configures the port to the state prior to the error.
>>>>>>>>> +			 * Since the device has undergone a reset, flow rules
>>>>>>>>> +			 * offloaded prior to reset may be lost and
>>>>>>>>> +			 * the application should recreate the rules again.
>>>>>>>>> +			 */
>>>>>>>>>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>>>>>>>
>>>>>>>>
>>>>>>>> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>>>>>>> to evaluate if it is a false positive:
>>>>>>>>
>>>>>>>>
>>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>>>>>>>             type size hasn't changed
>>>>>>>>             2 enumerator insertions:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>>             1 enumerator change:
>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>>>>>>
>>>>>>> I don't immediately see the problem that this would cause.
>>>>>>> There are no array sizes etc dependent on the value of MAX for instance.
>>>>>>>
>>>>>>> Looks safe?
>>>>>>
>>>>>> We never know how this enum will be used by the application.
>>>>>> The max value may be used for the size of an event array.
>>>>>> It looks a real ABI issue unfortunately.
>>>>>
>>>>> Right - but we only really care about it when an array size based on MAX
>>>>> is likely to be passed to DPDK, which doesn't apply in this case.
>>>
>>> I don't completely agree.
>>> A developer may assume an event will never exceed MAX value.
>>> However, after an upgrade of DPDK without app rebuild,
>>> a higher event value may be received in the app,
>>> breaking the assumption.
>>> Should we consider this case as an ABI breakage?
>> 
>> Nope - I think we should explicitly exclude MAX values from any
>> ABI guarantee, as being able to change them is key to our be able to
>> evolve DPDK while maintaining ABI stability. 
>> 
>> Consider what it means applying the ABI policy to a MAX value, you are
>> in effect saying that that no value can be added to this enumeration
>> until the next ABI version, for me this is very restrictive without a
>> solid reason. 
>> 
>>>
>>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>>> part of the ABI.
>>>>>
>>>>> /usr/include/linux/perf_event.h
>>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>>>>> non-ABI; internal use */
>>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>>> 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>>> 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>>>>
>>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>>> We could also add a section in the ABI Policy to make it explicit _MAX
>>>> enum values are not part of the ABI - what do folks think?
>>>
>>> Interesting. I am not sure it is always ABI-safe though.
>> 
>>
  
Chengwen Feng June 10, 2022, 12:16 a.m. UTC | #20
On 2022/5/24 23:11, Ray Kinsella wrote:
> 
> fengchengwen <fengchengwen@huawei.com> writes:
> 
>> Hi all,
>>
>>   This patch lasts for a long time. Are we waiting for 22.11 to deal with it?
> 
> That was my read, as can't reliably change the value of _MAX at this
> stage without it having impact elsewhere. 
> 
> 
>>   We have the same requirements for the reset or recovery mechanism, but there are differences:
>>
>>     APP                                    PMD
>>      |                                      |
>>      |                                  detect error
>>      |     <---report error event---        |
>>      |                                      |
>> do error stats                              |
>> and report                                  |
>>      |      ---start recover-->             |
>>      |                                  do recover
>>      |     <---report recover result        |
>>      |                                      |
>> if succ just log
>> else may migrate
>> service
>>
>> Can we generalize these processes(means that the implementation is at the framework layer)? or only at PMD API?
>>
>>
>> On 2022/2/15 0:06, Ray Kinsella wrote:
>>>
>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>
>>>> 14/02/2022 11:16, Ray Kinsella:
>>>>> Ray Kinsella <mdr@ashroe.eu> writes:
>>>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>>>> 02/02/2022 12:44, Ray Kinsella:
>>>>>>>> Ferruh Yigit <ferruh.yigit@intel.com> writes:
>>>>>>>>> On 1/28/2022 12:48 PM, Kalesh A P wrote:
>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>>> @@ -3818,6 +3818,24 @@ enum rte_eth_event_type {
>>>>>>>>>>   	RTE_ETH_EVENT_DESTROY,  /**< port is released */
>>>>>>>>>>   	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
>>>>>>>>>>   	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>>>>>>>>> +	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>>> +			/**< port recovering from an error
>>>>>>>>>> +			 *
>>>>>>>>>> +			 * PMD detected a FW reset or error condition.
>>>>>>>>>> +			 * PMD will try to recover from the error.
>>>>>>>>>> +			 * Data path may be quiesced and Control path operations
>>>>>>>>>> +			 * may fail at this time.
>>>>>>>>>> +			 */

I think we should standard error reason which could pass to application, so that
application know the really reason. the error reason could as the ret_param of
rte_eth_dev_callback_process().

But I think it could be done later.

>>>>>>>>>> +	RTE_ETH_EVENT_RECOVERED,
>>>>>>>>>> +			/**< port recovered from an error
>>>>>>>>>> +			 *
>>>>>>>>>> +			 * PMD has recovered from the error condition.
>>>>>>>>>> +			 * Control path and Data path are up now.
>>>>>>>>>> +			 * PMD re-configures the port to the state prior to the error.
>>>>>>>>>> +			 * Since the device has undergone a reset, flow rules
>>>>>>>>>> +			 * offloaded prior to reset may be lost and
>>>>>>>>>> +			 * the application should recreate the rules again.
>>>>>>>>>> +			 */

please add RTE_ETH_EVENT_RECOVER_FAIL event, the RTE_ETH_EVENT_INTR_RMV event is
a big event, it has its own usecase. So please add the RECOVER_FAIL event to let
application decide remove or keep it.

>>>>>>>>>>   	RTE_ETH_EVENT_MAX       /**< max value of this enum */
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Also ABI check complains about 'RTE_ETH_EVENT_MAX' value check, cc'ed more people
>>>>>>>>> to evaluate if it is a false positive:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1 function with some indirect sub-type change:
>>>>>>>>>   [C] 'function int rte_eth_dev_callback_register(uint16_t, rte_eth_event_type, rte_eth_dev_cb_fn, void*)' at rte_ethdev.c:4637:1 has some indirect sub-type changes:
>>>>>>>>>     parameter 3 of type 'typedef rte_eth_dev_cb_fn' has sub-type changes:
>>>>>>>>>       underlying type 'int (typedef uint16_t, enum rte_eth_event_type, void*, void*)*' changed:
>>>>>>>>>         in pointed to type 'function type int (typedef uint16_t, enum rte_eth_event_type, void*, void*)':
>>>>>>>>>           parameter 2 of type 'enum rte_eth_event_type' has sub-type changes:
>>>>>>>>>             type size hasn't changed
>>>>>>>>>             2 enumerator insertions:
>>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_ERR_RECOVERING' value '11'
>>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_RECOVERED' value '12'
>>>>>>>>>             1 enumerator change:
>>>>>>>>>               'rte_eth_event_type::RTE_ETH_EVENT_MAX' from value '11' to '13' at rte_ethdev.h:3807:1
>>>>>>>>
>>>>>>>> I don't immediately see the problem that this would cause.
>>>>>>>> There are no array sizes etc dependent on the value of MAX for instance.
>>>>>>>>
>>>>>>>> Looks safe?
>>>>>>>
>>>>>>> We never know how this enum will be used by the application.
>>>>>>> The max value may be used for the size of an event array.
>>>>>>> It looks a real ABI issue unfortunately.
>>>>>>
>>>>>> Right - but we only really care about it when an array size based on MAX
>>>>>> is likely to be passed to DPDK, which doesn't apply in this case.
>>>>
>>>> I don't completely agree.
>>>> A developer may assume an event will never exceed MAX value.
>>>> However, after an upgrade of DPDK without app rebuild,
>>>> a higher event value may be received in the app,
>>>> breaking the assumption.
>>>> Should we consider this case as an ABI breakage?
>>>
>>> Nope - I think we should explicitly exclude MAX values from any
>>> ABI guarantee, as being able to change them is key to our be able to
>>> evolve DPDK while maintaining ABI stability. 
>>>
>>> Consider what it means applying the ABI policy to a MAX value, you are
>>> in effect saying that that no value can be added to this enumeration
>>> until the next ABI version, for me this is very restrictive without a
>>> solid reason. 
>>>
>>>>
>>>>>> I noted that some Linux folks explicitly mark similar MAX values as not
>>>>>> part of the ABI.
>>>>>>
>>>>>> /usr/include/linux/perf_event.h
>>>>>> 37:     PERF_TYPE_MAX,                          /* non-ABI */
>>>>>> 60:     PERF_COUNT_HW_MAX,                      /* non-ABI */
>>>>>> 79:     PERF_COUNT_HW_CACHE_MAX,                /* non-ABI */
>>>>>> 87:     PERF_COUNT_HW_CACHE_OP_MAX,             /* non-ABI */
>>>>>> 94:     PERF_COUNT_HW_CACHE_RESULT_MAX,         /* non-ABI */
>>>>>> 116:    PERF_COUNT_SW_MAX,                      /* non-ABI */
>>>>>> 149:    PERF_SAMPLE_MAX = 1U << 24,             /* non-ABI */
>>>>>> 151:    __PERF_SAMPLE_CALLCHAIN_EARLY           = 1ULL << 63, /*
>>>>>> non-ABI; internal use */
>>>>>> 189:    PERF_SAMPLE_BRANCH_MAX_SHIFT            /* non-ABI */
>>>>>> 267:    PERF_TXN_MAX            = (1 << 8), /* non-ABI */
>>>>>> 301:    PERF_FORMAT_MAX = 1U << 4,              /* non-ABI */
>>>>>> 1067:   PERF_RECORD_MAX,                        /* non-ABI */
>>>>>> 1078:   PERF_RECORD_KSYMBOL_TYPE_MAX            /* non-ABI */
>>>>>> 1087:   PERF_BPF_EVENT_MAX,             /* non-ABI */
>>>>>
>>>>> Any thoughts on similarly annotating all our _MAX enums in the same way?
>>>>> We could also add a section in the ABI Policy to make it explicit _MAX
>>>>> enum values are not part of the ABI - what do folks think?
>>>>
>>>> Interesting. I am not sure it is always ABI-safe though.
>>>
>>>
> 
>
  

Patch

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index 6831289..9ecc0e4 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -623,3 +623,27 @@  by application.
 The PMD itself should not call rte_eth_dev_reset(). The PMD can trigger
 the application to handle reset event. It is duty of application to
 handle all synchronization before it calls rte_eth_dev_reset().
+
+Error recovery support
+~~~~~~~~~~~~~~~~~~~~~~
+
+When the PMD detects a FW reset or error condition, it may try to recover
+from the error without needing the application intervention. In such cases,
+PMD would need events to notify the application that it is undergoing
+an error recovery.
+
+The PMD should trigger RTE_ETH_EVENT_ERR_RECOVERING event to notify the
+application that PMD detected a FW reset or FW error condition. PMD may
+try to recover from the error by itself. Data path may be quiesced and
+control path operations may fail during the recovery period. The application
+should stop polling till it receives RTE_ETH_EVENT_RECOVERED event from the PMD.
+
+The PMD should trigger RTE_ETH_EVENT_RECOVERED event to notify the application
+that the it has recovered from the error condition. PMD re-configures the port
+to the state prior to the error condition. Control path and data path are up now.
+Since the device has undergone a reset, flow rules offloaded prior to reset
+may be lost and the application should recreate the rules again.
+
+The PMD should trigger RTE_ETH_EVENT_INTR_RMV event to notify the application
+that it has failed to recover from the error condition. The device may not be
+usable anymore.
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 147cc1c..a46819f 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -3818,6 +3818,24 @@  enum rte_eth_event_type {
 	RTE_ETH_EVENT_DESTROY,  /**< port is released */
 	RTE_ETH_EVENT_IPSEC,    /**< IPsec offload related event */
 	RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
+	RTE_ETH_EVENT_ERR_RECOVERING,
+			/**< port recovering from an error
+			 *
+			 * PMD detected a FW reset or error condition.
+			 * PMD will try to recover from the error.
+			 * Data path may be quiesced and Control path operations
+			 * may fail at this time.
+			 */
+	RTE_ETH_EVENT_RECOVERED,
+			/**< port recovered from an error
+			 *
+			 * PMD has recovered from the error condition.
+			 * Control path and Data path are up now.
+			 * PMD re-configures the port to the state prior to the error.
+			 * Since the device has undergone a reset, flow rules
+			 * offloaded prior to reset may be lost and
+			 * the application should recreate the rules again.
+			 */
 	RTE_ETH_EVENT_MAX       /**< max value of this enum */
 };