[1/5] ethdev: fix race-condition of proactive error handling mode

Message ID: 20230301030610.49468-2-fengchengwen@huawei.com
State: Changes Requested, archived
Delegated to: Ferruh Yigit
Series: fix race-condition of proactive error handling mode

Checks

Context          Check    Description
ci/checkpatch    warning  coding style issues

Commit Message

Chengwen Feng March 1, 2023, 3:06 a.m. UTC
In the proactive error handling mode, the PMD will set the data path
pointers to dummy functions and then try recovery; during this period the
application may still be invoking the data path API. This introduces a
race condition with the data path which may lead to a crash [1].

Although the PMD added a delay after setting the data path pointers to
cover the above race condition, this only reduces its probability; it
doesn't solve the problem.

To solve the race-condition problem fundamentally, the following
requirements are added:
1. The PMD should set the data path pointers to dummy functions after
   reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
2. The application should stop data path API invocation when processing
   the RTE_ETH_EVENT_ERR_RECOVERING event.
3. The PMD should set the data path pointers to valid functions before
   reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
4. The application should re-enable data path API invocation when
   processing the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
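The four requirements above fix an ordering between the PMD and the application. The single-threaded C model below is a sketch of that ordering only. All names (`pmd_recover`, `app_event_cb`, `datapath_poll`, the `EV_*` constants) are hypothetical and are not ethdev API. The model also assumes that the callback has fully stopped the data path by the time it returns, which a real multi-threaded application must arrange itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the three ethdev recovery events. */
enum recovery_event { EV_ERR_RECOVERING, EV_RECOVERY_SUCCESS, EV_RECOVERY_FAILED };

typedef unsigned (*rx_burst_fn)(void);

static unsigned real_rx(void)  { return 32; } /* pretend a full burst arrived */
static unsigned dummy_rx(void) { return 0; }  /* placeholder during recovery */

static rx_burst_fn port_rx = real_rx; /* per-port fast-path pointer */
static bool datapath_enabled = true;  /* owned by the application */
static char trace[128];               /* ordered record of the steps */

static void log_step(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
	strncat(trace, ";", sizeof(trace) - strlen(trace) - 1);
}

/* Data-path entry point: polls only while the application allows it. */
static unsigned datapath_poll(void)
{
	if (!datapath_enabled)
		return 0;
	/* Invariant the four rules provide: an enabled data path never sees dummies. */
	assert(port_rx == real_rx);
	return port_rx();
}

/* Application event callback: requirements 2 and 4. */
static void app_event_cb(enum recovery_event ev)
{
	if (ev == EV_ERR_RECOVERING) {
		datapath_enabled = false;
		log_step("app:stop");
	} else if (ev == EV_RECOVERY_SUCCESS) {
		datapath_enabled = true;
		log_step("app:resume");
	}
}

/* PMD recovery sequence: requirements 1 and 3. */
static void pmd_recover(bool reset_ok)
{
	app_event_cb(EV_ERR_RECOVERING);   /* 1: report first ...        */
	port_rx = dummy_rx;                /*    ... then install dummies */
	log_step("pmd:dummy");
	if (!reset_ok) {
		app_event_cb(EV_RECOVERY_FAILED); /* dummies stay installed */
		return;
	}
	port_rx = real_rx;                 /* 3: restore valid ops ...   */
	log_step("pmd:valid");
	app_event_cb(EV_RECOVERY_SUCCESS); /*    ... then report success */
}
```

In a real application, the callback would be registered with `rte_eth_dev_callback_register()` for each of the three events. It would also have to wait until the data-path threads have actually stopped before returning. The single-threaded model hides that step.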

Also, this patch introduces a driver-internal function,
rte_eth_fp_ops_setup, which serves as a helper function for PMDs.
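The helper added in `ethdev_driver.c` ignores a NULL device. It then fills the device's slot in the flat per-port `rte_eth_fp_ops` table, which is the table the inline burst wrappers read. The model below sketches only that copy step. The types, `MAX_PORTS` and the bounds check are assumptions for illustration, and the real `eth_dev_fp_ops_setup()` fills in more fields than these two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PORTS 4 /* hypothetical; ethdev sizes its table by RTE_MAX_ETHPORTS */

typedef uint16_t (*burst_fn)(void *queue, uint16_t nb_pkts);

/* Simplified device: where the PMD keeps its real burst functions. */
struct dev_model {
	uint16_t port_id;
	burst_fn rx_pkt_burst;
	burst_fn tx_pkt_burst;
};

/* Simplified flat fast-path table, indexed by port id. */
struct fp_ops_model {
	burst_fn rx_pkt_burst;
	burst_fn tx_pkt_burst;
};

static struct fp_ops_model fp_ops[MAX_PORTS];

/* Same shape as the helper in the patch: ignore NULL, then copy the
 * device's function pointers into its slot of the per-port table. */
static void fp_ops_setup(const struct dev_model *dev)
{
	if (dev == NULL)
		return;
	assert(dev->port_id < MAX_PORTS); /* model-only bounds check */
	fp_ops[dev->port_id].rx_pkt_burst = dev->rx_pkt_burst;
	fp_ops[dev->port_id].tx_pkt_burst = dev->tx_pkt_burst;
}

/* Trivial burst functions used to exercise the model. */
static uint16_t rx_model(void *queue, uint16_t nb_pkts) { (void)queue; return nb_pkts; }
static uint16_t tx_model(void *queue, uint16_t nb_pkts) { (void)queue; return nb_pkts; }
```

During recovery, a PMD would call the helper once its queues are valid again, that is, just before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS (requirement 3).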

[1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/

Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
Cc: stable@dpdk.org

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
 doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
 lib/ethdev/ethdev_driver.c              |  8 +++++++
 lib/ethdev/ethdev_driver.h              | 10 ++++++++
 lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
 lib/ethdev/version.map                  |  1 +
 5 files changed, 46 insertions(+), 25 deletions(-)
  

Comments

Konstantin Ananyev March 2, 2023, 12:08 p.m. UTC | #1
> In the proactive error handling mode, the PMD will set the data path
> pointers to dummy functions and then try recovery, in this period the
> application may still invoking data path API. This will introduce a
> race-condition with data path which may lead to crash [1].
> 
> Although the PMD added delay after setting data path pointers to cover
> the above race-condition, it reduces the probability, but it doesn't
> solve the problem.
> 
> To solve the race-condition problem fundamentally, the following
> requirements are added:
> 1. The PMD should set the data path pointers to dummy functions after
>    report RTE_ETH_EVENT_ERR_RECOVERING event.
> 2. The application should stop data path API invocation when process
>    the RTE_ETH_EVENT_ERR_RECOVERING event.
> 3. The PMD should set the data path pointers to valid functions before
>    report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 4. The application should enable data path API invocation when process
>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 
> Also, this patch introduce a driver internal function
> rte_eth_fp_ops_setup which used as an help function for PMD.
> 
> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> 
> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>  lib/ethdev/ethdev_driver.c              |  8 +++++++
>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>  lib/ethdev/version.map                  |  1 +
>  5 files changed, 46 insertions(+), 25 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index c145a9066c..e380ff135a 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>  the PMD automatically recovers from error in PROACTIVE mode,
>  and only a small amount of work is required for the application.
> 
> -During error detection and automatic recovery,
> -the PMD sets the data path pointers to dummy functions
> -(which will prevent the crash),
> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> -
> -Because the PMD recovers automatically,
> -the application can only sense that the data flow is disconnected for a while
> -and the control API returns an error in this period.
> +During error detection and automatic recovery, the PMD sets the data path
> +pointers to dummy functions and also make sure the control path operations
> +failed with a return code ``-EBUSY``.
> 
>  In order to sense the error happening/recovering,
>  as well as to restore some additional configuration,
> @@ -653,9 +648,9 @@ three events are available:
> 
>  ``RTE_ETH_EVENT_ERR_RECOVERING``
>     Notify the application that an error is detected
> -   and the recovery is being started.
> +   and the recovery is about to start.
>     Upon receiving the event, the application should not invoke
> -   any control path function until receiving
> +   any control and data path API until receiving
>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> 
>  .. note::
> @@ -666,8 +661,9 @@ three events are available:
> 
>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>     Notify the application that the recovery from error is successful,
> -   the PMD already re-configures the port,
> -   and the effect is the same as a restart operation.
> +   the PMD already re-configures the port.
> +   The application should restore some additional configuration, and then
> +   enable data path API invocation.
> 
>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
>     Notify the application that the recovery from error failed,
> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> index 0be1e8ca04..f994653fe9 100644
> --- a/lib/ethdev/ethdev_driver.c
> +++ b/lib/ethdev/ethdev_driver.c
> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>  	return rc;
>  }
> 
> +void
> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> +{
> +	if (dev == NULL)
> +		return;
> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> +}
> +
>  const struct rte_memzone *
>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>  			 uint16_t queue_id, size_t size, unsigned int align,
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 2c9d615fb5..0d964d1f67 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1621,6 +1621,16 @@ int
>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>  		 uint16_t queue_id);
> 
> +/**
> + * @internal
> + * Setup eth fast-path API to ethdev values.
> + *
> + * @param dev
> + *  Pointer to struct rte_eth_dev.
> + */
> +__rte_internal
> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> +
>  /**
>   * @internal
>   * Atomically set the link status for the specific device.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 049641d57c..44ee7229c1 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>  	 */
>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>  	/** Port recovering from a hardware or firmware error.
> -	 * If PMD supports proactive error recovery,
> -	 * it should trigger this event to notify application
> -	 * that it detected an error and the recovery is being started.
> -	 * Upon receiving the event, the application should not invoke any control path API
> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> -	 * The PMD will set the data path pointers to dummy functions,
> -	 * and re-set the data path pointers to non-dummy functions
> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> -	 * It means that the application cannot send or receive any packets
> -	 * during this period.
> +	 *
> +	 * If PMD supports proactive error recovery, it should trigger this
> +	 * event to notify application that it detected an error and the
> +	 * recovery is about to start.
> +	 *
> +	 * Upon receiving the event, the application should not invoke any
> +	 * control and data path API until receiving
> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> +	 * event.
> +	 *
> +	 * Once this event is reported, the PMD will set the data path pointers
> +	 * to dummy functions, and re-set the data path pointers to valid
> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> +	 *
>  	 * @note Before the PMD reports the recovery result,
>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>  	 * because a larger error may occur during the recovery.
>  	 */
>  	RTE_ETH_EVENT_ERR_RECOVERING,
>  	/** Port recovers successfully from the error.
> -	 * The PMD already re-configured the port,
> -	 * and the effect is the same as a restart operation.
> +	 *
> +	 * The PMD already re-configured the port:
>  	 * a) The following operation will be retained: (alphabetically)
>  	 *    - DCB configuration
>  	 *    - FEC configuration
> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>  	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>  	 * c) Any other configuration will not be stored
>  	 *    and will need to be re-configured.
> +	 *
> +	 * The application should restore some additional configuration
> +	 * (see above case b/c), and then enable data path API invocation.
>  	 */
>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>  	/** Port recovery failed.
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 357d1a88c0..c273e0bdae 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -320,6 +320,7 @@ INTERNAL {
>  	rte_eth_devices;
>  	rte_eth_dma_zone_free;
>  	rte_eth_dma_zone_reserve;
> +	rte_eth_fp_ops_setup;
>  	rte_eth_hairpin_queue_peer_bind;
>  	rte_eth_hairpin_queue_peer_unbind;
>  	rte_eth_hairpin_queue_peer_update;
> --
 
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

> 2.17.1
  
Honnappa Nagarahalli March 2, 2023, 11:30 p.m. UTC | #2
> -----Original Message-----
> From: Chengwen Feng <fengchengwen@huawei.com>
> Sent: Tuesday, February 28, 2023 9:06 PM
> To: thomas@monjalon.net; ferruh.yigit@amd.com;
> konstantin.ananyev@huawei.com; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-anakkur.purayil@broadcom.com>; Ajit Khaparde
> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> Cc: dev@dpdk.org
> Subject: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
> 
> In the proactive error handling mode, the PMD will set the data path pointers to
> dummy functions and then try recovery, in this period the application may still
> invoking data path API. This will introduce a race-condition with data path which
> may lead to crash [1].
> 
> Although the PMD added delay after setting data path pointers to cover the
> above race-condition, it reduces the probability, but it doesn't solve the
> problem.
> 
> To solve the race-condition problem fundamentally, the following requirements
> are added:
> 1. The PMD should set the data path pointers to dummy functions after
>    report RTE_ETH_EVENT_ERR_RECOVERING event.
Do you mean that the PMD should set the data path pointers after calling the callback function?
The PMD is running in the context of multiple EAL threads. How do these threads synchronize so that only one thread sets these data path pointers?

> 2. The application should stop data path API invocation when process
>    the RTE_ETH_EVENT_ERR_RECOVERING event.
Any thoughts on how an application can do this?

> 3. The PMD should set the data path pointers to valid functions before
>    report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> 4. The application should enable data path API invocation when process
>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
Do you mean to say that the application should not call the datapath APIs while the PMD is running the recovery process?

> 
> Also, this patch introduce a driver internal function rte_eth_fp_ops_setup
> which used as an help function for PMD.
> 
> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> 
> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>  lib/ethdev/ethdev_driver.c              |  8 +++++++
>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>  lib/ethdev/version.map                  |  1 +
>  5 files changed, 46 insertions(+), 25 deletions(-)
> 
> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> index c145a9066c..e380ff135a 100644
> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>  the PMD automatically recovers from error in PROACTIVE mode,
>  and only a small amount of work is required for the application.
> 
> -During error detection and automatic recovery,
> -the PMD sets the data path pointers to dummy functions
> -(which will prevent the crash),
> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> -
> -Because the PMD recovers automatically,
> -the application can only sense that the data flow is disconnected for a while
> -and the control API returns an error in this period.
> +During error detection and automatic recovery, the PMD sets the data path
> +pointers to dummy functions and also make sure the control path operations
> +failed with a return code ``-EBUSY``.
> 
>  In order to sense the error happening/recovering,
>  as well as to restore some additional configuration,
> @@ -653,9 +648,9 @@ three events are available:
> 
>  ``RTE_ETH_EVENT_ERR_RECOVERING``
>     Notify the application that an error is detected
> -   and the recovery is being started.
> +   and the recovery is about to start.
>     Upon receiving the event, the application should not invoke
> -   any control path function until receiving
> +   any control and data path API until receiving
>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> 
>  .. note::
> @@ -666,8 +661,9 @@ three events are available:
> 
>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>     Notify the application that the recovery from error is successful,
> -   the PMD already re-configures the port,
> -   and the effect is the same as a restart operation.
> +   the PMD already re-configures the port.
> +   The application should restore some additional configuration, and then
What is the additional configuration? Is it specific to each NIC/PMD?
I thought this was an automatic recovery process and the application would not need to reconfigure anything. If the application has to restore the configuration, how does automatic recovery differ from the typical recovery process?

> +   enable data path API invocation.
> 
>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
>     Notify the application that the recovery from error failed,
> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> index 0be1e8ca04..f994653fe9 100644
> --- a/lib/ethdev/ethdev_driver.c
> +++ b/lib/ethdev/ethdev_driver.c
> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>  	return rc;
>  }
> 
> +void
> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> +{
> +	if (dev == NULL)
> +		return;
> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> +}
> +
>  const struct rte_memzone *
>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>  			 uint16_t queue_id, size_t size, unsigned int align,
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 2c9d615fb5..0d964d1f67 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -1621,6 +1621,16 @@ int
>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>  		 uint16_t queue_id);
> 
> +/**
> + * @internal
> + * Setup eth fast-path API to ethdev values.
> + *
> + * @param dev
> + *  Pointer to struct rte_eth_dev.
> + */
> +__rte_internal
> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> +
>  /**
>   * @internal
>   * Atomically set the link status for the specific device.
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 049641d57c..44ee7229c1 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>  	 */
>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>  	/** Port recovering from a hardware or firmware error.
> -	 * If PMD supports proactive error recovery,
> -	 * it should trigger this event to notify application
> -	 * that it detected an error and the recovery is being started.
> -	 * Upon receiving the event, the application should not invoke any control path API
> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> -	 * The PMD will set the data path pointers to dummy functions,
> -	 * and re-set the data path pointers to non-dummy functions
> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> -	 * It means that the application cannot send or receive any packets
> -	 * during this period.
> +	 *
> +	 * If PMD supports proactive error recovery, it should trigger this
> +	 * event to notify application that it detected an error and the
> +	 * recovery is about to start.
> +	 *
> +	 * Upon receiving the event, the application should not invoke any
> +	 * control and data path API until receiving
> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> +	 * event.
> +	 *
> +	 * Once this event is reported, the PMD will set the data path pointers
> +	 * to dummy functions, and re-set the data path pointers to valid
> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
Why do we need to set the data path pointers to dummy functions if the application is restricted from invoking any control and data path APIs till the recovery process is completed?

> +	 *
>  	 * @note Before the PMD reports the recovery result,
>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>  	 * because a larger error may occur during the recovery.
>  	 */
>  	RTE_ETH_EVENT_ERR_RECOVERING,
I understand this is not changed by this patch, but I am wondering: what is the purpose of this note? How is the application supposed to use it?

>  	/** Port recovers successfully from the error.
> -	 * The PMD already re-configured the port,
> -	 * and the effect is the same as a restart operation.
> +	 *
> +	 * The PMD already re-configured the port:
>  	 * a) The following operation will be retained: (alphabetically)
>  	 *    - DCB configuration
>  	 *    - FEC configuration
> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>  	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>  	 * c) Any other configuration will not be stored
>  	 *    and will need to be re-configured.
> +	 *
> +	 * The application should restore some additional configuration
> +	 * (see above case b/c), and then enable data path API invocation.
>  	 */
>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>  	/** Port recovery failed.
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 357d1a88c0..c273e0bdae 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -320,6 +320,7 @@ INTERNAL {
>  	rte_eth_devices;
>  	rte_eth_dma_zone_free;
>  	rte_eth_dma_zone_reserve;
> +	rte_eth_fp_ops_setup;
>  	rte_eth_hairpin_queue_peer_bind;
>  	rte_eth_hairpin_queue_peer_unbind;
>  	rte_eth_hairpin_queue_peer_update;
> --
> 2.17.1
  
Konstantin Ananyev March 3, 2023, 12:21 a.m. UTC | #3
> 
>> -----Original Message-----
>> From: Chengwen Feng <fengchengwen@huawei.com>
>> Sent: Tuesday, February 28, 2023 9:06 PM
>> To: thomas@monjalon.net; ferruh.yigit@amd.com;
>> konstantin.ananyev@huawei.com; Andrew Rybchenko
>> <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-anakkur.purayil@broadcom.com>; Ajit Khaparde
>> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
>> Cc: dev@dpdk.org
>> Subject: [PATCH 1/5] ethdev: fix race-condition of proactive error handling mode
>>
>> In the proactive error handling mode, the PMD will set the data path pointers to
>> dummy functions and then try recovery, in this period the application may still
>> invoking data path API. This will introduce a race-condition with data path which
>> may lead to crash [1].
>>
>> Although the PMD added delay after setting data path pointers to cover the
>> above race-condition, it reduces the probability, but it doesn't solve the
>> problem.
>>
>> To solve the race-condition problem fundamentally, the following requirements
>> are added:
>> 1. The PMD should set the data path pointers to dummy functions after
>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
> Do you mean to say, PMD should set the data path pointers after calling the call back function?
> The PMD is running in the context of multiple EAL threads. How do these threads synchronize such that only one thread sets these data pointers?

As I understand it, this event callback is supposed to be called in the context
of the EAL interrupt thread (whoever is more familiar with the original idea,
feel free to correct me if I missed something).
How it signals the data-path threads that they need to stop/suspend
calling the data-path API is, I suppose, left to the application to decide,
just as it is currently the application's responsibility to stop data-path
threads before calling dev_stop()/dev_configure()/etc.
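The answer above leaves the signalling mechanism to the application. One possible scheme, shown here only as a sketch, combines a pause request with a per-worker acknowledgement. With this scheme, the RTE_ETH_EVENT_ERR_RECOVERING handler returns only after every data-path thread has left the burst API. All names are hypothetical. The model is single-threaded so that it can run as is, but it uses C11 atomics with the default sequentially consistent ordering. This store-then-load handshake relies on that ordering: with weaker orderings, the worker and the control thread could each miss the other's store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NB_WORKERS 2 /* hypothetical number of data-path threads */

static atomic_bool pause_req;                 /* set by the control side */
static atomic_bool worker_parked[NB_WORKERS]; /* per-worker acknowledgement */
static unsigned rx_calls[NB_WORKERS];         /* stands in for burst calls */

/* One iteration of worker `id`'s polling loop. The order matters: clear
 * the acknowledgement first, then check the request (seq_cst on both). */
static void worker_iteration(unsigned id)
{
	assert(id < NB_WORKERS);
	atomic_store(&worker_parked[id], false);
	if (atomic_load(&pause_req)) {
		/* Outside the data-path API: tell the control side. */
		atomic_store(&worker_parked[id], true);
		return;
	}
	rx_calls[id]++; /* a real worker would call rte_eth_rx_burst() here */
}

/* ERR_RECOVERING handler, step 1: ask the workers to stop polling. */
static void ctrl_request_pause(void)
{
	atomic_store(&pause_req, true);
}

/* ERR_RECOVERING handler, step 2: a real handler would spin on this
 * (e.g. with rte_pause()) and return only once it is true. */
static bool ctrl_all_parked(void)
{
	for (unsigned i = 0; i < NB_WORKERS; i++)
		if (!atomic_load(&worker_parked[i]))
			return false;
	return true;
}

/* RECOVERY_SUCCESS handler: let the workers poll again. */
static void ctrl_resume(void)
{
	atomic_store(&pause_req, false);
}
```

Each worker would call `worker_iteration()` in its main loop. The ERR_RECOVERING callback would call `ctrl_request_pause()` and then wait until `ctrl_all_parked()` is true. As a result, the PMD installs its dummy functions only after the data path is quiescent.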


> 
>> 2. The application should stop data path API invocation when process
>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> Any thoughts on how an application can do this?
> 
>> 3. The PMD should set the data path pointers to valid functions before
>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> 4. The application should enable data path API invocation when process
>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> Do you mean to say that the application should not call the datapath APIs while the PMD is running the recovery process?

Yes, I believe that's the intention.

>>
>> Also, this patch introduce a driver internal function rte_eth_fp_ops_setup
>> which used as an help function for PMD.
>>
>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>
>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>   lib/ethdev/version.map                  |  1 +
>>   5 files changed, 46 insertions(+), 25 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>> index c145a9066c..e380ff135a 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>   the PMD automatically recovers from error in PROACTIVE mode,
>>   and only a small amount of work is required for the application.
>>
>> -During error detection and automatic recovery,
>> -the PMD sets the data path pointers to dummy functions
>> -(which will prevent the crash),
>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>> -
>> -Because the PMD recovers automatically,
>> -the application can only sense that the data flow is disconnected for a while
>> -and the control API returns an error in this period.
>> +During error detection and automatic recovery, the PMD sets the data path
>> +pointers to dummy functions and also make sure the control path operations
>> +failed with a return code ``-EBUSY``.
>>
>>   In order to sense the error happening/recovering,
>>   as well as to restore some additional configuration,
>> @@ -653,9 +648,9 @@ three events are available:
>>
>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
>>      Notify the application that an error is detected
>> -   and the recovery is being started.
>> +   and the recovery is about to start.
>>      Upon receiving the event, the application should not invoke
>> -   any control path function until receiving
>> +   any control and data path API until receiving
>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>
>>   .. note::
>> @@ -666,8 +661,9 @@ three events are available:
>>
>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>      Notify the application that the recovery from error is successful,
>> -   the PMD already re-configures the port,
>> -   and the effect is the same as a restart operation.
>> +   the PMD already re-configures the port.
>> +   The application should restore some additional configuration, and then
> What is the additional configuration? Is this specific to each NIC/PMD?
> I thought, this is an auto recovery process and the application does not require to reconfigure anything. If the application has to restore the configuration, how does auto recovery differ from typical recovery process?
> 
>> +   enable data path API invocation.
>>
>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>      Notify the application that the recovery from error failed,
>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>> index 0be1e8ca04..f994653fe9 100644
>> --- a/lib/ethdev/ethdev_driver.c
>> +++ b/lib/ethdev/ethdev_driver.c
>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>   	return rc;
>>   }
>>
>> +void
>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>> +{
>> +	if (dev == NULL)
>> +		return;
>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>> +}
>> +
>>   const struct rte_memzone *
>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>   			 uint16_t queue_id, size_t size, unsigned int align,
>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>> index 2c9d615fb5..0d964d1f67 100644
>> --- a/lib/ethdev/ethdev_driver.h
>> +++ b/lib/ethdev/ethdev_driver.h
>> @@ -1621,6 +1621,16 @@ int
>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>   		 uint16_t queue_id);
>>
>> +/**
>> + * @internal
>> + * Setup eth fast-path API to ethdev values.
>> + *
>> + * @param dev
>> + *  Pointer to struct rte_eth_dev.
>> + */
>> +__rte_internal
>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>> +
>>   /**
>>    * @internal
>>    * Atomically set the link status for the specific device.
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 049641d57c..44ee7229c1 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>   	 */
>>   	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>   	/** Port recovering from a hardware or firmware error.
>> -	 * If PMD supports proactive error recovery,
>> -	 * it should trigger this event to notify application
>> -	 * that it detected an error and the recovery is being started.
>> -	 * Upon receiving the event, the application should not invoke any control path API
>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>> -	 * The PMD will set the data path pointers to dummy functions,
>> -	 * and re-set the data path pointers to non-dummy functions
>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> -	 * It means that the application cannot send or receive any packets
>> -	 * during this period.
>> +	 *
>> +	 * If PMD supports proactive error recovery, it should trigger this
>> +	 * event to notify application that it detected an error and the
>> +	 * recovery is about to start.
>> +	 *
>> +	 * Upon receiving the event, the application should not invoke any
>> +	 * control and data path API until receiving
>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>> +	 * event.
>> +	 *
>> +	 * Once this event is reported, the PMD will set the data path pointers
>> +	 * to dummy functions, and re-set the data path pointers to valid
>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> Why do we need to set the data path pointers to dummy functions if the application is restricted from invoking any control and data path APIs till the recovery process is completed?

You are right, in theory it is not mandatory,
though it helps to flag a problem if the user still tries to call them
while recovery is in progress.
Again, this is the same as what we do in dev_stop().
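The point above is that the dummy pointers mainly help flag a caller that keeps polling when it should not, as after dev_stop(). A placeholder of that kind might look like the sketch below. It is a simplified model, not the ethdev implementation. The type and function names, the `void **` packet array (instead of `struct rte_mbuf **`) and the warn-once policy are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Per-queue placeholder state while the port is not usable. */
struct dummy_queue_model {
	bool warned; /* warn only once so a busy poll loop cannot flood the log */
};

/* Installed instead of the real rx burst while the port is not usable:
 * it receives nothing and reports that someone still polled it. */
static uint16_t dummy_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	struct dummy_queue_model *q = rxq;

	(void)pkts;
	(void)nb_pkts;
	assert(q != NULL);
	if (!q->warned) {
		fprintf(stderr, "rx burst called on a port that is not ready\n");
		q->warned = true;
	}
	return 0;
}
```

Returning 0 keeps a misbehaving poll loop harmless, and the one-time warning makes the misuse visible rather than silent.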

> 
>> +	 *
>>   	 * @note Before the PMD reports the recovery result,
>>   	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>   	 * because a larger error may occur during the recovery.
>>   	 */
>>   	RTE_ETH_EVENT_ERR_RECOVERING,
> I understand this is not a change in this patch. But, just wondering, what is the purpose of this? How is the application supposed to use this?
> 
>>   	/** Port recovers successfully from the error.
>> -	 * The PMD already re-configured the port,
>> -	 * and the effect is the same as a restart operation.
>> +	 *
>> +	 * The PMD already re-configured the port:
>>   	 * a) The following operation will be retained: (alphabetically)
>>   	 *    - DCB configuration
>>   	 *    - FEC configuration
>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>   	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>   	 * c) Any other configuration will not be stored
>>   	 *    and will need to be re-configured.
>> +	 *
>> +	 * The application should restore some additional configuration
>> +	 * (see above case b/c), and then enable data path API invocation.
>>   	 */
>>   	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>   	/** Port recovery failed.
>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>> index 357d1a88c0..c273e0bdae 100644
>> --- a/lib/ethdev/version.map
>> +++ b/lib/ethdev/version.map
>> @@ -320,6 +320,7 @@ INTERNAL {
>>   	rte_eth_devices;
>>   	rte_eth_dma_zone_free;
>>   	rte_eth_dma_zone_reserve;
>> +	rte_eth_fp_ops_setup;
>>   	rte_eth_hairpin_queue_peer_bind;
>>   	rte_eth_hairpin_queue_peer_unbind;
>>   	rte_eth_hairpin_queue_peer_update;
>> --
>> 2.17.1
>
  
Ferruh Yigit March 3, 2023, 4:51 p.m. UTC | #4
On 3/2/2023 12:08 PM, Konstantin Ananyev wrote:
> 
>> In the proactive error handling mode, the PMD will set the data path
>> pointers to dummy functions and then try recovery, in this period the
>> application may still invoking data path API. This will introduce a
>> race-condition with data path which may lead to crash [1].
>>
>> Although the PMD added delay after setting data path pointers to cover
>> the above race-condition, it reduces the probability, but it doesn't
>> solve the problem.
>>
>> To solve the race-condition problem fundamentally, the following
>> requirements are added:
>> 1. The PMD should set the data path pointers to dummy functions after
>>    report RTE_ETH_EVENT_ERR_RECOVERING event.
>> 2. The application should stop data path API invocation when process
>>    the RTE_ETH_EVENT_ERR_RECOVERING event.
>> 3. The PMD should set the data path pointers to valid functions before
>>    report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> 4. The application should enable data path API invocation when process
>>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>

How does this solve the race condition, by pushing the responsibility
for stopping the data path to the application?

What if the application is not interested in recovery modes at all and
has not registered any callback for recovery?

I think the driver should not rely on the application for this unless
the application explicitly tells the driver that it is handling recovery;
right now there is no way for the driver to know this.


>> Also, this patch introduce a driver internal function
>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>
>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>
>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> ---
>>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>  lib/ethdev/ethdev_driver.c              |  8 +++++++
>>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>  lib/ethdev/version.map                  |  1 +
>>  5 files changed, 46 insertions(+), 25 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>> index c145a9066c..e380ff135a 100644
>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>  the PMD automatically recovers from error in PROACTIVE mode,
>>  and only a small amount of work is required for the application.
>>
>> -During error detection and automatic recovery,
>> -the PMD sets the data path pointers to dummy functions
>> -(which will prevent the crash),
>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>> -
>> -Because the PMD recovers automatically,
>> -the application can only sense that the data flow is disconnected for a while
>> -and the control API returns an error in this period.
>> +During error detection and automatic recovery, the PMD sets the data path
>> +pointers to dummy functions and also make sure the control path operations
>> +failed with a return code ``-EBUSY``.
>>
>>  In order to sense the error happening/recovering,
>>  as well as to restore some additional configuration,
>> @@ -653,9 +648,9 @@ three events are available:
>>
>>  ``RTE_ETH_EVENT_ERR_RECOVERING``
>>     Notify the application that an error is detected
>> -   and the recovery is being started.
>> +   and the recovery is about to start.
>>     Upon receiving the event, the application should not invoke
>> -   any control path function until receiving
>> +   any control and data path API until receiving
>>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>
>>  .. note::
>> @@ -666,8 +661,9 @@ three events are available:
>>
>>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>     Notify the application that the recovery from error is successful,
>> -   the PMD already re-configures the port,
>> -   and the effect is the same as a restart operation.
>> +   the PMD already re-configures the port.
>> +   The application should restore some additional configuration, and then
>> +   enable data path API invocation.
>>
>>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>     Notify the application that the recovery from error failed,
>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>> index 0be1e8ca04..f994653fe9 100644
>> --- a/lib/ethdev/ethdev_driver.c
>> +++ b/lib/ethdev/ethdev_driver.c
>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>  	return rc;
>>  }
>>
>> +void
>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>> +{
>> +	if (dev == NULL)
>> +		return;
>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>> +}
>> +
>>  const struct rte_memzone *
>>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>  			 uint16_t queue_id, size_t size, unsigned int align,
>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>> index 2c9d615fb5..0d964d1f67 100644
>> --- a/lib/ethdev/ethdev_driver.h
>> +++ b/lib/ethdev/ethdev_driver.h
>> @@ -1621,6 +1621,16 @@ int
>>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>  		 uint16_t queue_id);
>>
>> +/**
>> + * @internal
>> + * Setup eth fast-path API to ethdev values.
>> + *
>> + * @param dev
>> + *  Pointer to struct rte_eth_dev.
>> + */
>> +__rte_internal
>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>> +
>>  /**
>>   * @internal
>>   * Atomically set the link status for the specific device.
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 049641d57c..44ee7229c1 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>  	 */
>>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>  	/** Port recovering from a hardware or firmware error.
>> -	 * If PMD supports proactive error recovery,
>> -	 * it should trigger this event to notify application
>> -	 * that it detected an error and the recovery is being started.
>> -	 * Upon receiving the event, the application should not invoke any control path API
>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>> -	 * The PMD will set the data path pointers to dummy functions,
>> -	 * and re-set the data path pointers to non-dummy functions
>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> -	 * It means that the application cannot send or receive any packets
>> -	 * during this period.
>> +	 *
>> +	 * If PMD supports proactive error recovery, it should trigger this
>> +	 * event to notify application that it detected an error and the
>> +	 * recovery is about to start.
>> +	 *
>> +	 * Upon receiving the event, the application should not invoke any
>> +	 * control and data path API until receiving
>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>> +	 * event.
>> +	 *
>> +	 * Once this event is reported, the PMD will set the data path pointers
>> +	 * to dummy functions, and re-set the data path pointers to valid
>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>> +	 *
>>  	 * @note Before the PMD reports the recovery result,
>>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>  	 * because a larger error may occur during the recovery.
>>  	 */
>>  	RTE_ETH_EVENT_ERR_RECOVERING,
>>  	/** Port recovers successfully from the error.
>> -	 * The PMD already re-configured the port,
>> -	 * and the effect is the same as a restart operation.
>> +	 *
>> +	 * The PMD already re-configured the port:
>>  	 * a) The following operation will be retained: (alphabetically)
>>  	 *    - DCB configuration
>>  	 *    - FEC configuration
>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>  	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>  	 * c) Any other configuration will not be stored
>>  	 *    and will need to be re-configured.
>> +	 *
>> +	 * The application should restore some additional configuration
>> +	 * (see above case b/c), and then enable data path API invocation.
>>  	 */
>>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>  	/** Port recovery failed.
>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>> index 357d1a88c0..c273e0bdae 100644
>> --- a/lib/ethdev/version.map
>> +++ b/lib/ethdev/version.map
>> @@ -320,6 +320,7 @@ INTERNAL {
>>  	rte_eth_devices;
>>  	rte_eth_dma_zone_free;
>>  	rte_eth_dma_zone_reserve;
>> +	rte_eth_fp_ops_setup;
>>  	rte_eth_hairpin_queue_peer_bind;
>>  	rte_eth_hairpin_queue_peer_unbind;
>>  	rte_eth_hairpin_queue_peer_update;
>> --
>  
> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> 
>> 2.17.1
>
  
Honnappa Nagarahalli March 4, 2023, 5:08 a.m. UTC | #5
> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Thursday, March 2, 2023 6:22 PM
> To: dev@dpdk.org
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
> mode
> 
> 
> 
> >
> >> -----Original Message-----
> >> From: Chengwen Feng <fengchengwen@huawei.com>
> >> Sent: Tuesday, February 28, 2023 9:06 PM
> >> To: thomas@monjalon.net; ferruh.yigit@amd.com;
> >> konstantin.ananyev@huawei.com; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
> >> anakkur.purayil@broadcom.com>; Ajit Khaparde
> >> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> >> Cc: dev@dpdk.org
> >> Subject: [PATCH 1/5] ethdev: fix race-condition of proactive error
> >> handling mode
> >>
> >> In the proactive error handling mode, the PMD will set the data path
> >> pointers to dummy functions and then try recovery, in this period the
> >> application may still invoking data path API. This will introduce a
> >> race-condition with data path which may lead to crash [1].
> >>
> >> Although the PMD added delay after setting data path pointers to
> >> cover the above race-condition, it reduces the probability, but it
> >> doesn't solve the problem.
> >>
> >> To solve the race-condition problem fundamentally, the following
> >> requirements are added:
> >> 1. The PMD should set the data path pointers to dummy functions after
> >>     report RTE_ETH_EVENT_ERR_RECOVERING event.
> > Do you mean to say, PMD should set the data path pointers after calling the
> > call back function?
> > The PMD is running in the context of multiple EAL threads. How do these
> > threads synchronize such that only one thread sets these data pointers?
> 
> As I understand this event callback supposed to be called in the context of EAL
> interrupt thread (whoever is more familiar with original idea, feel free to correct
> me if I missed something).
I could not figure this out. It appears to be called from the data plane thread context.
I also have a thought on an alternative design at the end; I would appreciate it if you could take a look.
 
> How it is going to signal data-path threads that they need to stop/suspend
> calling data-path API - that's I suppose is left to application to decide...
> Same as right now it is application responsibility to stop data-path threads
> before doing dev_stop()/dev/_config()/etc.
Ok, good, this expectation is not new. The application must have a mechanism already.

> 
> 
> >
> >> 2. The application should stop data path API invocation when process
> >>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> > Any thoughts on how an application can do this?
We can ignore this question, as a similar expectation is already set for earlier functionality.

> >
> >> 3. The PMD should set the data path pointers to valid functions before
> >>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >> 4. The application should enable data path API invocation when process
> >>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > Do you mean to say that the application should not call the datapath APIs
> > while the PMD is running the recovery process?
> 
> Yes, I believe that's the intention.
Ok, this is good and makes sense.

> 
> >>
> >> Also, this patch introduce a driver internal function
> >> rte_eth_fp_ops_setup which used as an help function for PMD.
> >>
> >> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >>
> >> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >> ---
> >>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> >>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> >>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> >>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> >>   lib/ethdev/version.map                  |  1 +
> >>   5 files changed, 46 insertions(+), 25 deletions(-)
> >>
> >> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> >> index c145a9066c..e380ff135a 100644
> >> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> >>   the PMD automatically recovers from error in PROACTIVE mode,
> >>   and only a small amount of work is required for the application.
> >>
> >> -During error detection and automatic recovery,
> >> -the PMD sets the data path pointers to dummy functions
> >> -(which will prevent the crash),
> >> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> >> -
> >> -Because the PMD recovers automatically,
> >> -the application can only sense that the data flow is disconnected for a while
> >> -and the control API returns an error in this period.
> >> +During error detection and automatic recovery, the PMD sets the data
> >> +path pointers to dummy functions and also make sure the control path
> >> +operations failed with a return code ``-EBUSY``.
> >>
> >>   In order to sense the error happening/recovering,
> >>   as well as to restore some additional configuration,
> >> @@ -653,9 +648,9 @@ three events are available:
> >>
> >>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> >>      Notify the application that an error is detected
> >> -   and the recovery is being started.
> >> +   and the recovery is about to start.
> >>      Upon receiving the event, the application should not invoke
> >> -   any control path function until receiving
> >> +   any control and data path API until receiving
> >>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> >>
> >>   .. note::
> >> @@ -666,8 +661,9 @@ three events are available:
> >>
> >>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> >>      Notify the application that the recovery from error is successful,
> >> -   the PMD already re-configures the port,
> >> -   and the effect is the same as a restart operation.
> >> +   the PMD already re-configures the port.
> >> +   The application should restore some additional configuration, and then
> > What is the additional configuration? Is this specific to each NIC/PMD?
> > I thought, this is an auto recovery process and the application does not require
> > to reconfigure anything. If the application has to restore the configuration, how
> > does auto recovery differ from typical recovery process?
> >
> >> +   enable data path API invocation.
> >>
> >>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> >>      Notify the application that the recovery from error failed,
> >> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> >> index 0be1e8ca04..f994653fe9 100644
> >> --- a/lib/ethdev/ethdev_driver.c
> >> +++ b/lib/ethdev/ethdev_driver.c
> >> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> >>   	return rc;
> >>   }
> >>
> >> +void
> >> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> >> +{
> >> +	if (dev == NULL)
> >> +		return;
> >> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> >> +}
> >> +
> >>   const struct rte_memzone *
> >>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> >>   			 uint16_t queue_id, size_t size, unsigned int align,
> >> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >> index 2c9d615fb5..0d964d1f67 100644
> >> --- a/lib/ethdev/ethdev_driver.h
> >> +++ b/lib/ethdev/ethdev_driver.h
> >> @@ -1621,6 +1621,16 @@ int
> >>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> >>   		 uint16_t queue_id);
> >>
> >> +/**
> >> + * @internal
> >> + * Setup eth fast-path API to ethdev values.
> >> + *
> >> + * @param dev
> >> + *  Pointer to struct rte_eth_dev.
> >> + */
> >> +__rte_internal
> >> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> >> +
> >>   /**
> >>    * @internal
> >>    * Atomically set the link status for the specific device.
> >> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >> index 049641d57c..44ee7229c1 100644
> >> --- a/lib/ethdev/rte_ethdev.h
> >> +++ b/lib/ethdev/rte_ethdev.h
> >> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> >>   	 */
> >>   	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> >>   	/** Port recovering from a hardware or firmware error.
> >> -	 * If PMD supports proactive error recovery,
> >> -	 * it should trigger this event to notify application
> >> -	 * that it detected an error and the recovery is being started.
> >> -	 * Upon receiving the event, the application should not invoke any control path API
> >> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> >> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> >> -	 * The PMD will set the data path pointers to dummy functions,
> >> -	 * and re-set the data path pointers to non-dummy functions
> >> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >> -	 * It means that the application cannot send or receive any packets
> >> -	 * during this period.
> >> +	 *
> >> +	 * If PMD supports proactive error recovery, it should trigger this
> >> +	 * event to notify application that it detected an error and the
> >> +	 * recovery is about to start.
> >> +	 *
> >> +	 * Upon receiving the event, the application should not invoke any
> >> +	 * control and data path API until receiving
> >> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> >> +	 * event.
> >> +	 *
> >> +	 * Once this event is reported, the PMD will set the data path pointers
> >> +	 * to dummy functions, and re-set the data path pointers to valid
> >> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > Why do we need to set the data path pointers to dummy functions if the
> > application is restricted from invoking any control and data path APIs till the
> > recovery process is completed?
> 
> You are right, in theory it is not mandatory.
> Though it helps to flag a problem if user will still try to call them while recovery is
> in progress.
OK, maybe in debug mode.
I mean, we have already set the expectation that the application should not call them, and the application has implemented a method to ensure that. Why do we need to complicate this?
If the application calls the APIs, it is a programming error.

> Again, same as we doing in dev_stop().
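The dummy-function mechanism discussed here can be illustrated with a simplified model of a per-port fast-path ops table. The model is hypothetical: its names (model_fp_ops, model_fp_ops_setup(), model_dummy_rx() and so on) are loosely inspired by rte_eth_fp_ops and the patch's rte_eth_fp_ops_setup(), but they are not the real ethdev definitions. The application-facing burst call always goes through the table. Resetting an entry to a dummy function therefore makes any later call return 0 instead of touching queues that recovery is tearing down, and a counter records the misuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model; not the real struct rte_eth_fp_ops. */
#define MODEL_MAX_PORTS 4

typedef uint16_t (*model_rx_burst_t)(void *rxq, void **pkts, uint16_t n);

struct model_fp_ops {              /* what the inline burst wrapper reads */
	model_rx_burst_t rx_pkt_burst;
	void *rxq;
};

struct model_dev {                 /* what the driver provides */
	model_rx_burst_t rx_pkt_burst;
	void *rxq;
};

static struct model_fp_ops model_fp_table[MODEL_MAX_PORTS];
static unsigned int model_dummy_calls;

/* Installed while the port is stopped or recovering: touches no queue,
 * records the misuse and reports that nothing was received. */
static uint16_t model_dummy_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq;
	(void)pkts;
	(void)n;
	model_dummy_calls++;
	return 0;
}

static void model_fp_ops_reset(uint16_t port)
{
	assert(port < MODEL_MAX_PORTS);
	model_fp_table[port].rx_pkt_burst = model_dummy_rx;
	model_fp_table[port].rxq = NULL;
}

/* Plays the role of rte_eth_fp_ops_setup(): copy the driver's real
 * burst function and queue back into the table. */
static void model_fp_ops_setup(uint16_t port, const struct model_dev *dev)
{
	assert(port < MODEL_MAX_PORTS && dev != NULL);
	model_fp_table[port].rx_pkt_burst = dev->rx_pkt_burst;
	model_fp_table[port].rxq = dev->rxq;
}

/* The application-facing call always goes through the table.
 * The entry must have been set up or reset before the first call. */
static uint16_t model_rx_burst(uint16_t port, void **pkts, uint16_t n)
{
	assert(port < MODEL_MAX_PORTS);
	return model_fp_table[port].rx_pkt_burst(model_fp_table[port].rxq, pkts, n);
}

/* Stand-in for a driver's real receive function. */
static uint16_t model_driver_rx(void *rxq, void **pkts, uint16_t n)
{
	(void)pkts;
	return rxq != NULL ? n : 0;
}
```

The model does not cover one case: a worker that loaded the real pointer just before the reset keeps running inside the driver's burst function while recovery proceeds. Swapping pointers therefore only catches calls made after the swap. It cannot replace the application-side stop that the patch requires.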

> 
> >
> >> +	 *
> >>   	 * @note Before the PMD reports the recovery result,
> >>   	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> >>   	 * because a larger error may occur during the recovery.
> >>   	 */
> >>   	RTE_ETH_EVENT_ERR_RECOVERING,
> > I understand this is not a change in this patch. But, just wondering, what is the
> > purpose of this? How is the application supposed to use this?
> >
> >>   	/** Port recovers successfully from the error.
> >> -	 * The PMD already re-configured the port,
> >> -	 * and the effect is the same as a restart operation.
> >> +	 *
> >> +	 * The PMD already re-configured the port:
> >>   	 * a) The following operation will be retained: (alphabetically)
> >>   	 *    - DCB configuration
> >>   	 *    - FEC configuration
> >> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> >>   	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> >>   	 * c) Any other configuration will not be stored
> >>   	 *    and will need to be re-configured.
> >> +	 *
> >> +	 * The application should restore some additional configuration
> >> +	 * (see above case b/c), and then enable data path API invocation.
> >>   	 */
> >>   	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> >>   	/** Port recovery failed.
> >> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> >> index 357d1a88c0..c273e0bdae 100644
> >> --- a/lib/ethdev/version.map
> >> +++ b/lib/ethdev/version.map
> >> @@ -320,6 +320,7 @@ INTERNAL {
> >>   	rte_eth_devices;
> >>   	rte_eth_dma_zone_free;
> >>   	rte_eth_dma_zone_reserve;
> >> +	rte_eth_fp_ops_setup;
> >>   	rte_eth_hairpin_queue_peer_bind;
> >>   	rte_eth_hairpin_queue_peer_unbind;
> >>   	rte_eth_hairpin_queue_peer_update;
> >> --
> >> 2.17.1
> >

Is there any reason not to design this in the same way as 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
We could have a similar API, 'rte_eth_dev_recover', to perform the recovery.
  
Konstantin Ananyev March 5, 2023, 2:53 p.m. UTC | #6
03/03/2023 16:51, Ferruh Yigit wrote:
> On 3/2/2023 12:08 PM, Konstantin Ananyev wrote:
>>
>>> In the proactive error handling mode, the PMD will set the data path
>>> pointers to dummy functions and then try recovery, in this period the
>>> application may still invoking data path API. This will introduce a
>>> race-condition with data path which may lead to crash [1].
>>>
>>> Although the PMD added delay after setting data path pointers to cover
>>> the above race-condition, it reduces the probability, but it doesn't
>>> solve the problem.
>>>
>>> To solve the race-condition problem fundamentally, the following
>>> requirements are added:
>>> 1. The PMD should set the data path pointers to dummy functions after
>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 2. The application should stop data path API invocation when process
>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 3. The PMD should set the data path pointers to valid functions before
>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>> 4. The application should enable data path API invocation when process
>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>
> 
> How this is solving the race-condition, by pushing responsibility to
> stop data path to application?

Exactly: it becomes the application's responsibility to make sure the data
path is stopped/suspended before recovery continues.
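Here is a minimal, self-contained C sketch of how an application might stop and resume its data path from the two event handlers. All names are hypothetical and not part of the ethdev API: app_dp_pause(), app_dp_resume(), app_worker_poll_once() and the flags. The sketch uses plain C11 sequentially consistent atomics rather than any DPDK helper. It assumes the event callback runs outside the worker threads, for example in the EAL interrupt thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical application-side state; none of this is part of DPDK. */
#define APP_MAX_WORKERS 8

static atomic_bool app_dp_paused;                     /* true while the PMD recovers */
static atomic_bool app_worker_busy[APP_MAX_WORKERS];  /* worker is inside a burst call */
static unsigned long app_polls[APP_MAX_WORKERS];      /* stands in for rx/tx bursts */

/*
 * One iteration of a worker loop. The worker publishes "busy" before it
 * reads the pause flag; app_dp_pause() stores the pause flag before it
 * reads "busy". With sequentially consistent atomics, either the worker
 * sees the pause, or the pauser sees the worker busy and waits for it.
 */
static bool app_worker_poll_once(unsigned int id)
{
	assert(id < APP_MAX_WORKERS);
	atomic_store(&app_worker_busy[id], true);
	if (atomic_load(&app_dp_paused)) {
		atomic_store(&app_worker_busy[id], false);
		return false; /* leave the port alone while it recovers */
	}
	/* A real worker would call rte_eth_rx_burst()/rte_eth_tx_burst() here. */
	app_polls[id]++;
	atomic_store(&app_worker_busy[id], false);
	return true;
}

/* Intended to run from the RTE_ETH_EVENT_ERR_RECOVERING callback. */
static void app_dp_pause(unsigned int n_workers)
{
	assert(n_workers <= APP_MAX_WORKERS);
	atomic_store(&app_dp_paused, true);
	/* Do not return until no worker is still inside a burst call. */
	for (unsigned int i = 0; i < n_workers; i++)
		while (atomic_load(&app_worker_busy[i]))
			; /* busy-wait; a real application might yield here */
}

/*
 * Intended to run from the RTE_ETH_EVENT_RECOVERY_SUCCESS callback,
 * after restoring any configuration the PMD did not retain.
 */
static void app_dp_resume(void)
{
	atomic_store(&app_dp_paused, false);
}
```

The key property is that the pause handler does not return until every worker has left its current burst call. Setting a flag without waiting would leave the same window the patch is trying to close. If the callback were instead invoked from a worker thread, as suggested earlier in the thread, this wait would spin forever on that worker's own busy flag. For any scheme like this, the calling context of the callback matters.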

> 
> What if application is not interested in recovery modes at all and not
> registered any callback for the recovery?


Are you saying there is no way for the application to disable
automatic recovery in the PMD if it is not interested
(or can't fulfill the prerequisites for it)?
If so, then yes, it is a problem and we need to fix it.
I assumed that such a mechanism to disable unwanted events already exists,
but I can't find anything.
I wonder what the easiest way would be here: can the PMD make a decision
based on the callback return value, or do we need a new API to
enable/disable callbacks, or ...?


> I think driver should not rely on application for this, unless
> application explicitly says (to driver) that it is handling recovery,
> right now there is no way for driver to know this.

I think it is vice versa:
the application should not enable auto-recovery if it can't meet the
prerequisites for it (i.e. provide an appropriate callback).


  
Konstantin Ananyev March 5, 2023, 3:23 p.m. UTC | #7
>>>>
>>>> In the proactive error handling mode, the PMD will set the data path
>>>> pointers to dummy functions and then try recovery, in this period the
>>>> application may still invoking data path API. This will introduce a
>>>> race-condition with data path which may lead to crash [1].
>>>>
>>>> Although the PMD added delay after setting data path pointers to
>>>> cover the above race-condition, it reduces the probability, but it
>>>> doesn't solve the problem.
>>>>
>>>> To solve the race-condition problem fundamentally, the following
>>>> requirements are added:
>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
>>> Do you mean to say, PMD should set the data path pointers after calling the
>>> call back function?
>>> The PMD is running in the context of multiple EAL threads. How do these
>>> threads synchronize such that only one thread sets these data pointers?
>>
>> As I understand it, this event callback is supposed to be called in the context of the EAL
>> interrupt thread (whoever is more familiar with the original idea, feel free to correct
>> me if I missed something).
> I could not figure this out. It looks like it is called from the data plane thread context.
> I also have a thought on an alternate design at the end; I would appreciate it if you could take a look.
>   
>> How it signals the data-path threads that they need to stop/suspend
>> calling the data-path API is, I suppose, left to the application to decide...
>> Same as right now, it is the application's responsibility to stop data-path threads
>> before doing dev_stop()/dev_configure()/etc.
> Ok, good, this expectation is not new. The application must have a mechanism already.
> 
>>
>>
>>>
>>>> 2. The application should stop data path API invocation when process
>>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
>>> Any thoughts on how an application can do this?
> We can ignore this question as there is already similar expectation set for earlier functionalities.
> 
>>>
>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> 4. The application should enable data path API invocation when process
>>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>> Do you mean to say that the application should not call the datapath APIs
>>> while the PMD is running the recovery process?
>>
>> Yes, I believe that's the intention.
> Ok, this is good and makes sense.
> 
>>
>>>>
>>>> Also, this patch introduce a driver internal function
>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>
>>>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>
>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>> ---
>>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>    lib/ethdev/version.map                  |  1 +
>>>>    5 files changed, 46 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>>>> index c145a9066c..e380ff135a 100644
>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>>>  the PMD automatically recovers from error in PROACTIVE mode,
>>>>  and only a small amount of work is required for the application.
>>>>
>>>> -During error detection and automatic recovery,
>>>> -the PMD sets the data path pointers to dummy functions
>>>> -(which will prevent the crash),
>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>>>> -
>>>> -Because the PMD recovers automatically,
>>>> -the application can only sense that the data flow is disconnected for a while
>>>> -and the control API returns an error in this period.
>>>> +During error detection and automatic recovery, the PMD sets the data path
>>>> +pointers to dummy functions and also make sure the control path operations
>>>> +failed with a return code ``-EBUSY``.
>>>>
>>>>  In order to sense the error happening/recovering,
>>>>  as well as to restore some additional configuration,
>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>
>>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>       Notify the application that an error is detected
>>>> -   and the recovery is being started.
>>>> +   and the recovery is about to start.
>>>>       Upon receiving the event, the application should not invoke
>>>> -   any control path function until receiving
>>>> +   any control and data path API until receiving
>>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>
>>>>    .. note::
>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>
>>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>       Notify the application that the recovery from error is successful,
>>>> -   the PMD already re-configures the port,
>>>> -   and the effect is the same as a restart operation.
>>>> +   the PMD already re-configures the port.
>>>> +   The application should restore some additional configuration, and then
>>>> +   enable data path API invocation.
>>> What is the additional configuration? Is this specific to each NIC/PMD?
>>> I thought this is an auto recovery process and the application does not need
>>> to reconfigure anything. If the application has to restore the configuration, how
>>> does auto recovery differ from the typical recovery process?
>>>
>>>>
>>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>       Notify the application that the recovery from error failed,
>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>> index 0be1e8ca04..f994653fe9 100644
>>>> --- a/lib/ethdev/ethdev_driver.c
>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>>>    	return rc;
>>>>    }
>>>>
>>>> +void
>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>> +{
>>>> +	if (dev == NULL)
>>>> +		return;
>>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>> +}
>>>> +
>>>>    const struct rte_memzone *
>>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>>>    			 uint16_t queue_id, size_t size, unsigned int align,
>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>> index 2c9d615fb5..0d964d1f67 100644
>>>> --- a/lib/ethdev/ethdev_driver.h
>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>> @@ -1621,6 +1621,16 @@ int
>>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>>>    		 uint16_t queue_id);
>>>>
>>>> +/**
>>>> + * @internal
>>>> + * Setup eth fast-path API to ethdev values.
>>>> + *
>>>> + * @param dev
>>>> + *  Pointer to struct rte_eth_dev.
>>>> + */
>>>> +__rte_internal
>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>> +
>>>>    /**
>>>>     * @internal
>>>>     * Atomically set the link status for the specific device.
>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>> index 049641d57c..44ee7229c1 100644
>>>> --- a/lib/ethdev/rte_ethdev.h
>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>    	 */
>>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>    	/** Port recovering from a hardware or firmware error.
>>>> -	 * If PMD supports proactive error recovery,
>>>> -	 * it should trigger this event to notify application
>>>> -	 * that it detected an error and the recovery is being started.
>>>> -	 * Upon receiving the event, the application should not invoke any control path API
>>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>> -	 * The PMD will set the data path pointers to dummy functions,
>>>> -	 * and re-set the data path pointers to non-dummy functions
>>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> -	 * It means that the application cannot send or receive any packets
>>>> -	 * during this period.
>>>> +	 *
>>>> +	 * If PMD supports proactive error recovery, it should trigger this
>>>> +	 * event to notify application that it detected an error and the
>>>> +	 * recovery is about to start.
>>>> +	 *
>>>> +	 * Upon receiving the event, the application should not invoke any
>>>> +	 * control and data path API until receiving
>>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>> +	 * event.
>>>> +	 *
>>>> +	 * Once this event is reported, the PMD will set the data path pointers
>>>> +	 * to dummy functions, and re-set the data path pointers to valid
>>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>> Why do we need to set the data path pointers to dummy functions if the
>>> application is restricted from invoking any control and data path APIs till the
>>> recovery process is completed?
>>
>> You are right, in theory it is not mandatory.
>> Though it helps to flag a problem if the user still tries to call them while recovery is
>> in progress.
> Ok, maybe in debug mode.
> I mean, we have already set an expectation that the application should not call them, and the application has implemented a method to ensure that. Why do we need to complicate this?
> If the application calls the APIs, it is a programming error.


My preference would be to keep it this way for both debug and non-debug
mode.
It doesn't cost us anything in terms of performance, but it helps to
catch problems with a misbehaving app.
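A minimal single-port sketch of that trade-off, in plain C11 with hypothetical names (`struct fp_ops`, `dummy_rx_burst`, `real_rx_burst`; this is not the real ethdev layout). The application already calls through the pointer, so parking it on a dummy adds no cost on the normal path, and a stray call during recovery returns 0 and is counted instead of touching half-reset queues. What it does not do is wait for in-flight calls: a worker that loaded the real pointer just before the swap can still be inside `real_rx_burst` afterwards, which is why the swap alone does not close the race.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port fast-path entry; NOT the real struct rte_eth_fp_ops. */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t nb_pkts);

struct fp_ops {
	_Atomic(rx_burst_fn) rx_burst;
	void *rxq;
};

/* Counts calls that reached the dummy while the port was recovering. */
static atomic_uint dummy_hits;

static uint16_t
dummy_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	(void)nb_pkts;
	atomic_fetch_add_explicit(&dummy_hits, 1, memory_order_relaxed);
	return 0; /* no packets: harmless, but the misuse is now visible */
}

/* Stand-in for a real driver burst function: pretends nb_pkts arrived. */
static uint16_t
real_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts;
}

static void
fp_ops_init(struct fp_ops *ops, void *rxq)
{
	assert(ops != NULL);
	atomic_init(&ops->rx_burst, real_rx_burst);
	ops->rxq = rxq;
}

/* What the application's inline burst wrapper does: load, then call. */
static uint16_t
app_rx_burst(struct fp_ops *ops, void **pkts, uint16_t nb_pkts)
{
	rx_burst_fn fn = atomic_load_explicit(&ops->rx_burst, memory_order_acquire);
	return fn(ops->rxq, pkts, nb_pkts);
}

/* Driver side: swap pointers around the recovery window. */
static void
fp_ops_set_dummy(struct fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_burst, dummy_rx_burst, memory_order_release);
}

static void
fp_ops_set_real(struct fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_burst, real_rx_burst, memory_order_release);
}
```

The counter stands in for whatever reporting a real dummy would do (a log line, an error code); the point is only that misuse becomes observable at no fast-path cost.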

> 
>> Again, same as we do in dev_stop().
> 
>>
>>>
>>>> +	 *
>>>>    	 * @note Before the PMD reports the recovery result,
>>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>>>    	 * because a larger error may occur during the recovery.
>>>>    	 */
>>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
>>> I understand this is not a change in this patch. But, just wondering, what is the
>>> purpose of this? How is the application supposed to use this?
>>>
>>>>    	/** Port recovers successfully from the error.
>>>> -	 * The PMD already re-configured the port,
>>>> -	 * and the effect is the same as a restart operation.
>>>> +	 *
>>>> +	 * The PMD already re-configured the port:
>>>>    	 * a) The following operation will be retained: (alphabetically)
>>>>    	 *    - DCB configuration
>>>>    	 *    - FEC configuration
>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>    	 * c) Any other configuration will not be stored
>>>>    	 *    and will need to be re-configured.
>>>> +	 *
>>>> +	 * The application should restore some additional configuration
>>>> +	 * (see above case b/c), and then enable data path API invocation.
>>>>    	 */
>>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>    	/** Port recovery failed.
>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>> index 357d1a88c0..c273e0bdae 100644
>>>> --- a/lib/ethdev/version.map
>>>> +++ b/lib/ethdev/version.map
>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>    	rte_eth_devices;
>>>>    	rte_eth_dma_zone_free;
>>>>    	rte_eth_dma_zone_reserve;
>>>> +	rte_eth_fp_ops_setup;
>>>>    	rte_eth_hairpin_queue_peer_bind;
>>>>    	rte_eth_hairpin_queue_peer_unbind;
>>>>    	rte_eth_hairpin_queue_peer_update;
>>>> --
>>>> 2.17.1
>>>
> 
> Is there any reason not to design this in the same way as 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?

I suppose it is a question for the authors of the original patch...

> We could have a similar API 'rte_eth_dev_recover' to do the recovery functionality.

I suppose such an approach is also possible.
Personally I am fine with both ways: either the existing one or what you
propose, as long as we fix the existing race condition.
What I like about your suggestion is that we probably wouldn't need to
worry about how to let the user enable/disable auto-recovery inside the PMD.
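For comparison, a rough sketch of the application-driven alternative, by analogy with how an application reacts to RTE_ETH_EVENT_INTR_RESET by calling rte_eth_dev_reset(). `dev_recover_stub` is a hypothetical stand-in for the proposed `rte_eth_dev_recover`, which does not exist; the rest is hypothetical bookkeeping used only to check the order of steps. Because the application stops its workers before the driver does anything, there is no window in which the PMD and a burst call can race.

```c
#include <assert.h>
#include <stdint.h>

/* Application-side steps, recorded so their order can be checked. */
enum app_step {
	STEP_STOP_WORKERS,
	STEP_RECOVER,
	STEP_RESTORE_CONFIG,
	STEP_START_WORKERS,
};

struct app_ctx {
	enum app_step trace[8];
	int ntrace;
	int recover_rc;      /* canned result of the stubbed recover call */
	int workers_running; /* 1 while data-path threads may call burst APIs */
};

static void
trace_step(struct app_ctx *ctx, enum app_step s)
{
	assert(ctx->ntrace < 8);
	ctx->trace[ctx->ntrace++] = s;
}

/* Hypothetical stand-in for the proposed rte_eth_dev_recover(); no such
 * DPDK API exists. The driver work is reduced to a canned return code. */
static int
dev_recover_stub(struct app_ctx *ctx, uint16_t port_id)
{
	(void)port_id;
	trace_step(ctx, STEP_RECOVER);
	return ctx->recover_rc;
}

/* Application-driven recovery: the data path is stopped before the driver
 * touches anything, so no burst call can overlap with the recovery. */
static int
app_handle_port_error(struct app_ctx *ctx, uint16_t port_id)
{
	int rc;

	trace_step(ctx, STEP_STOP_WORKERS);
	ctx->workers_running = 0;

	rc = dev_recover_stub(ctx, port_id);
	if (rc != 0)
		return rc; /* port state unknown: keep workers stopped */

	/* Re-apply whatever the PMD does not retain (cases b/c in the
	 * RTE_ETH_EVENT_RECOVERY_SUCCESS description). */
	trace_step(ctx, STEP_RESTORE_CONFIG);

	trace_step(ctx, STEP_START_WORKERS);
	ctx->workers_running = 1;
	return 0;
}
```

On failure the workers are deliberately left stopped, since nothing is known about the port state at that point.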

Konstantin
  
Chengwen Feng March 6, 2023, 1:41 a.m. UTC | #8
On 2023/3/4 0:51, Ferruh Yigit wrote:
> On 3/2/2023 12:08 PM, Konstantin Ananyev wrote:
>>
>>> In the proactive error handling mode, the PMD will set the data path
>>> pointers to dummy functions and then try recovery, in this period the
>>> application may still invoking data path API. This will introduce a
>>> race-condition with data path which may lead to crash [1].
>>>
>>> Although the PMD added delay after setting data path pointers to cover
>>> the above race-condition, it reduces the probability, but it doesn't
>>> solve the problem.
>>>
>>> To solve the race-condition problem fundamentally, the following
>>> requirements are added:
>>> 1. The PMD should set the data path pointers to dummy functions after
>>>    report RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 2. The application should stop data path API invocation when process
>>>    the RTE_ETH_EVENT_ERR_RECOVERING event.
>>> 3. The PMD should set the data path pointers to valid functions before
>>>    report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>> 4. The application should enable data path API invocation when process
>>>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>
> 
> How is this solving the race condition, by pushing the responsibility to
> stop the data path to the application?

Yes, I think it's more practical to collaborate with the application.

The application controls API invocation (both control and data path);
from the DPDK SDK's perspective, it is the one with the global view.
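A minimal sketch of what the application side of requirements 2 and 4 could look like, in plain C11 so it builds without DPDK. `enum port_event` mirrors RTE_ETH_EVENT_ERR_RECOVERING, RTE_ETH_EVENT_RECOVERY_SUCCESS and RTE_ETH_EVENT_RECOVERY_FAILED, while `struct dp_gate`, `dp_enter`, `dp_exit` and `dp_on_port_event` are hypothetical helpers. It assumes the PMD invokes the callback synchronously and swaps the data path pointers only after the callback returns (requirement 1), and that the callback does not run on one of the worker threads it waits for.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 8

/* Stand-ins for RTE_ETH_EVENT_ERR_RECOVERING, _RECOVERY_SUCCESS and
 * _RECOVERY_FAILED, so the sketch builds without DPDK headers. */
enum port_event { EV_ERR_RECOVERING, EV_RECOVERY_SUCCESS, EV_RECOVERY_FAILED };

struct dp_gate {
	atomic_bool paused;                /* set by the control thread */
	atomic_bool in_burst[MAX_WORKERS]; /* set while worker i may be in a burst call */
	int nb_workers;
};

static void
dp_gate_init(struct dp_gate *g, int nb_workers)
{
	assert(nb_workers > 0 && nb_workers <= MAX_WORKERS);
	atomic_init(&g->paused, false);
	for (int i = 0; i < MAX_WORKERS; i++)
		atomic_init(&g->in_burst[i], false);
	g->nb_workers = nb_workers;
}

/* Worker side. Publish intent first, then check the flag; both sides use
 * seq_cst operations, so either the worker sees paused == true or the
 * control thread sees in_burst[wid] == true and waits. */
static bool
dp_enter(struct dp_gate *g, int wid)
{
	atomic_store(&g->in_burst[wid], true);
	if (atomic_load(&g->paused)) {
		atomic_store(&g->in_burst[wid], false);
		return false; /* skip the burst call this iteration */
	}
	return true;
}

static void
dp_exit(struct dp_gate *g, int wid)
{
	atomic_store(&g->in_burst[wid], false);
}

/* Body of the application's event callback. */
static void
dp_on_port_event(struct dp_gate *g, enum port_event ev)
{
	switch (ev) {
	case EV_ERR_RECOVERING:
		/* Requirement 2: block new burst calls, then wait for in-flight
		 * ones, so the PMD can swap pointers once this returns. */
		atomic_store(&g->paused, true);
		for (int i = 0; i < g->nb_workers; i++)
			while (atomic_load(&g->in_burst[i]))
				; /* busy-wait; a DPDK app would call rte_pause() here */
		break;
	case EV_RECOVERY_SUCCESS:
		/* Requirement 4: restore non-retained configuration first, then: */
		atomic_store(&g->paused, false);
		break;
	case EV_RECOVERY_FAILED:
		/* Port is unusable: leave the data path paused. */
		break;
	}
}
```

Each worker would wrap its rte_eth_rx_burst()/rte_eth_tx_burst() calls in `dp_enter()`/`dp_exit()` and skip the iteration when `dp_enter()` returns false. DPDK's RCU QSBR library (rte_rcu_qsbr) is a more scalable way to wait for workers to pass a quiescent state; the flag pair here only keeps the sketch self-contained. A repeated ERR_RECOVERING event (see the @note in the patch) is harmless, since pausing is idempotent.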

> 
> What if the application is not interested in recovery modes at all and has not
> registered any callback for the recovery?

There may be a race condition that could lead to a crash. However, because DPDK worker
threads run busy loops on isolated cores, and PMDs also add a delay, the actual
probability of occurrence is very low: for the HNS3 PMD at least, it has not been
hit in at least four years.

> 
> I think the driver should not rely on the application for this, unless the
> application explicitly tells the driver that it is handling recovery,

If the application registers the event callback, the PMD could deduce that the application is aware of this.
If the application does not register it, then the PMD will recover by itself, and the race condition may occur.
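To make the ordering in requirements 1 and 3 concrete, and to show where Ferruh's concern applies, a hypothetical model of a PMD's proactive recovery path (none of these names exist in DPDK; a non-NULL `app_cb` stands in for "the application registered a handler", and the SUCCESS/FAILED callbacks are not modeled). The returned `quiesced` flag is only as good as that deduction: a registered handler that does not actually stop the data path leaves the race in place, and with no handler the PMD recovers anyway with nothing stopping the workers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PMD's proactive recovery path; none of these
 * names exist in DPDK. Actions are logged so the order can be checked. */
enum pmd_action {
	ACT_NOTIFY_RECOVERING, /* report RTE_ETH_EVENT_ERR_RECOVERING */
	ACT_SET_DUMMY_OPS,     /* point data path at dummy functions */
	ACT_RESET_HW,          /* device-specific recovery */
	ACT_SET_REAL_OPS,      /* restore valid data path functions */
	ACT_NOTIFY_SUCCESS,    /* report RTE_ETH_EVENT_RECOVERY_SUCCESS */
	ACT_NOTIFY_FAILED,     /* report RTE_ETH_EVENT_RECOVERY_FAILED */
};

struct pmd_model {
	void (*app_cb)(void);  /* NULL: the application registered no handler */
	int hw_reset_rc;       /* canned result of the device-specific reset */
	enum pmd_action log[8];
	int nlog;
};

/* Example handler: a real one would pause the data path before returning. */
static int app_cb_calls;
static void
counting_cb(void)
{
	app_cb_calls++;
}

static void
log_action(struct pmd_model *p, enum pmd_action a)
{
	assert(p->nlog < 8);
	p->log[p->nlog++] = a;
}

/* Runs one recovery attempt; *rc gets 0 or the reset error. Returns
 * whether the data path can be assumed quiescent during the swap. */
static bool
pmd_proactive_recover(struct pmd_model *p, int *rc)
{
	bool quiesced = (p->app_cb != NULL);

	/* Requirement 1: report first; the callback runs synchronously,
	 * and only after it returns are the pointers swapped. */
	log_action(p, ACT_NOTIFY_RECOVERING);
	if (p->app_cb != NULL)
		p->app_cb();
	log_action(p, ACT_SET_DUMMY_OPS);

	log_action(p, ACT_RESET_HW);
	if (p->hw_reset_rc != 0) {
		log_action(p, ACT_NOTIFY_FAILED); /* pointers stay on dummies */
		*rc = p->hw_reset_rc;
		return quiesced;
	}

	/* Requirement 3: valid functions are back before success is reported. */
	log_action(p, ACT_SET_REAL_OPS);
	log_action(p, ACT_NOTIFY_SUCCESS);
	*rc = 0;
	return quiesced;
}
```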

> right now there is no way for the driver to know this.
> 
> 
>>  
>> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>
>>> 2.17.1
>>
> 
> .
>
  
Ferruh Yigit March 6, 2023, 8:55 a.m. UTC | #9
On 3/5/2023 2:53 PM, Konstantin Ananyev wrote:
> 03/03/2023 16:51, Ferruh Yigit wrote:
>> On 3/2/2023 12:08 PM, Konstantin Ananyev wrote:
>>>
>>>> In the proactive error handling mode, the PMD will set the data path
>>>> pointers to dummy functions and then try recovery, in this period the
>>>> application may still invoking data path API. This will introduce a
>>>> race-condition with data path which may lead to crash [1].
>>>>
>>>> Although the PMD added delay after setting data path pointers to cover
>>>> the above race-condition, it reduces the probability, but it doesn't
>>>> solve the problem.
>>>>
>>>> To solve the race-condition problem fundamentally, the following
>>>> requirements are added:
>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>> 2. The application should stop data path API invocation when process
>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> 4. The application should enable data path API invocation when process
>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>
>>
>> How is this solving the race condition, by pushing the responsibility to
>> stop the data path to the application?
> 
> Exactly, it becomes the application's responsibility to make sure the data path is
> stopped/suspended before recovery continues.
> 

From documentation of the feature:

``
Because the PMD recovers automatically,
the application can only sense that the data flow is disconnected for a
while and the control API returns an error in this period.

In order to sense the error happening/recovering, as well as to restore
some additional configuration, three events are available:
``

It looks like the initial design uses the events mainly to inform the application
about what happened, and mainly for re-configuration.

Although I don't disagree with involving the application, I am not sure
that is part of the current design.

>>
>> What if the application is not interested in recovery modes at all and has not
>> registered any callback for the recovery?
> 
> 
> Are you saying there is no way for the application to disable
> automatic recovery in the PMD if it is not interested
> (or can't fulfill the prerequisites for it)?
> If so, then yes, it is a problem and we need to fix it.
> I assumed that such a mechanism to disable unwanted events already exists,
> but I can't find anything.
> I wonder what the easiest way would be here: can the PMD make a decision
> based on the callback return value, or do we need a new API to
> enable/disable callbacks, or ...?
> 
> 

As far as I can see, automatic recovery is not configurable by the app.

But that is not all: the PMD sends events to the application, but it can't know
whether the application is handling them or not, so with the current design the
PMD can't rely on the app.


>> I think the driver should not rely on the application for this, unless the
>> application explicitly tells the driver that it is handling recovery;
>> right now there is no way for the driver to know this.
> 
> I think it is vice versa:
> the application should not enable auto-recovery if it can't meet the
> prerequisites for it (provide an appropriate callback).
> 

I agree with the above; we are saying a similar thing from different perspectives.

> 
>>
>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>
>>>> 2.17.1
>>>
>>
>
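The application-side contract spelled out in the updated RTE_ETH_EVENT_ERR_RECOVERING / RTE_ETH_EVENT_RECOVERY_SUCCESS comments can be sketched as a small per-port state machine. This is a minimal, self-contained model, not DPDK code: the `model_event` values and the `port_state` array are hypothetical stand-ins so it compiles without DPDK headers. In a real application the callback would have the rte_eth_dev_cb_fn shape (taking `enum rte_eth_event_type`) and be registered with rte_eth_dev_callback_register() for each of the three events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local stand-ins for the three rte_eth_event_type values. */
enum model_event {
	MODEL_EVENT_ERR_RECOVERING,   /* RTE_ETH_EVENT_ERR_RECOVERING */
	MODEL_EVENT_RECOVERY_SUCCESS, /* RTE_ETH_EVENT_RECOVERY_SUCCESS */
	MODEL_EVENT_RECOVERY_FAILED,  /* RTE_ETH_EVENT_RECOVERY_FAILED */
};

enum port_state { PORT_RUNNING = 0, PORT_RECOVERING, PORT_FAILED };

#define MODEL_MAX_PORTS 4

/* Per-port state read by worker lcores before each Rx/Tx burst.
 * Static storage, so every port starts as PORT_RUNNING (0). */
static _Atomic int port_state[MODEL_MAX_PORTS];

/* Same argument shape as rte_eth_dev_cb_fn: (port_id, event, cb_arg, ret_param). */
static int
app_recovery_event_cb(uint16_t port_id, enum model_event event,
		      void *cb_arg, void *ret_param)
{
	(void)cb_arg;
	(void)ret_param;
	assert(port_id < MODEL_MAX_PORTS);

	switch (event) {
	case MODEL_EVENT_ERR_RECOVERING:
		/* Stop control and data path calls on this port. The PMD may
		 * report this event again before the result, so it is idempotent. */
		atomic_store(&port_state[port_id], PORT_RECOVERING);
		break;
	case MODEL_EVENT_RECOVERY_SUCCESS:
		/* Restore the configuration the PMD does not keep (cases b/c),
		 * then re-enable data path calls. */
		atomic_store(&port_state[port_id], PORT_RUNNING);
		break;
	case MODEL_EVENT_RECOVERY_FAILED:
		atomic_store(&port_state[port_id], PORT_FAILED);
		break;
	}
	return 0;
}

/* Worker-side check before calling the burst functions on a port. */
static bool
app_data_path_allowed(uint16_t port_id)
{
	return atomic_load(&port_state[port_id]) == PORT_RUNNING;
}
```

Flipping the state only prevents new bursts; it does not by itself prove that a worker already inside a burst call has returned. Since the PMD presumably proceeds once the callbacks return, a complete application would still need to wait for its workers inside the ERR_RECOVERING handler.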
  
Ferruh Yigit March 6, 2023, 8:57 a.m. UTC | #10
On 3/6/2023 1:41 AM, fengchengwen wrote:
> On 2023/3/4 0:51, Ferruh Yigit wrote:
>> On 3/2/2023 12:08 PM, Konstantin Ananyev wrote:
>>>
>>>> In the proactive error handling mode, the PMD will set the data path
>>>> pointers to dummy functions and then try recovery; during this period the
>>>> application may still be invoking the data path API. This introduces a
>>>> race condition with the data path which may lead to a crash [1].
>>>>
>>>> Although the PMD added a delay after setting the data path pointers to
>>>> cover the above race condition, this only reduces the probability; it
>>>> doesn't solve the problem.
>>>>
>>>> To solve the race condition fundamentally, the following
>>>> requirements are added:
>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>    reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>> 2. The application should stop data path API invocation when processing
>>>>    the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>    reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> 4. The application should enable data path API invocation when processing
>>>>    the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>
>>
>> How does this solve the race condition? By pushing the responsibility
>> to stop the data path to the application?
> 
> Yes, I think it's more practical to collaborate with the application.
> 
> The application controls API invocation (including control and data path);
> from the DPDK SDK perspective, it has a global view.
> 
>>
>> What if the application is not interested in recovery modes at all and has
>> not registered any callback for the recovery?
> 
> There is possibly a race condition which may lead to a crash, but because DPDK
> worker threads run busy loops on isolated cores, and PMDs also add a delay,
> the actual probability of occurrence is very low; at least for the HNS3 PMD,
> it has not been hit in at least four years.
> 
>>
>> I think the driver should not rely on the application for this, unless
>> the application explicitly tells the driver that it is handling recovery;
> 
> If the application registers the event callback, the PMD can deduce that the application is aware of this.
> If the application does not register it, the PMD will recover by itself and a race condition may occur.
> 

If application support is required (which makes sense, as you mentioned),
I think the application should explicitly enable this feature.
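One way to read "the application should explicitly enable this feature" is a per-port opt-in that the PMD consults before recovering proactively. The sketch below is purely hypothetical: no such API exists in ethdev or in this patch, and `model_port_enable_auto_recovery()`, `model_pmd_on_error()` and the `REPORT_ERROR_ONLY` fallback are illustrative names. What a PMD should actually do when the application has not opted in is left open in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_PORTS 4

/* Hypothetical per-port opt-in flag; defaults to "not enabled". */
static bool auto_recovery_enabled[MODEL_MAX_PORTS];

enum recovery_action { RECOVER_PROACTIVELY, REPORT_ERROR_ONLY };

/* Application side: opt in only after registering handlers for the
 * recovery events and making sure its workers can pause on request. */
static int
model_port_enable_auto_recovery(uint16_t port_id, bool on)
{
	if (port_id >= MODEL_MAX_PORTS)
		return -EINVAL;
	auto_recovery_enabled[port_id] = on;
	return 0;
}

/* PMD side: decide what to do when a hardware/firmware error is detected. */
static enum recovery_action
model_pmd_on_error(uint16_t port_id)
{
	assert(port_id < MODEL_MAX_PORTS);
	if (auto_recovery_enabled[port_id])
		return RECOVER_PROACTIVELY;
	/* The app never opted in, so it may still be polling the queues;
	 * swapping the fast-path pointers now would reintroduce the race. */
	return REPORT_ERROR_ONLY;
}
```

The design choice here is that the default is the conservative path: an application that was never written for proactive recovery (or was ported from another NIC) does not get its queues torn down underneath it.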

>> right now there is no way for the driver to know this.
>>
>>
>>>> Also, this patch introduces a driver-internal function,
>>>> rte_eth_fp_ops_setup, to be used as a helper function by PMDs.
>>>>
>>>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>
>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>> ---
>>>>  doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>  lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>  lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>  lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>  lib/ethdev/version.map                  |  1 +
>>>>  5 files changed, 46 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>>>> index c145a9066c..e380ff135a 100644
>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>>>  the PMD automatically recovers from error in PROACTIVE mode,
>>>>  and only a small amount of work is required for the application.
>>>>
>>>> -During error detection and automatic recovery,
>>>> -the PMD sets the data path pointers to dummy functions
>>>> -(which will prevent the crash),
>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>>>> -
>>>> -Because the PMD recovers automatically,
>>>> -the application can only sense that the data flow is disconnected for a while
>>>> -and the control API returns an error in this period.
>>>> +During error detection and automatic recovery, the PMD sets the data path
>>>> +pointers to dummy functions and also makes sure the control path operations
>>>> +fail with a return code ``-EBUSY``.
>>>>
>>>>  In order to sense the error happening/recovering,
>>>>  as well as to restore some additional configuration,
>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>
>>>>  ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>     Notify the application that an error is detected
>>>> -   and the recovery is being started.
>>>> +   and the recovery is about to start.
>>>>     Upon receiving the event, the application should not invoke
>>>> -   any control path function until receiving
>>>> +   any control and data path API until receiving
>>>>     ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>
>>>>  .. note::
>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>
>>>>  ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>     Notify the application that the recovery from error is successful,
>>>> -   the PMD already re-configures the port,
>>>> -   and the effect is the same as a restart operation.
>>>> +   the PMD already re-configures the port.
>>>> +   The application should restore some additional configuration, and then
>>>> +   enable data path API invocation.
>>>>
>>>>  ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>     Notify the application that the recovery from error failed,
>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>> index 0be1e8ca04..f994653fe9 100644
>>>> --- a/lib/ethdev/ethdev_driver.c
>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>>>  	return rc;
>>>>  }
>>>>
>>>> +void
>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>> +{
>>>> +	if (dev == NULL)
>>>> +		return;
>>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>> +}
>>>> +
>>>>  const struct rte_memzone *
>>>>  rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>>>  			 uint16_t queue_id, size_t size, unsigned int align,
>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>> index 2c9d615fb5..0d964d1f67 100644
>>>> --- a/lib/ethdev/ethdev_driver.h
>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>> @@ -1621,6 +1621,16 @@ int
>>>>  rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>>>  		 uint16_t queue_id);
>>>>
>>>> +/**
>>>> + * @internal
>>>> + * Setup eth fast-path API to ethdev values.
>>>> + *
>>>> + * @param dev
>>>> + *  Pointer to struct rte_eth_dev.
>>>> + */
>>>> +__rte_internal
>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>> +
>>>>  /**
>>>>   * @internal
>>>>   * Atomically set the link status for the specific device.
>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>> index 049641d57c..44ee7229c1 100644
>>>> --- a/lib/ethdev/rte_ethdev.h
>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>  	 */
>>>>  	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>  	/** Port recovering from a hardware or firmware error.
>>>> -	 * If PMD supports proactive error recovery,
>>>> -	 * it should trigger this event to notify application
>>>> -	 * that it detected an error and the recovery is being started.
>>>> -	 * Upon receiving the event, the application should not invoke any control path API
>>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>> -	 * The PMD will set the data path pointers to dummy functions,
>>>> -	 * and re-set the data path pointers to non-dummy functions
>>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> -	 * It means that the application cannot send or receive any packets
>>>> -	 * during this period.
>>>> +	 *
>>>> +	 * If PMD supports proactive error recovery, it should trigger this
>>>> +	 * event to notify application that it detected an error and the
>>>> +	 * recovery is about to start.
>>>> +	 *
>>>> +	 * Upon receiving the event, the application should not invoke any
>>>> +	 * control and data path API until receiving
>>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>> +	 * event.
>>>> +	 *
>>>> +	 * Once this event is reported, the PMD will set the data path pointers
>>>> +	 * to dummy functions, and re-set the data path pointers to valid
>>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>> +	 *
>>>>  	 * @note Before the PMD reports the recovery result,
>>>>  	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>>>  	 * because a larger error may occur during the recovery.
>>>>  	 */
>>>>  	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>  	/** Port recovers successfully from the error.
>>>> -	 * The PMD already re-configured the port,
>>>> -	 * and the effect is the same as a restart operation.
>>>> +	 *
>>>> +	 * The PMD already re-configured the port:
>>>>  	 * a) The following operation will be retained: (alphabetically)
>>>>  	 *    - DCB configuration
>>>>  	 *    - FEC configuration
>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>  	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>  	 * c) Any other configuration will not be stored
>>>>  	 *    and will need to be re-configured.
>>>> +	 *
>>>> +	 * The application should restore some additional configuration
>>>> +	 * (see above case b/c), and then enable data path API invocation.
>>>>  	 */
>>>>  	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>  	/** Port recovery failed.
>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>> index 357d1a88c0..c273e0bdae 100644
>>>> --- a/lib/ethdev/version.map
>>>> +++ b/lib/ethdev/version.map
>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>  	rte_eth_devices;
>>>>  	rte_eth_dma_zone_free;
>>>>  	rte_eth_dma_zone_reserve;
>>>> +	rte_eth_fp_ops_setup;
>>>>  	rte_eth_hairpin_queue_peer_bind;
>>>>  	rte_eth_hairpin_queue_peer_unbind;
>>>>  	rte_eth_hairpin_queue_peer_update;
>>>> --
>>>  
>>> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>
>>>> 2.17.1
>>>
>>
>> .
>>
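The rte_eth_fp_ops_setup() helper added by the patch re-installs a port's fast-path pointers from its rte_eth_dev. The ordering that requirements 1 and 3 impose on a PMD can be modelled roughly as below. Everything here is a reduced, hypothetical stand-in: `model_fp_ops` plays the role of struct rte_eth_fp_ops, `model_dev` of struct rte_eth_dev, and only the Rx burst pointer is modelled. The real table is the rte_eth_fp_ops[] array indexed by port ID, as in the helper quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t (*model_rx_burst_t)(void *rxq, void **pkts, uint16_t n);

/* Hypothetical reduced stand-ins for struct rte_eth_fp_ops / rte_eth_dev. */
struct model_fp_ops { model_rx_burst_t rx_pkt_burst; };
struct model_dev { uint16_t port_id; model_rx_burst_t rx_pkt_burst; };

#define MODEL_MAX_PORTS 4
static struct model_fp_ops model_fp_ops[MODEL_MAX_PORTS];

/* Dummy burst: receives nothing, so a stray call cannot touch queues
 * that are being torn down. */
static uint16_t
dummy_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts; (void)n;
	return 0;
}

/* Stand-in for the PMD's real burst function. */
static uint16_t
real_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts;
	return n; /* pretend every slot was filled */
}

/* Same idea as rte_eth_fp_ops_setup(): copy the device's burst
 * function into the per-port fast-path table. */
static void
model_fp_ops_setup(const struct model_dev *dev)
{
	if (dev == NULL)
		return;
	model_fp_ops[dev->port_id].rx_pkt_burst = dev->rx_pkt_burst;
}

static void
model_fp_ops_reset(uint16_t port_id)
{
	model_fp_ops[port_id].rx_pkt_burst = dummy_rx_burst;
}

/* Pointer observed at each event report: [0] ERR_RECOVERING, [1] SUCCESS. */
static model_rx_burst_t ptr_at_report[2];

static void
model_pmd_recover(const struct model_dev *dev)
{
	uint16_t p = dev->port_id;

	/* Requirement 1: report ERR_RECOVERING first... */
	ptr_at_report[0] = model_fp_ops[p].rx_pkt_burst;
	/* ...and only then switch to the dummy functions. */
	model_fp_ops_reset(p);

	/* Hardware reset and queue re-creation would go here; any stray
	 * burst call now hits the dummy and returns 0. */
	assert(model_fp_ops[p].rx_pkt_burst(NULL, NULL, 32) == 0);

	/* Requirement 3: restore valid functions before reporting SUCCESS. */
	model_fp_ops_setup(dev);
	ptr_at_report[1] = model_fp_ops[p].rx_pkt_burst;
}
```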
  
Ferruh Yigit March 6, 2023, 9:10 a.m. UTC | #11
On 3/6/2023 1:41 AM, fengchengwen wrote:
>> What if the application is not interested in recovery modes at all and has
>> not registered any callback for the recovery?
> 
> There is possibly a race condition which may lead to a crash, but because DPDK
> worker threads run busy loops on isolated cores, and PMDs also add a delay,
> the actual probability of occurrence is very low; at least for the HNS3 PMD,
> it has not been hit in at least four years.
> 

I understand the problem and why the application needs to be involved, but
the question is what will happen if the application is not aware of this and
does not handle this event, or was ported from a different NIC, etc.
Do you want to make handling this event mandatory for every DPDK application?


Btw, what about my suggestion [1] to use a different version of the burst ops
update function in PMDs to prevent the crash?

[1]
https://inbox.dpdk.org/dev/20230220060839.1267349-1-ashok.k.kaladi@intel.com/T/#m876b5c5312391557c952198561e6823473bce151
  
Konstantin Ananyev March 6, 2023, 10:22 a.m. UTC | #12
> >>>> In the proactive error handling mode, the PMD will set the data path
> >>>> pointers to dummy functions and then try recovery; during this period the
> >>>> application may still be invoking the data path API. This introduces a
> >>>> race condition with the data path which may lead to a crash [1].
> >>>>
> >>>> Although the PMD added a delay after setting the data path pointers to
> >>>> cover the above race condition, this only reduces the probability; it
> >>>> doesn't solve the problem.
> >>>>
> >>>> To solve the race condition fundamentally, the following
> >>>> requirements are added:
> >>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>> 2. The application should stop data path API invocation when processing
> >>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>> 4. The application should enable data path API invocation when processing
> >>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>
> >>
> >> How does this solve the race condition? By pushing the responsibility
> >> to stop the data path to the application?
> >
> > Exactly: it becomes the application's responsibility to make sure the data path is
> > stopped/suspended before recovery continues.
> >
> 
> From documentation of the feature:
> 
> ``
> Because the PMD recovers automatically,
> the application can only sense that the data flow is disconnected for a
> while and the control API returns an error in this period.
> 
> In order to sense the error happening/recovering, as well as to restore
> some additional configuration, three events are available:
> ``
> 
> It looks like the initial design uses events mainly to inform the application
> about what happened and to support re-configuration.
> 
> Although I don't disagree with involving the application, I am not sure
> that is part of the current design.

I thought we all agreed that the initial design contains some fallacies that
need to be fixed, no?
The statement that, with the current rte_ethdev design, error recovery can be done
without interaction with the app (to stop/suspend the data/control path)
is the main one, I think.
It needs some interaction with the app layer, one way or another.
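The interaction needed here is more than a flag: the ERR_RECOVERING handler has to know that no worker is still inside a burst call before the PMD swaps the pointers. A minimal sketch of such a handshake, assuming one worker lcore polling one port and a handler that runs on a different thread (if the event were delivered on the worker's own lcore, the wait loop would deadlock). The flag names and functions are hypothetical; only the C11 <stdatomic.h> calls are standard. DPDK's RCU library (rte_rcu_qsbr) is a more general tool for the same kind of quiescence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flags for one worker lcore polling one port. */
static atomic_bool dp_enabled = true;    /* written by the event handler */
static atomic_bool worker_in_dp = false; /* written by the worker */

/* Worker side: announce entry, re-check permission, then poll.
 * Returns true if a burst would have been issued. */
static bool
worker_poll_once(void)
{
	bool polled = false;

	atomic_store(&worker_in_dp, true);
	if (atomic_load(&dp_enabled)) {
		/* rte_eth_rx_burst()/rte_eth_tx_burst() would be called here */
		polled = true;
	}
	atomic_store(&worker_in_dp, false);
	return polled;
}

/* Handler side for ERR_RECOVERING: revoke permission, then wait until
 * the worker is outside the data path. With sequentially consistent
 * atomics (the default for atomic_load/atomic_store), either the worker
 * sees dp_enabled == false, or this loop sees worker_in_dp == true and
 * waits for it to clear. */
static void
handler_pause_data_path(void)
{
	atomic_store(&dp_enabled, false);
	while (atomic_load(&worker_in_dp))
		; /* a real DPDK application would call rte_pause() here */
}

/* Handler side for RECOVERY_SUCCESS, after restoring configuration. */
static void
handler_resume_data_path(void)
{
	atomic_store(&dp_enabled, true);
}
```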

> >>
> >> What if application is not interested in recovery modes at all and not
> >> registered any callback for the recovery?
> >
> >
> > Are you saying there is no way for the application to disable
> > automatic recovery in the PMD if it is not interested
> > (or can't fulfill the prerequisites for it)?
> > If so, then yes, it is a problem and we need to fix it.
> > I assumed that such a mechanism to disable unwanted events already exists,
> > but I can't find anything.
> > I wonder what would be the easiest way here: can the PMD make a decision
> > based on the callback return value, or do we need a new API to
> > enable/disable callbacks, or ...?
> >
> >
> 
> As far as I can see, automatic recovery is not configurable by the app.
> 
> But that is not all: the PMD sends events to the application, but the PMD
> can't know whether the application is handling them or not, so with the
> current design the PMD can't rely on the app.

Well, the PMD invokes the user-provided callback.
One way to fix that problem: if there is no callback provided,
or the callback returns an error code, the PMD can assume that recovery
should not be done.
That is probably not the best design choice, but at least it will allow us
to fix the problem without too many changes and without introducing a new API.
That could be sort of a 'quick fix'.
In the meantime we can think about a new/better approach for that.
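The 'quick fix' idea (no callback, or a callback returning an error, means "do not recover") could look roughly like the model below. It is a hypothetical sketch, not existing PMD behaviour: `model_port`, `model_pmd_should_recover()` and the two example callbacks are illustrative, and the event argument is reduced to an int so the code builds without DPDK headers. In a real PMD the callbacks are reached through rte_eth_dev_callback_process(); whether its return value can carry the application's verdict reliably (e.g. with several callbacks registered) is something that would need checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same argument shape as rte_eth_dev_cb_fn, with the event reduced to int. */
typedef int (*model_event_cb)(uint16_t port_id, int event,
			      void *cb_arg, void *ret_param);

#define MODEL_EVENT_ERR_RECOVERING 1 /* stand-in for RTE_ETH_EVENT_ERR_RECOVERING */

struct model_port {
	uint16_t port_id;
	model_event_cb recovering_cb; /* NULL if the app registered nothing */
	void *cb_arg;
};

/* Hypothetical PMD policy: recover only if the application registered a
 * callback for ERR_RECOVERING and that callback returned 0. */
static int
model_pmd_should_recover(const struct model_port *port)
{
	assert(port != NULL);
	if (port->recovering_cb == NULL)
		return 0;
	return port->recovering_cb(port->port_id, MODEL_EVENT_ERR_RECOVERING,
				   port->cb_arg, NULL) == 0;
}

/* Example app callback that pauses its workers and accepts recovery. */
static int
app_cb_accept(uint16_t port_id, int event, void *cb_arg, void *ret_param)
{
	int *paused = cb_arg;

	(void)port_id; (void)event; (void)ret_param;
	*paused = 1; /* app has stopped its data path for this port */
	return 0;
}

/* Example app callback that cannot pause its data path. */
static int
app_cb_refuse(uint16_t port_id, int event, void *cb_arg, void *ret_param)
{
	(void)port_id; (void)event; (void)cb_arg; (void)ret_param;
	return -1;
}
```

The appeal, as noted above, is that it reuses the existing callback mechanism; the drawback is that "no callback" silently changes the PMD's behaviour, which is why an explicit enable/disable API was also discussed.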

> 
> >> I think the driver should not rely on the application for this, unless
> >> the application explicitly tells the driver that it is handling recovery;
> >> right now there is no way for the driver to know this.
> >
> > I think it is vice versa:
> > the application should not enable auto-recovery if it can't meet the
> > prerequisites for it (provide an appropriate callback).
> >
> 
> I agree with the above; we are saying a similar thing from different perspectives.

Ok, it's good that we are on the same page.
 

> 
> >
> >>
> >>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>
> >>>> 2.17.1
> >>>
> >>
> >
  
Ferruh Yigit March 6, 2023, 11 a.m. UTC | #13
On 3/6/2023 10:22 AM, Konstantin Ananyev wrote:
> 
> 
>>>>>> In the proactive error handling mode, the PMD will set the data path
>>>>>> pointers to dummy functions and then try recovery; during this period the
>>>>>> application may still be invoking the data path API. This introduces a
>>>>>> race condition with the data path which may lead to a crash [1].
>>>>>>
>>>>>> Although the PMD added a delay after setting the data path pointers to
>>>>>> cover the above race condition, this only reduces the probability; it
>>>>>> doesn't solve the problem.
>>>>>>
>>>>>> To solve the race condition fundamentally, the following
>>>>>> requirements are added:
>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>> 2. The application should stop data path API invocation when processing
>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>> 4. The application should enable data path API invocation when processing
>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>
>>>>
>>>> How does this solve the race condition? By pushing the responsibility
>>>> to stop the data path to the application?
>>>
>>> Exactly: it becomes the application's responsibility to make sure the data path is
>>> stopped/suspended before recovery continues.
>>>
>>
>> From documentation of the feature:
>>
>> ``
>> Because the PMD recovers automatically,
>> the application can only sense that the data flow is disconnected for a
>> while and the control API returns an error in this period.
>>
>> In order to sense the error happening/recovering, as well as to restore
>> some additional configuration, three events are available:
>> ``
>>
>> It looks like the initial design uses events mainly to inform the application
>> about what happened and to support re-configuration.
>>
>> Although I don't disagree with involving the application, I am not sure
>> that is part of the current design.
> 
> I thought we all agreed that the initial design contains some fallacies that
> need to be fixed, no?
> The statement that, with the current rte_ethdev design, error recovery can be done
> without interaction with the app (to stop/suspend the data/control path)
> is the main one, I think.
> It needs some interaction with the app layer, one way or another.
> 
>>>>
>>>> What if application is not interested in recovery modes at all and not
>>>> registered any callback for the recovery?
>>>
>>>
>>> Are you saying there is no way for the application to disable
>>> automatic recovery in the PMD if it is not interested
>>> (or can't fulfill the prerequisites for it)?
>>> If so, then yes, it is a problem and we need to fix it.
>>> I assumed that such a mechanism to disable unwanted events already exists,
>>> but I can't find anything.
>>> I wonder what would be the easiest way here: can the PMD make a decision
>>> based on the callback return value, or do we need a new API to
>>> enable/disable callbacks, or ...?
>>>
>>>
>>
>> As far as I can see, automatic recovery is not configurable by the app.
>>
>> But that is not all: the PMD sends events to the application, but the PMD
>> can't know whether the application is handling them or not, so with the
>> current design the PMD can't rely on the app.
> 
> Well, the PMD invokes the user-provided callback.
> One way to fix that problem: if there is no callback provided,
> or the callback returns an error code, the PMD can assume that recovery
> should not be done.
> That is probably not the best design choice, but at least it will allow us
> to fix the problem without too many changes and without introducing a new API.
> That could be sort of a 'quick fix'.
> In the meantime we can think about a new/better approach for that.
> 

-rc2 for 23.03 is a few days away.

What do you think about a 'quick fix' for this release that modifies how
drivers update the burst ops, to prevent the race condition?

And planning a design update for the next release?


>>
>>>> I think the driver should not rely on the application for this, unless
>>>> the application explicitly tells the driver that it is handling recovery;
>>>> right now there is no way for the driver to know this.
>>>
>>> I think it is vice versa:
>>> the application should not enable auto-recovery if it can't meet the
>>> prerequisites for it (provide an appropriate callback).
>>>
>>
>> I agree with the above; we are saying a similar thing from different perspectives.
> 
> Ok, it's good that we are on the same page.
>  
> 
>>
>>>
>>>>
>>>>>> Also, this patch introduce a driver internal function
>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>
>>>>>> [1]
>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>>>
>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>>>
>>>>>> 2.17.1
>>>>>
>>>>
>>>
>
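Requirements 1 and 3 fix the order in which the PMD reports events and swaps the fast-path pointers. The new rte_eth_fp_ops_setup() helper restores those pointers. Both can be illustrated with a small single-threaded model. Everything below is hypothetical and uses no DPDK headers: fp_ops_model, pmd_proactive_recover(), the MODEL_* events and the recording sink. A real PMD would report events through rte_eth_dev_callback_process() and restore the public rte_eth_fp_ops[] entry with rte_eth_fp_ops_setup().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one port's entry in the public
 * fast-path table (the real rte_eth_fp_ops[] has many more fields). */
typedef uint16_t (*rx_burst_t)(void *rxq, void **pkts, uint16_t nb_pkts);

struct fp_ops_model {
	rx_burst_t rx_pkt_burst;
	void *rxq;
};

enum model_event { MODEL_ERR_RECOVERING, MODEL_RECOVERY_SUCCESS };
typedef void (*event_sink_t)(enum model_event ev);

/* Dummy burst installed during recovery: never touches the queue. */
static uint16_t dummy_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	(void)nb_pkts;
	return 0;
}

/* Stand-in for the PMD's real Rx burst: pretends every slot was filled. */
static uint16_t pmd_rx_burst(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	return nb_pkts;
}

/* Ordering required by the patch:
 *  1. report ERR_RECOVERING first, then install the dummy burst;
 *  3. re-install the real burst, then report RECOVERY_SUCCESS. */
static void pmd_proactive_recover(struct fp_ops_model *ops, event_sink_t report)
{
	report(MODEL_ERR_RECOVERING);      /* app stops its data path here */
	ops->rx_pkt_burst = dummy_rx_burst;

	/* ... hardware/firmware recovery would run here ... */

	/* Role of rte_eth_fp_ops_setup(): copy the device's real burst
	 * functions back into the public fast-path table. */
	ops->rx_pkt_burst = pmd_rx_burst;
	report(MODEL_RECOVERY_SUCCESS);    /* app may resume its data path */
}

/* Test helper: records each event and the burst function visible
 * in the table at the moment the event is reported. */
static struct fp_ops_model *observed_ops;
static enum model_event seen_events[4];
static rx_burst_t burst_at_event[4];
static size_t n_events;

static void recording_sink(enum model_event ev)
{
	if (n_events < 4) {
		burst_at_event[n_events] = observed_ops->rx_pkt_burst;
		seen_events[n_events++] = ev;
	}
}
```

The model uses plain pointer stores on purpose. The protocol does not make the swap itself safe for concurrent readers. It relies on the application having stopped calling the burst API before the ERR_RECOVERING callback returns.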
  
Ajit Khaparde March 6, 2023, 11:05 a.m. UTC | #14
On Mon, Mar 6, 2023 at 3:00 AM Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>
> On 3/6/2023 10:22 AM, Konstantin Ananyev wrote:
> >
> >
> >>>>>> In the proactive error handling mode, the PMD will set the data path
> >>>>>> pointers to dummy functions and then try recovery; during this period the
> >>>>>> application may still be invoking the data path API. This introduces a
> >>>>>> race condition with the data path which may lead to a crash [1].
> >>>>>>
> >>>>>> Although the PMD added a delay after setting the data path pointers to cover
> >>>>>> the above race condition, this only reduces its probability; it doesn't
> >>>>>> solve the problem.
> >>>>>>
> >>>>>> To solve the race-condition problem fundamentally, the following
> >>>>>> requirements are added:
> >>>>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>> 2. The application should stop data path API invocation when processing
> >>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>> 4. The application should enable data path API invocation when processing
> >>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>
> >>>>
> >>>> How is this solving the race condition, by pushing the responsibility to
> >>>> stop the data path to the application?
> >>>
> >>> Exactly, it becomes the application's responsibility to make sure the data path is
> >>> stopped/suspended before recovery continues.
> >>>
> >>
> >> From the documentation of the feature:
> >>
> >> ``
> >> Because the PMD recovers automatically,
> >> the application can only sense that the data flow is disconnected for a
> >> while and the control API returns an error in this period.
> >>
> >> In order to sense the error happening/recovering, as well as to restore
> >> some additional configuration, three events are available:
> >> ``
> >>
> >> It looks like the initial design is to use events mainly to inform the application
> >> about what happened and mainly for re-configuration.
> >>
> >> Although I don't disagree with involving the application, I am not sure
> >> that is part of the current design.
> >
> > I thought we all agreed that the initial design contains some fallacies that
> > need to be fixed, no?
> > The statement that, with the current rte_ethdev design, error recovery can be done
> > without interaction with the app (to stop/suspend the data/control path)
> > is the main one, I think.
> > It needs some interaction with the app layer, one way or another.
> >
> >>>>
> >>>> What if the application is not interested in recovery modes at all and has not
> >>>> registered any callback for the recovery?
> >>>
> >>>
> >>> Are you saying there is no way for the application to disable
> >>> automatic recovery in the PMD if it is not interested
> >>> (or can't fulfil the prerequisites for it)?
> >>> If so, then yes, it is a problem and we need to fix it.
> >>> I assumed that such a mechanism to disable unwanted events already exists,
> >>> but I can't find anything.
> >>> I wonder what would be the easiest way here: can the PMD make a decision
> >>> based on the callback return value, or do we need a new API to
> >>> enable/disable callbacks, or ...?
> >>>
> >>>
> >>
> >> As far as I can see, automatic recovery is not configurable by the app.
> >>
> >> But that is not all: the PMD sends events to the application but can't know
> >> whether the application is handling them, so with the current design the PMD
> >> can't rely on the app.
> >
> > Well, the PMD invokes a user-provided callback.
> > One way to fix that problem: if no callback is provided,
> > or the callback returns an error code, the PMD can assume that recovery
> > should not be done.
> > That is probably not the best design choice, but at least it will allow us
> > to fix the problem without too many changes and without introducing a new API.
> > That could be a sort of 'quick fix'.
> > In the meantime we can think about a new/better approach for that.
> >
>
> -rc2 for 23.03 is a few days away.
>
> What do you think about a 'quick fix' that modifies how the driver updates
> burst ops to prevent the race condition, for this release?
>
> And plan a design update for the next release?
+1 on the overall approach.

>
>
> >>
> >>>> I think the driver should not rely on the application for this unless
> >>>> the application explicitly tells the driver that it is handling recovery;
> >>>> right now there is no way for the driver to know this.
> >>>
> >>> I think it is vice versa:
> >>> the application should not enable auto-recovery if it can't meet
> >>> the prerequisites for it (provide an appropriate callback).
> >>>
> >>
> >> I agree with the above; we are saying the same thing from different perspectives.
> >
> > Ok, that's good we are on the same page.
> >
> >
> >>
> >>>
> >>>>
> >>>>>> Also, this patch introduces a driver-internal function,
> >>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> >>>>>>
> >>>>>> [1]
> >>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >>>>>>
> >>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >>>>>> Cc: stable@dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>>>
> >>>>>> 2.17.1
> >>>>>
> >>>>
> >>>
> >
>
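Requirements 2 and 4 move part of the burden to the application, which is what the reviewers question above. Below is a minimal model of one way an application could meet them, assuming a fixed number of worker lcores. Each worker announces that it is inside a burst call before checking a pause flag. The ERR_RECOVERING handler raises the flag and then waits until no worker is inside a burst call, so the PMD can swap the fast-path pointers once the callback returns. The names dp_enter(), dp_exit(), on_recovery_event(), MAX_WORKERS and the EV_* enum are hypothetical. In DPDK the handler body would live in an rte_eth_dev_cb_fn registered with rte_eth_dev_callback_register().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 4 /* hypothetical number of data-path lcores */

/* Hypothetical stand-ins for RTE_ETH_EVENT_ERR_RECOVERING,
 * RTE_ETH_EVENT_RECOVERY_SUCCESS and RTE_ETH_EVENT_RECOVERY_FAILED. */
enum recovery_event { EV_ERR_RECOVERING, EV_RECOVERY_SUCCESS, EV_RECOVERY_FAILED };

static atomic_bool dp_paused;                /* raised by the event handler */
static atomic_bool worker_busy[MAX_WORKERS]; /* true while a worker is in a burst call */

/* Worker side: returns true if worker w may call the burst API now. */
static bool dp_enter(unsigned int w)
{
	atomic_store(&worker_busy[w], true);   /* announce first ... */
	if (atomic_load(&dp_paused)) {         /* ... then check the gate */
		atomic_store(&worker_busy[w], false);
		return false;
	}
	return true;
}

/* Worker side: must be called after every successful dp_enter(). */
static void dp_exit(unsigned int w)
{
	atomic_store(&worker_busy[w], false);
}

/* Control side: body of the ethdev event callback. */
static void on_recovery_event(enum recovery_event ev)
{
	switch (ev) {
	case EV_ERR_RECOVERING:
		atomic_store(&dp_paused, true);
		/* Requirement 2: do not return until no worker is inside a
		 * burst call, so the PMD can swap in the dummy functions. */
		for (unsigned int w = 0; w < MAX_WORKERS; w++)
			while (atomic_load(&worker_busy[w]))
				; /* busy-wait; a real app might call rte_pause() */
		break;
	case EV_RECOVERY_SUCCESS:
		/* Requirement 4: restore the extra configuration the PMD does
		 * not keep (cases b/c in the doc comment), then reopen the gate. */
		atomic_store(&dp_paused, false);
		break;
	case EV_RECOVERY_FAILED:
		/* Leave the data path stopped; the port needs manual handling. */
		break;
	}
}
```

A worker would wrap each burst call as: if (dp_enter(w)) { nb = rte_eth_rx_burst(port, queue, pkts, n); dp_exit(w); }. The default sequentially consistent atomics make the announce-then-check pattern sound. Either the worker sees the pause flag, or the handler sees the worker as busy and waits. The spin-wait assumes the callback does not run on one of the data-path workers; otherwise it would deadlock. DPDK's RCU library (rte_rcu_qsbr) is a more scalable alternative for the same quiescence check.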
  
Konstantin Ananyev March 6, 2023, 11:13 a.m. UTC | #15
> > >>>>>> In the proactive error handling mode, the PMD will set the data path
> > >>>>>> pointers to dummy functions and then try recovery; during this period the
> > >>>>>> application may still be invoking the data path API. This introduces a
> > >>>>>> race condition with the data path which may lead to a crash [1].
> > >>>>>>
> > >>>>>> Although the PMD added a delay after setting the data path pointers to cover
> > >>>>>> the above race condition, this only reduces its probability; it doesn't
> > >>>>>> solve the problem.
> > >>>>>>
> > >>>>>> To solve the race-condition problem fundamentally, the following
> > >>>>>> requirements are added:
> > >>>>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>> 2. The application should stop data path API invocation when processing
> > >>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>> 4. The application should enable data path API invocation when processing
> > >>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>
> > >>>>
> > >>>> How is this solving the race condition, by pushing the responsibility to
> > >>>> stop the data path to the application?
> > >>>
> > >>> Exactly, it becomes the application's responsibility to make sure the data path is
> > >>> stopped/suspended before recovery continues.
> > >>>
> > >>
> > >> From the documentation of the feature:
> > >>
> > >> ``
> > >> Because the PMD recovers automatically,
> > >> the application can only sense that the data flow is disconnected for a
> > >> while and the control API returns an error in this period.
> > >>
> > >> In order to sense the error happening/recovering, as well as to restore
> > >> some additional configuration, three events are available:
> > >> ``
> > >>
> > >> It looks like the initial design is to use events mainly to inform the application
> > >> about what happened and mainly for re-configuration.
> > >>
> > >> Although I don't disagree with involving the application, I am not sure
> > >> that is part of the current design.
> > >
> > > I thought we all agreed that the initial design contains some fallacies that
> > > need to be fixed, no?
> > > The statement that, with the current rte_ethdev design, error recovery can be done
> > > without interaction with the app (to stop/suspend the data/control path)
> > > is the main one, I think.
> > > It needs some interaction with the app layer, one way or another.
> > >
> > >>>>
> > >>>> What if the application is not interested in recovery modes at all and has not
> > >>>> registered any callback for the recovery?
> > >>>
> > >>>
> > >>> Are you saying there is no way for the application to disable
> > >>> automatic recovery in the PMD if it is not interested
> > >>> (or can't fulfil the prerequisites for it)?
> > >>> If so, then yes, it is a problem and we need to fix it.
> > >>> I assumed that such a mechanism to disable unwanted events already exists,
> > >>> but I can't find anything.
> > >>> I wonder what would be the easiest way here: can the PMD make a decision
> > >>> based on the callback return value, or do we need a new API to
> > >>> enable/disable callbacks, or ...?
> > >>>
> > >>>
> > >>
> > >> As far as I can see, automatic recovery is not configurable by the app.
> > >>
> > >> But that is not all: the PMD sends events to the application but can't know
> > >> whether the application is handling them, so with the current design the PMD
> > >> can't rely on the app.
> > >
> > > Well, the PMD invokes a user-provided callback.
> > > One way to fix that problem: if no callback is provided,
> > > or the callback returns an error code, the PMD can assume that recovery
> > > should not be done.
> > > That is probably not the best design choice, but at least it will allow us
> > > to fix the problem without too many changes and without introducing a new API.
> > > That could be a sort of 'quick fix'.
> > > In the meantime we can think about a new/better approach for that.
> > >
> >
> > -rc2 for 23.03 is a few days away.
> >
> > What do you think about a 'quick fix' that modifies how the driver updates
> > burst ops to prevent the race condition, for this release?
> >
> > And plan a design update for the next release?
> +1 on the overall approach.

Yep, agree.
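The interim proposal quoted above treats a missing callback, or a callback returning an error, as "do not recover". It boils down to a small decision function. The sketch below models it with a hypothetical callback list rather than DPDK's internal callback registry. How the PMD would actually collect the callbacks' return values is one of the open questions in this thread and is not assumed here. All names (recovery_cb_t, struct recovery_cb, recovery_allowed(), cb_ready(), cb_refuse()) are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type; it only mirrors the fact that an ethdev
 * event callback returns an int. */
typedef int (*recovery_cb_t)(uint16_t port_id, void *arg);

struct recovery_cb {
	recovery_cb_t fn;
	void *arg;
};

/* Decide whether proactive recovery may go ahead for this port.
 * Every registered callback is notified; recovery is allowed only if at
 * least one callback exists and none of them returned an error. */
static bool recovery_allowed(uint16_t port_id, const struct recovery_cb *cbs, size_t n)
{
	bool allowed = (n > 0); /* no listener: assume the app cannot cooperate */

	for (size_t i = 0; i < n; i++)
		if (cbs[i].fn(port_id, cbs[i].arg) != 0)
			allowed = false;
	return allowed;
}

/* Example callbacks; arg points to a call counter (may be NULL). */
static int cb_ready(uint16_t port_id, void *arg)
{
	int *calls = arg;

	(void)port_id;
	if (calls != NULL)
		(*calls)++;
	return 0;  /* data path quiesced: recovery may proceed */
}

static int cb_refuse(uint16_t port_id, void *arg)
{
	int *calls = arg;

	(void)port_id;
	if (calls != NULL)
		(*calls)++;
	return -1; /* application cannot meet the prerequisites */
}
```

Every callback is invoked even after one refuses, so each listener still learns about the error. Stopping at the first refusal would be an equally defensible choice.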
 
> 
> >
> >
> > >>
> > >>>> I think the driver should not rely on the application for this unless
> > >>>> the application explicitly tells the driver that it is handling recovery;
> > >>>> right now there is no way for the driver to know this.
> > >>>
> > >>> I think it is vice versa:
> > >>> the application should not enable auto-recovery if it can't meet
> > >>> the prerequisites for it (provide an appropriate callback).
> > >>>
> > >>
> > >> I agree with the above; we are saying the same thing from different perspectives.
> > >
> > > Ok, that's good we are on the same page.
> > >
> > >
> > >>
> > >>>
> > >>>>
> > >>>>>> Also, this patch introduces a driver-internal function,
> > >>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> > >>>>>>
> > >>>>>> [1]
> > >>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >>>>>>
> > >>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>>>> Cc: stable@dpdk.org
> > >>>>>>
> > >>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > >>>>>> ---
> > >>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>>>   lib/ethdev/version.map                  |  1 +
> > >>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>>>
> > >>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>> index c145a9066c..e380ff135a 100644
> > >>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> > >>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>>>   and only a small amount of work is required for the application.
> > >>>>>>
> > >>>>>> -During error detection and automatic recovery,
> > >>>>>> -the PMD sets the data path pointers to dummy functions
> > >>>>>> -(which will prevent the crash),
> > >>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> > >>>>>> -
> > >>>>>> -Because the PMD recovers automatically,
> > >>>>>> -the application can only sense that the data flow is disconnected for a while
> > >>>>>> -and the control API returns an error in this period.
> > >>>>>> +During error detection and automatic recovery, the PMD sets the data path
> > >>>>>> +pointers to dummy functions and also make sure the control path operations
> > >>>>>> +failed with a return code ``-EBUSY``.
> > >>>>>>
> > >>>>>>   In order to sense the error happening/recovering,
> > >>>>>>   as well as to restore some additional configuration,
> > >>>>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>>>
> > >>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>>>      Notify the application that an error is detected
> > >>>>>> -   and the recovery is being started.
> > >>>>>> +   and the recovery is about to start.
> > >>>>>>      Upon receiving the event, the application should not invoke
> > >>>>>> -   any control path function until receiving
> > >>>>>> +   any control and data path API until receiving
> > >>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>>>
> > >>>>>>   .. note::
> > >>>>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>>>
> > >>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>>>      Notify the application that the recovery from error is successful,
> > >>>>>> -   the PMD already re-configures the port,
> > >>>>>> -   and the effect is the same as a restart operation.
> > >>>>>> +   the PMD already re-configures the port.
> > >>>>>> +   The application should restore some additional configuration, and then
> > >>>>>> +   enable data path API invocation.
> > >>>>>>
> > >>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>>>      Notify the application that the recovery from error failed,
> > >>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>>>> index 0be1e8ca04..f994653fe9 100644
> > >>>>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>       return rc;
> > >>>>>>   }
> > >>>>>>
> > >>>>>> +void
> > >>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>>>> +{
> > >>>>>> +    if (dev == NULL)
> > >>>>>> +        return;
> > >>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>>>> +}
> > >>>>>> +
> > >>>>>>   const struct rte_memzone *
> > >>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> > >>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>>>> @@ -1621,6 +1621,16 @@ int
> > >>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> > >>>>>>            uint16_t queue_id);
> > >>>>>>
> > >>>>>> +/**
> > >>>>>> + * @internal
> > >>>>>> + * Setup eth fast-path API to ethdev values.
> > >>>>>> + *
> > >>>>>> + * @param dev
> > >>>>>> + *  Pointer to struct rte_eth_dev.
> > >>>>>> + */
> > >>>>>> +__rte_internal
> > >>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>>>> +
> > >>>>>>   /**
> > >>>>>>    * @internal
> > >>>>>>    * Atomically set the link status for the specific device.
> > >>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>>>> index 049641d57c..44ee7229c1 100644
> > >>>>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>>>        */
> > >>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>>>       /** Port recovering from a hardware or firmware error.
> > >>>>>> -     * If PMD supports proactive error recovery,
> > >>>>>> -     * it should trigger this event to notify application
> > >>>>>> -     * that it detected an error and the recovery is being started.
> > >>>>>> -     * Upon receiving the event, the application should not invoke any control path API
> > >>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> > >>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>>>> -     * The PMD will set the data path pointers to dummy functions,
> > >>>>>> -     * and re-set the data path pointers to non-dummy functions
> > >>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>> -     * It means that the application cannot send or receive any packets
> > >>>>>> -     * during this period.
> > >>>>>> +     *
> > >>>>>> +     * If PMD supports proactive error recovery, it should trigger this
> > >>>>>> +     * event to notify application that it detected an error and the
> > >>>>>> +     * recovery is about to start.
> > >>>>>> +     *
> > >>>>>> +     * Upon receiving the event, the application should not invoke any
> > >>>>>> +     * control and data path API until receiving
> > >>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>>>> +     * event.
> > >>>>>> +     *
> > >>>>>> +     * Once this event is reported, the PMD will set the data path pointers
> > >>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> > >>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>> +     *
> > >>>>>>        * @note Before the PMD reports the recovery result,
> > >>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> > >>>>>>        * because a larger error may occur during the recovery.
> > >>>>>>        */
> > >>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> > >>>>>>       /** Port recovers successfully from the error.
> > >>>>>> -     * The PMD already re-configured the port,
> > >>>>>> -     * and the effect is the same as a restart operation.
> > >>>>>> +     *
> > >>>>>> +     * The PMD already re-configured the port:
> > >>>>>>        * a) The following operation will be retained: (alphabetically)
> > >>>>>>        *    - DCB configuration
> > >>>>>>        *    - FEC configuration
> > >>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>>>        * c) Any other configuration will not be stored
> > >>>>>>        *    and will need to be re-configured.
> > >>>>>> +     *
> > >>>>>> +     * The application should restore some additional configuration
> > >>>>>> +     * (see above case b/c), and then enable data path API invocation.
> > >>>>>>        */
> > >>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>>>       /** Port recovery failed.
> > >>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>>>> index 357d1a88c0..c273e0bdae 100644
> > >>>>>> --- a/lib/ethdev/version.map
> > >>>>>> +++ b/lib/ethdev/version.map
> > >>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>>>       rte_eth_devices;
> > >>>>>>       rte_eth_dma_zone_free;
> > >>>>>>       rte_eth_dma_zone_reserve;
> > >>>>>> +    rte_eth_fp_ops_setup;
> > >>>>>>       rte_eth_hairpin_queue_peer_bind;
> > >>>>>>       rte_eth_hairpin_queue_peer_unbind;
> > >>>>>>       rte_eth_hairpin_queue_peer_update;
> > >>>>>> --
> > >>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > >>>>>
> > >>>>>> 2.17.1
> > >>>>>
> > >>>>
> > >>>
> > >
> >
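The ordering that requirements 1-4 of the commit message impose can be modeled in a few lines of C. This is a hypothetical, single-threaded sketch: `model_port`, `app_event_cb()` and `pmd_proactive_recover()` are illustrative names, not DPDK API. It assumes the event callback runs synchronously and does not return until the application has actually stopped its data-path threads; only then is it safe for the PMD to swap in dummy ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-threaded model; none of these names are DPDK API. */
typedef uint16_t (*burst_fn)(void *rxq, uint16_t nb_pkts);

static uint16_t dummy_rx_burst(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	(void)nb_pkts;
	return 0; /* dummy op: never touches the queue */
}

static uint16_t real_rx_burst(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	return nb_pkts; /* stand-in for a driver's Rx burst function */
}

enum model_event { MODEL_ERR_RECOVERING, MODEL_RECOVERY_SUCCESS };

struct model_port {
	burst_fn rx_pkt_burst;        /* stands in for the port's rte_eth_fp_ops entry */
	bool datapath_enabled;        /* owned by the application */
	bool real_ops_at_recovering;  /* what the app saw in each callback */
	bool real_ops_at_success;
};

/* Application event callback: requirements 2 and 4. */
static void app_event_cb(struct model_port *p, enum model_event ev)
{
	bool real = (p->rx_pkt_burst == real_rx_burst);

	if (ev == MODEL_ERR_RECOVERING) {
		p->real_ops_at_recovering = real;
		p->datapath_enabled = false; /* stop invoking rx/tx burst */
	} else {
		p->real_ops_at_success = real;
		p->datapath_enabled = true;  /* resume invoking rx/tx burst */
	}
}

/* PMD proactive recovery: requirements 1 and 3. */
static void pmd_proactive_recover(struct model_port *p)
{
	assert(p != NULL);
	app_event_cb(p, MODEL_ERR_RECOVERING);   /* 1: report the event first, */
	p->rx_pkt_burst = dummy_rx_burst;        /*    then install dummy ops */
	/* ... hardware/firmware recovery would run here ... */
	p->rx_pkt_burst = real_rx_burst;         /* 3: restore valid ops, */
	app_event_cb(p, MODEL_RECOVERY_SUCCESS); /*    then report success */
}
```

In the patch itself, restoring the valid pointers is the job of the new internal helper rte_eth_fp_ops_setup(); the model simply assigns the field.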
  
Honnappa Nagarahalli March 7, 2023, 5:34 a.m. UTC | #16
> -----Original Message-----
> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Sent: Sunday, March 5, 2023 9:24 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> dev@dpdk.org; Chengwen Feng <fengchengwen@huawei.com>;
> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP
> <kalesh-anakkur.purayil@broadcom.com>; Ajit Khaparde
> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> Cc: nd <nd@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
> mode
> 
> 
> >>>>
> >>>> In the proactive error handling mode, the PMD will set the data
> >>>> path pointers to dummy functions and then try recovery, in this
> >>>> period the application may still invoking data path API. This will
> >>>> introduce a race-condition with data path which may lead to crash [1].
> >>>>
> >>>> Although the PMD added delay after setting data path pointers to
> >>>> cover the above race-condition, it reduces the probability, but it
> >>>> doesn't solve the problem.
> >>>>
> >>>> To solve the race-condition problem fundamentally, the following
> >>>> requirements are added:
> >>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> Do you mean to say, PMD should set the data path pointers after calling the
> >>> call back function?
> >>> The PMD is running in the context of multiple EAL threads. How do these
> >>> threads synchronize such that only one thread sets these data pointers?
> >>
> >> As I understand this event callback supposed to be called in the
> >> context of EAL interrupt thread (whoever is more familiar with
> >> original idea, feel free to correct me if I missed something).
> > I could not figure this out. It looks to be called from the data plane thread context.
> > I also have a thought on alternate design at the end, appreciate if you can take a look.
> >
> >> How it is going to signal data-path threads that they need to
> >> stop/suspend calling data-path API - that's I suppose is left to application
> >> to decide...
> >> Same as right now it is application responsibility to stop data-path
> >> threads before doing dev_stop()/dev/_config()/etc.
> > Ok, good, this expectation is not new. The application must have a mechanism already.
> >
> >>
> >>
> >>>
> >>>> 2. The application should stop data path API invocation when process
> >>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>> Any thoughts on how an application can do this?
> > We can ignore this question as there is already similar expectation set for earlier functionalities.
> >
> >>>
> >>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>> 4. The application should enable data path API invocation when process
> >>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>> Do you mean to say that the application should not call the datapath APIs
> >>> while the PMD is running the recovery process?
> >>
> >> Yes, I believe that's the intention.
> > Ok, this is good and makes sense.
> >
> >>
> >>>>
> >>>> Also, this patch introduce a driver internal function
> >>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> >>>>
> >>>> [1]
> >>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >>>>
> >>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >>>> ---
> >>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> >>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
> >>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
> >>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> >>>>    lib/ethdev/version.map                  |  1 +
> >>>>    5 files changed, 46 insertions(+), 25 deletions(-)
> >>>>
> >>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> index c145a9066c..e380ff135a 100644
> >>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> >>>>    the PMD automatically recovers from error in PROACTIVE mode,
> >>>>    and only a small amount of work is required for the application.
> >>>>
> >>>> -During error detection and automatic recovery,
> >>>> -the PMD sets the data path pointers to dummy functions
> >>>> -(which will prevent the crash),
> >>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> >>>> -
> >>>> -Because the PMD recovers automatically,
> >>>> -the application can only sense that the data flow is disconnected for a while
> >>>> -and the control API returns an error in this period.
> >>>> +During error detection and automatic recovery, the PMD sets the
> >>>> +data path pointers to dummy functions and also make sure the
> >>>> +control path operations failed with a return code ``-EBUSY``.
> >>>>
> >>>>    In order to sense the error happening/recovering,
> >>>>    as well as to restore some additional configuration,
> >>>> @@ -653,9 +648,9 @@ three events are available:
> >>>>
> >>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
> >>>>       Notify the application that an error is detected
> >>>> -   and the recovery is being started.
> >>>> +   and the recovery is about to start.
> >>>>       Upon receiving the event, the application should not invoke
> >>>> -   any control path function until receiving
> >>>> +   any control and data path API until receiving
> >>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> >>>>
> >>>>    .. note::
> >>>> @@ -666,8 +661,9 @@ three events are available:
> >>>>
> >>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> >>>>       Notify the application that the recovery from error is successful,
> >>>> -   the PMD already re-configures the port,
> >>>> -   and the effect is the same as a restart operation.
> >>>> +   the PMD already re-configures the port.
> >>>> +   The application should restore some additional configuration, and then
> >>> What is the additional configuration? Is this specific to each NIC/PMD?
> >>> I thought, this is an auto recovery process and the application does
> >>> not require to reconfigure anything. If the application has to restore the
> >>> configuration, how does auto recovery differ from typical recovery process?
> >>>
> >>>> +   enable data path API invocation.
> >>>>
> >>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
> >>>>       Notify the application that the recovery from error failed,
> >>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> >>>> index 0be1e8ca04..f994653fe9 100644
> >>>> --- a/lib/ethdev/ethdev_driver.c
> >>>> +++ b/lib/ethdev/ethdev_driver.c
> >>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>    	return rc;
> >>>>    }
> >>>>
> >>>> +void
> >>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> >>>> +{
> >>>> +	if (dev == NULL)
> >>>> +		return;
> >>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> >>>> +}
> >>>> +
> >>>>    const struct rte_memzone *
> >>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>    			 uint16_t queue_id, size_t size, unsigned int align,
> >>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >>>> index 2c9d615fb5..0d964d1f67 100644
> >>>> --- a/lib/ethdev/ethdev_driver.h
> >>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>> @@ -1621,6 +1621,16 @@ int
> >>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> >>>>    		 uint16_t queue_id);
> >>>>
> >>>> +/**
> >>>> + * @internal
> >>>> + * Setup eth fast-path API to ethdev values.
> >>>> + *
> >>>> + * @param dev
> >>>> + *  Pointer to struct rte_eth_dev.
> >>>> + */
> >>>> +__rte_internal
> >>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> >>>> +
> >>>>    /**
> >>>>     * @internal
> >>>>     * Atomically set the link status for the specific device.
> >>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >>>> index 049641d57c..44ee7229c1 100644
> >>>> --- a/lib/ethdev/rte_ethdev.h
> >>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> >>>>    	/** Port recovering from a hardware or firmware error.
> >>>> -	 * If PMD supports proactive error recovery,
> >>>> -	 * it should trigger this event to notify application
> >>>> -	 * that it detected an error and the recovery is being started.
> >>>> -	 * Upon receiving the event, the application should not invoke any control path API
> >>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> >>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> >>>> -	 * The PMD will set the data path pointers to dummy functions,
> >>>> -	 * and re-set the data path pointers to non-dummy functions
> >>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>> -	 * It means that the application cannot send or receive any packets
> >>>> -	 * during this period.
> >>>> +	 *
> >>>> +	 * If PMD supports proactive error recovery, it should trigger this
> >>>> +	 * event to notify application that it detected an error and the
> >>>> +	 * recovery is about to start.
> >>>> +	 *
> >>>> +	 * Upon receiving the event, the application should not invoke any
> >>>> +	 * control and data path API until receiving
> >>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> >>>> +	 * event.
> >>>> +	 *
> >>>> +	 * Once this event is reported, the PMD will set the data path pointers
> >>>> +	 * to dummy functions, and re-set the data path pointers to valid
> >>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>> Why do we need to set the data path pointers to dummy functions if the
> >>> application is restricted from invoking any control and data path
> >>> APIs till the recovery process is completed?
> >>
> >> You are right, in theory it is not mandatory.
> >> Though it helps to flag a problem if user will still try to call them
> >> while recovery is in progress.
> > Ok, may be in debug mode.
> > I mean, we have already set an expectation to the application that it should
> > not call and the application has implemented a method to do the same. Why
> > do we need to complicate this?
> > If the application calls the APIs, it is a programming error.
> 
> 
> My preference would be to keep it this way for both debug and non-debug
> mode.
> It doesn't cost anything to us in terms of performance, but helps to catch
> problems with wrong behaving app.

This also causes a synchronization problem, i.e. if this has to be done correctly, we need to use proper synchronization mechanisms.
We cannot set the function pointers and assume that the stores will become visible to other threads/cores in the correct order.
A possible mechanism (though I see some problems with it) could be a guard variable that indicates when it is safe for the data plane threads to use the function pointers. This would require a load-acquire in the data plane threads.
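A minimal C11 sketch of the guard-variable idea, assuming one guard per port. `guarded_fp_ops`, `ops_publish()`, `ops_withdraw()` and `dp_rx_burst()` are hypothetical names, not existing ethdev API. The sketch also makes the "some problems" concrete: the store-release/load-acquire pair orders publication of new pointers, but clearing the guard does not tell the control thread when a reader that already passed the check has finished with the old values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the guard-variable idea; not DPDK API. */
typedef uint16_t (*burst_fn)(void *rxq, uint16_t nb_pkts);

struct guarded_fp_ops {
	burst_fn rx_pkt_burst; /* plain fields, published through 'ready' */
	void *rxq;
	atomic_bool ready;     /* guard: true while the fields may be used */
};

static uint16_t model_rx_burst(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	return nb_pkts; /* stand-in for a driver's Rx burst function */
}

static void ops_init(struct guarded_fp_ops *ops)
{
	ops->rx_pkt_burst = NULL;
	ops->rxq = NULL;
	atomic_init(&ops->ready, false);
}

/* Control thread: write the fields, then store-release the guard, so a
 * reader that load-acquires 'ready == true' also sees these writes. */
static void ops_publish(struct guarded_fp_ops *ops, burst_fn fn, void *rxq)
{
	ops->rx_pkt_burst = fn;
	ops->rxq = rxq;
	atomic_store_explicit(&ops->ready, true, memory_order_release);
}

/* Control thread: stop new readers from entering. A reader that already
 * passed the check may still be using the old fields, so the control
 * thread must wait for a quiescent state (for example with DPDK's RCU
 * QSBR library) before it overwrites them. */
static void ops_withdraw(struct guarded_fp_ops *ops)
{
	atomic_store_explicit(&ops->ready, false, memory_order_seq_cst);
}

/* Data-plane thread: the load-acquire pairs with the store-release above. */
static uint16_t dp_rx_burst(struct guarded_fp_ops *ops, uint16_t nb_pkts)
{
	if (!atomic_load_explicit(&ops->ready, memory_order_acquire))
		return 0; /* not safe to use the pointers: behave like a dummy op */
	assert(ops->rx_pkt_burst != NULL);
	return ops->rx_pkt_burst(ops->rxq, nb_pkts);
}
```

The extra load-acquire sits on every burst call, which is the fast-path cost such a scheme would add.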

> 
> >
> >> Again, same as we doing in dev_stop().
> >
> >>
> >>>
> >>>> +	 *
> >>>>    	 * @note Before the PMD reports the recovery result,
> >>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> >>>>    	 * because a larger error may occur during the recovery.
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
> >>> I understand this is not a change in this patch. But, just
> >>> wondering, what is the purpose of this? How is the application supposed to use this?
> >>>
> >>>>    	/** Port recovers successfully from the error.
> >>>> -	 * The PMD already re-configured the port,
> >>>> -	 * and the effect is the same as a restart operation.
> >>>> +	 *
> >>>> +	 * The PMD already re-configured the port:
> >>>>    	 * a) The following operation will be retained: (alphabetically)
> >>>>    	 *    - DCB configuration
> >>>>    	 *    - FEC configuration
> >>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> >>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> >>>>    	 * c) Any other configuration will not be stored
> >>>>    	 *    and will need to be re-configured.
> >>>> +	 *
> >>>> +	 * The application should restore some additional configuration
> >>>> +	 * (see above case b/c), and then enable data path API invocation.
> >>>>    	 */
> >>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> >>>>    	/** Port recovery failed.
> >>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> >>>> index 357d1a88c0..c273e0bdae 100644
> >>>> --- a/lib/ethdev/version.map
> >>>> +++ b/lib/ethdev/version.map
> >>>> @@ -320,6 +320,7 @@ INTERNAL {
> >>>>    	rte_eth_devices;
> >>>>    	rte_eth_dma_zone_free;
> >>>>    	rte_eth_dma_zone_reserve;
> >>>> +	rte_eth_fp_ops_setup;
> >>>>    	rte_eth_hairpin_queue_peer_bind;
> >>>>    	rte_eth_hairpin_queue_peer_unbind;
> >>>>    	rte_eth_hairpin_queue_peer_update;
> >>>> --
> >>>> 2.17.1
> >>>
> >
> > Is there any reason not to design this in the same way as
> > 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> 
> I suppose it is a question for the authors of original patch...
I would appreciate it if the authors could comment on this.
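Whichever design is chosen (PMD-driven recovery signalled by events, or an application-driven call in the style of rte_eth_dev_reset), the application needs a way to quiesce its data-path threads and know that none of them is still inside an Rx/Tx burst call. One way is a pause/acknowledge gate, sketched below under stated assumptions: a fixed number of workers, each calling `worker_may_poll()` at the top of its poll loop. `dp_gate` and all functions here are hypothetical names, not DPDK API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_MAX_WORKERS 4 /* hypothetical fixed number of data-path lcores */

/* Hypothetical pause/acknowledge gate; not a DPDK API. */
struct dp_gate {
	atomic_bool pause_req;                 /* written by the control thread */
	atomic_bool paused[MODEL_MAX_WORKERS]; /* one acknowledgement per worker */
};

static void gate_init(struct dp_gate *g)
{
	atomic_init(&g->pause_req, false);
	for (unsigned int i = 0; i < MODEL_MAX_WORKERS; i++)
		atomic_init(&g->paused[i], false);
}

/* Worker: call at the top of every poll-loop iteration, outside any burst
 * call. Returns true if rx/tx burst may be invoked in this iteration.
 * All accesses are seq_cst: clearing 'paused' before reading 'pause_req'
 * (a Dekker-style pattern) stops the control thread from trusting an
 * acknowledgement left over from an earlier pause. */
static bool worker_may_poll(struct dp_gate *g, unsigned int id)
{
	assert(id < MODEL_MAX_WORKERS);
	atomic_store(&g->paused[id], false);
	if (atomic_load(&g->pause_req)) {
		atomic_store(&g->paused[id], true); /* not inside any burst call */
		return false;
	}
	return true;
}

/* Control thread, e.g. while handling the "recovering" event. */
static void gate_request_pause(struct dp_gate *g)
{
	atomic_store(&g->pause_req, true);
}

/* Control thread: poll until true before recovery touches the queues. */
static bool gate_all_paused(struct dp_gate *g, unsigned int nb_workers)
{
	for (unsigned int i = 0; i < nb_workers; i++)
		if (!atomic_load(&g->paused[i]))
			return false;
	return true;
}

/* Control thread, e.g. while handling the "recovery success" event. */
static void gate_resume(struct dp_gate *g)
{
	atomic_store(&g->pause_req, false);
}
```

One caveat with this approach: if the event callback can run on one of the data-plane lcores themselves, spinning on `gate_all_paused()` from inside the callback would deadlock, so the waiting would have to move to a separate control thread.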

> 
> > We could have a similar API 'rte_eth_dev_recover' to do the recovery
> > functionality.
> 
> I suppose such approach is also possible.
> Personally I am fine with both ways: either existing one or what you propose,
> as long as we'll fix existing race-condition.
> What is good with what you suggest - that way we probably don't need to
> worry how to allow user to enable/disable auto-recovery inside PMD.
> 
> Konstantin
>
  
Chengwen Feng March 7, 2023, 8:25 a.m. UTC | #17
On 2023/3/6 19:13, Konstantin Ananyev wrote:
> 
> 
>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
>>>>>>>>> pointers to dummy functions and then try recovery, in this period the
>>>>>>>>> application may still invoking data path API. This will introduce a
>>>>>>>>> race-condition with data path which may lead to crash [1].
>>>>>>>>>
>>>>>>>>> Although the PMD added delay after setting data path pointers to cover
>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
>>>>>>>>> solve the problem.
>>>>>>>>>
>>>>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>>>>> requirements are added:
>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>> 2. The application should stop data path API invocation when process
>>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>> 4. The application should enable data path API invocation when process
>>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>
>>>>>>>
>>>>>>> How is this solving the race condition? By pushing the responsibility
>>>>>>> to stop the data path to the application?
>>>>>>
>>>>>> Exactly, it becomes application responsibility to make sure data-path is
>>>>>> stopped/suspended before recovery will continue.
>>>>>>
>>>>>
>>>>> From documentation of the feature:
>>>>>
>>>>> ``
>>>>> Because the PMD recovers automatically,
>>>>> the application can only sense that the data flow is disconnected for a
>>>>> while and the control API returns an error in this period.
>>>>>
>>>>> In order to sense the error happening/recovering, as well as to restore
>>>>> some additional configuration, three events are available:
>>>>> ``
>>>>>
>>>>> It looks like initial design is to use events mainly inform application
>>>>> about what happened and mainly for re-configuration.
>>>>>
>>>>> Although I don't disagree with involving the application, I am not sure
>>>>> that is part of current design.
>>>>
>>>> I thought we all agreed that initial design contain some fallacies that
>>>> need to fixed, no?
>>>> Statement that with current rte_ethdev design error recovery can be done
>>>> without interaction with the app (to stop/suspend data/control path)
>>>> is the main one I think.
>>>> It needs some interaction with app layer, one way or another.
>>>>
>>>>>>>
>>>>>>> What if application is not interested in recovery modes at all and not
>>>>>>> registered any callback for the recovery?
>>>>>>
>>>>>>
>>>>>> Are you saying there is no way for application to disable
>>>>>> automatic recovery in PMD if it is not interested
>>>>>> (or can't fulfill the prerequisites for it)?
>>>>>> If so, then yes it is a problem and we need to fix it.
>>>>>> I assumed that such mechanism to disable unwanted events already exists,
>>>>>> but I can't find anything.
>>>>>> Wonder what would be the easiest way here - can PMD make a decision
>>>>>> based on callback return value, or do we need a new API to
>>>>>> enable/disable callbacks, or ...?
>>>>>>
>>>>>>
>>>>>
>>>>> As far as I can see automatic recovery is not configurable by app.
>>>>>
>>>>> But that is not all, PMD sends events to application but PMD can't know
>>>>> if application is handling them or not, so with current design PMD can't
>>>>> rely on to app.
>>>>
>>>> Well, PMD invokes user provided callback.
>>>> One way to fix that problem - if there is no callback provided,
>>>> or callback returns an error code - PMD can assume that recovery
>>>> should not be done.
>>>> That is probably not the best design choice, but at least it will allow
>>>> to fix the problem without too many changes and introducing new API.
>>>> That could be sort of a 'quick fix'.
>>>> In a meanwhile we can think about new/better approach for that.
>>>>
>>>
>>> -rc2 for 23.03 is a few days away.
>>>
>>> What do you think to have 'quick fix' as modifying how driver updates
>>> burst ops to prevent the race condition, for this release?

By the 'quick fix', do you mean only updating the function pointers (without the rxq setting)?
The PMDs which currently announce support for the "proactive error handling mode" already
do this.
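One reading of that 'quick fix' can be modeled as follows: during recovery only the burst function pointer is swapped (atomically), while the queue pointer is left untouched. A concurrent reader then sees either (real function, real queue) or (dummy function, real queue), never a real burst function paired with a dummy queue. This is a hypothetical sketch (`model_fp_ops`, `pmd_enter_recovery()` and friends are illustrative names, not DPDK API). It narrows the window but does not close it: a thread that loaded the real pointer just before the swap can still be executing inside the real burst function while the PMD resets the hardware, which is why the thread discusses involving the application.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of swapping only the burst function pointer; not DPDK API. */
typedef uint16_t (*burst_fn)(void *rxq, uint16_t nb_pkts);

struct model_rxq {
	uint16_t avail; /* packets waiting in this (fake) queue */
};

struct model_fp_ops {
	_Atomic(burst_fn) rx_pkt_burst; /* the only field changed during recovery */
	void *rxq_data;                 /* never reset, so always a real queue */
};

static uint16_t dummy_rx_burst(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	(void)nb_pkts;
	return 0;
}

static uint16_t real_rx_burst(void *rxq, uint16_t nb_pkts)
{
	struct model_rxq *q = rxq; /* would misbehave if given a dummy queue */
	uint16_t n = q->avail < nb_pkts ? q->avail : nb_pkts;

	q->avail -= n;
	return n;
}

static void model_ops_init(struct model_fp_ops *ops, struct model_rxq *q)
{
	atomic_init(&ops->rx_pkt_burst, real_rx_burst);
	ops->rxq_data = q;
}

/* Data path: one atomic load of the pointer; the queue argument is always
 * the real queue, whichever function is loaded. */
static uint16_t dp_rx(struct model_fp_ops *ops, uint16_t nb_pkts)
{
	burst_fn fn = atomic_load_explicit(&ops->rx_pkt_burst, memory_order_acquire);

	return fn(ops->rxq_data, nb_pkts);
}

static void pmd_enter_recovery(struct model_fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_pkt_burst, dummy_rx_burst, memory_order_release);
}

static void pmd_leave_recovery(struct model_fp_ops *ops)
{
	atomic_store_explicit(&ops->rx_pkt_burst, real_rx_burst, memory_order_release);
}
```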

>>>
>>> And plan a design update for the next release?
>> +1 on the overall approach.
> 
> Yep, agree.

I hope for a better solution.
Also, I notice that, among all the open-source software based on DPDK, only Open vSwitch
registers an RTE_ETH_EVENT_INTR_RESET callback.

Therefore, I hope we can build a recovery framework at the DPDK SDK level that is compatible
with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
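To make the idea of a common framework concrete, here is a hypothetical sketch of one application-side handler that covers both event families in a single state machine. The enum values are local stand-ins for the ethdev event names (not the real `enum rte_eth_event_type` values), and `port_ctx`, `on_port_event()` and `on_reset_done()` are illustrative names. In real code the handler would be registered with rte_eth_dev_callback_register(), the INTR_RESET path would end in rte_eth_dev_reset() plus reconfigure/start from a control thread, and clearing `datapath_enabled` would have to be backed by actually quiescing the worker threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the ethdev event names; the values are NOT the
 * real enum rte_eth_event_type values. Hypothetical sketch only. */
enum model_eth_event {
	MODEL_EVENT_INTR_RESET,       /* cf. RTE_ETH_EVENT_INTR_RESET */
	MODEL_EVENT_ERR_RECOVERING,   /* cf. RTE_ETH_EVENT_ERR_RECOVERING */
	MODEL_EVENT_RECOVERY_SUCCESS, /* cf. RTE_ETH_EVENT_RECOVERY_SUCCESS */
	MODEL_EVENT_RECOVERY_FAILED,  /* cf. RTE_ETH_EVENT_RECOVERY_FAILED */
};

enum port_state {
	PORT_RUNNING,     /* data path may be used */
	PORT_RECOVERING,  /* PMD-driven (proactive) recovery in progress */
	PORT_NEEDS_RESET, /* application must drive recovery itself */
	PORT_FAILED,      /* recovery failed, port stays down */
};

struct port_ctx {
	enum port_state state;
	bool datapath_enabled;      /* a real app must also quiesce its workers */
	unsigned int restore_count; /* times app-level configuration was re-applied */
};

/* Placeholder for re-applying configuration the PMD did not keep
 * (cases b/c in the RTE_ETH_EVENT_RECOVERY_SUCCESS comment). */
static void restore_app_config(struct port_ctx *p)
{
	p->restore_count++;
}

static void on_port_event(struct port_ctx *p, enum model_eth_event ev)
{
	switch (ev) {
	case MODEL_EVENT_INTR_RESET:
		/* Application-driven path: reset/configure/start later from a
		 * control thread (rte_eth_dev_reset() in real code). */
		p->datapath_enabled = false;
		p->state = PORT_NEEDS_RESET;
		break;
	case MODEL_EVENT_ERR_RECOVERING:
		/* May be reported again before the result, so keep it idempotent. */
		p->datapath_enabled = false;
		p->state = PORT_RECOVERING;
		break;
	case MODEL_EVENT_RECOVERY_SUCCESS:
		restore_app_config(p);
		p->state = PORT_RUNNING;
		p->datapath_enabled = true;
		break;
	case MODEL_EVENT_RECOVERY_FAILED:
		p->datapath_enabled = false;
		p->state = PORT_FAILED;
		break;
	}
}

/* Called once the application-driven reset and reconfiguration succeeded. */
static void on_reset_done(struct port_ctx *p)
{
	assert(p->state == PORT_NEEDS_RESET);
	restore_app_config(p);
	p->state = PORT_RUNNING;
	p->datapath_enabled = true;
}
```

The point of folding both paths into one handler is that the application-side duties (stop the data path, re-apply configuration, resume) are the same; only who performs the hardware recovery differs.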

>  
>>
>>>
>>>
>>>>>
>>>>>>> I think driver should not rely on application for this, unless
>>>>>>> application explicitly says (to driver) that it is handling recovery,
>>>>>>> right now there is no way for driver to know this.
>>>>>>
>>>>>> I think it is vice versa:
>>>>>> application should not enable auto-recovery if it can't meet
>>>>>> the prerequisites for it (provide appropriate callback).
>>>>>>
>>>>>
>>>>> I agree on above, we are saying similar thing in different perspective.
>>>>
>>>> Ok, that's good we are on the same page.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>> Also, this patch introduce a driver internal function
>>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>>>>>>
>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>>>>> Cc: stable@dpdk.org
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>>>>>>> ---
>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>>>>>>   lib/ethdev/version.map                  |  1 +
>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>> index c145a9066c..e380ff135a 100644
>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
>>>>>>>>>   and only a small amount of work is required for the application.
>>>>>>>>>
>>>>>>>>> -During error detection and automatic recovery,
>>>>>>>>> -the PMD sets the data path pointers to dummy functions
>>>>>>>>> -(which will prevent the crash),
>>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>>>>>>>>> -
>>>>>>>>> -Because the PMD recovers automatically,
>>>>>>>>> -the application can only sense that the data flow is disconnected for a while
>>>>>>>>> -and the control API returns an error in this period.
>>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
>>>>>>>>> +pointers to dummy functions and also make sure the control path operations
>>>>>>>>> +failed with a return code ``-EBUSY``.
>>>>>>>>>
>>>>>>>>>   In order to sense the error happening/recovering,
>>>>>>>>>   as well as to restore some additional configuration,
>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>>>>>>
>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>>>>      Notify the application that an error is detected
>>>>>>>>> -   and the recovery is being started.
>>>>>>>>> +   and the recovery is about to start.
>>>>>>>>>      Upon receiving the event, the application should not invoke
>>>>>>>>> -   any control path function until receiving
>>>>>>>>> +   any control and data path API until receiving
>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>>>>
>>>>>>>>>   .. note::
>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>>>>
>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>>>>      Notify the application that the recovery from error is successful,
>>>>>>>>> -   the PMD already re-configures the port,
>>>>>>>>> -   and the effect is the same as a restart operation.
>>>>>>>>> +   the PMD already re-configures the port.
>>>>>>>>> +   The application should restore some additional configuration, and then
>>>>>>>>> +   enable data path API invocation.
>>>>>>>>>
>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>>>>      Notify the application that the recovery from error failed,
>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>>>>       return rc;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> +void
>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>>>>>>> +{
>>>>>>>>> +    if (dev == NULL)
>>>>>>>>> +        return;
>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>   const struct rte_memzone *
>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>>>>>>>>            uint16_t queue_id);
>>>>>>>>>
>>>>>>>>> +/**
>>>>>>>>> + * @internal
>>>>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>>>>> + *
>>>>>>>>> + * @param dev
>>>>>>>>> + *  Pointer to struct rte_eth_dev.
>>>>>>>>> + */
>>>>>>>>> +__rte_internal
>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>>>>> +
>>>>>>>>>   /**
>>>>>>>>>    * @internal
>>>>>>>>>    * Atomically set the link status for the specific device.
>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>>>>> index 049641d57c..44ee7229c1 100644
>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>>>>        */
>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>>>>       /** Port recovering from a hardware or firmware error.
>>>>>>>>> -     * If PMD supports proactive error recovery,
>>>>>>>>> -     * it should trigger this event to notify application
>>>>>>>>> -     * that it detected an error and the recovery is being started.
>>>>>>>>> -     * Upon receiving the event, the application should not invoke any control path API
>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>> -     * It means that the application cannot send or receive any packets
>>>>>>>>> -     * during this period.
>>>>>>>>> +     *
>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger this
>>>>>>>>> +     * event to notify application that it detected an error and the
>>>>>>>>> +     * recovery is about to start.
>>>>>>>>> +     *
>>>>>>>>> +     * Upon receiving the event, the application should not invoke any
>>>>>>>>> +     * control and data path API until receiving
>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>>>>> +     * event.
>>>>>>>>> +     *
>>>>>>>>> +     * Once this event is reported, the PMD will set the data path pointers
>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>> +     *
>>>>>>>>>        * @note Before the PMD reports the recovery result,
>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>>>>>>>>        * because a larger error may occur during the recovery.
>>>>>>>>>        */
>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>>       /** Port recovers successfully from the error.
>>>>>>>>> -     * The PMD already re-configured the port,
>>>>>>>>> -     * and the effect is the same as a restart operation.
>>>>>>>>> +     *
>>>>>>>>> +     * The PMD already re-configured the port:
>>>>>>>>>        * a) The following operation will be retained: (alphabetically)
>>>>>>>>>        *    - DCB configuration
>>>>>>>>>        *    - FEC configuration
>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>>>>        * c) Any other configuration will not be stored
>>>>>>>>>        *    and will need to be re-configured.
>>>>>>>>> +     *
>>>>>>>>> +     * The application should restore some additional configuration
>>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
>>>>>>>>>        */
>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>>>>       /** Port recovery failed.
>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
>>>>>>>>> --- a/lib/ethdev/version.map
>>>>>>>>> +++ b/lib/ethdev/version.map
>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>>>>       rte_eth_devices;
>>>>>>>>>       rte_eth_dma_zone_free;
>>>>>>>>>       rte_eth_dma_zone_reserve;
>>>>>>>>> +    rte_eth_fp_ops_setup;
>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
>>>>>>>>> --
>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>>>>>>
>>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
  
Chengwen Feng March 7, 2023, 8:39 a.m. UTC | #18
On 2023/3/7 13:34, Honnappa Nagarahalli wrote:
> 
> 
>> -----Original Message-----
>> From: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>> Sent: Sunday, March 5, 2023 9:24 AM
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
>> dev@dpdk.org; Chengwen Feng <fengchengwen@huawei.com>;
>> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
>> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP
>> <kalesh-anakkur.purayil@broadcom.com>; Ajit Khaparde
>> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
>> Cc: nd <nd@arm.com>
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
>> mode
>>
>>
>>>>>>
>>>>>> In the proactive error handling mode, the PMD will set the data
>>>>>> path pointers to dummy functions and then try recovery, in this
>>>>>> period the application may still invoking data path API. This will
>>>>>> introduce a race-condition with data path which may lead to crash [1].
>>>>>>
>>>>>> Although the PMD added delay after setting data path pointers to
>>>>>> cover the above race-condition, it reduces the probability, but it
>>>>>> doesn't solve the problem.
>>>>>>
>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>> requirements are added:
>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>> Do you mean to say, PMD should set the data path pointers after calling the
>>>>> call back function?
>>>>> The PMD is running in the context of multiple EAL threads. How do these
>>>>> threads synchronize such that only one thread sets these data pointers?
>>>>
>>>> As I understand this event callback supposed to be called in the
>>>> context of EAL interrupt thread (whoever is more familiar with
>>>> original idea, feel free to correct me if I missed something).
>>> I could not figure this out. It looks to be called from the data plane thread context.
>>> I also have a thought on alternate design at the end, appreciate if you can take a look.
>>>
>>>> How it is going to signal data-path threads that they need to
>>>> stop/suspend calling data-path API - that's I suppose is left to application to decide...
>>>> Same as right now it is application responsibility to stop data-path
>>>> threads before doing dev_stop()/dev/_config()/etc.
>>> Ok, good, this expectation is not new. The application must have a mechanism already.
>>>
>>>>
>>>>
>>>>>
>>>>>> 2. The application should stop data path API invocation when process
>>>>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>> Any thoughts on how an application can do this?
>>> We can ignore this question as there is already similar expectation set for earlier functionalities.
>>>
>>>>>
>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>> 4. The application should enable data path API invocation when process
>>>>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>> Do you mean to say that the application should not call the datapath APIs
>>>>> while the PMD is running the recovery process?
>>>>
>>>> Yes, I believe that's the intention.
>>> Ok, this is good and makes sense.
>>>
>>>>
>>>>>>
>>>>>> Also, this patch introduce a driver internal function
>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
>>>>>>
>>>>>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>>>
>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>> Cc: stable@dpdk.org
>>>>>>
>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>>>> ---
>>>>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>>>    lib/ethdev/version.map                  |  1 +
>>>>>>    5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>
>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> index c145a9066c..e380ff135a 100644
>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>>>>>    the PMD automatically recovers from error in PROACTIVE mode,
>>>>>>    and only a small amount of work is required for the application.
>>>>>>
>>>>>> -During error detection and automatic recovery,
>>>>>> -the PMD sets the data path pointers to dummy functions
>>>>>> -(which will prevent the crash),
>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>>>>>> -
>>>>>> -Because the PMD recovers automatically,
>>>>>> -the application can only sense that the data flow is disconnected for a while
>>>>>> -and the control API returns an error in this period.
>>>>>> +During error detection and automatic recovery, the PMD sets the data path
>>>>>> +pointers to dummy functions and also make sure the control path operations
>>>>>> +failed with a return code ``-EBUSY``.
>>>>>>
>>>>>>    In order to sense the error happening/recovering,
>>>>>>    as well as to restore some additional configuration,
>>>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>       Notify the application that an error is detected
>>>>>> -   and the recovery is being started.
>>>>>> +   and the recovery is about to start.
>>>>>>       Upon receiving the event, the application should not invoke
>>>>>> -   any control path function until receiving
>>>>>> +   any control and data path API until receiving
>>>>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>
>>>>>>    .. note::
>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>       Notify the application that the recovery from error is successful,
>>>>>> -   the PMD already re-configures the port,
>>>>>> -   and the effect is the same as a restart operation.
>>>>>> +   the PMD already re-configures the port.
>>>>>> +   The application should restore some additional configuration, and then
>>>>> What is the additional configuration? Is this specific to each NIC/PMD?
>>>>> I thought, this is an auto recovery process and the application does not require
>>>>> to reconfigure anything. If the application has to restore the
>>>>> configuration, how does auto recovery differ from typical recovery process?
>>>>>
>>>>>> +   enable data path API invocation.
>>>>>>
>>>>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>       Notify the application that the recovery from error failed,
>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>>>> index 0be1e8ca04..f994653fe9 100644
>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>    	return rc;
>>>>>>    }
>>>>>>
>>>>>> +void
>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>>>> +{
>>>>>> +	if (dev == NULL)
>>>>>> +		return;
>>>>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>>>> +}
>>>>>> +
>>>>>>    const struct rte_memzone *
>>>>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>    			 uint16_t queue_id, size_t size, unsigned int align,
>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>>> index 2c9d615fb5..0d964d1f67 100644
>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>>>>>    		 uint16_t queue_id);
>>>>>>
>>>>>> +/**
>>>>>> + * @internal
>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>> + *
>>>>>> + * @param dev
>>>>>> + *  Pointer to struct rte_eth_dev.
>>>>>> + */
>>>>>> +__rte_internal
>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>> +
>>>>>>    /**
>>>>>>     * @internal
>>>>>>     * Atomically set the link status for the specific device.
>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>> index 049641d57c..44ee7229c1 100644
>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>    	 */
>>>>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>    	/** Port recovering from a hardware or firmware error.
>>>>>> -	 * If PMD supports proactive error recovery,
>>>>>> -	 * it should trigger this event to notify application
>>>>>> -	 * that it detected an error and the recovery is being started.
>>>>>> -	 * Upon receiving the event, the application should not invoke any control path API
>>>>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>>>>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>> -	 * The PMD will set the data path pointers to dummy functions,
>>>>>> -	 * and re-set the data path pointers to non-dummy functions
>>>>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>> -	 * It means that the application cannot send or receive any packets
>>>>>> -	 * during this period.
>>>>>> +	 *
>>>>>> +	 * If PMD supports proactive error recovery, it should trigger this
>>>>>> +	 * event to notify application that it detected an error and the
>>>>>> +	 * recovery is about to start.
>>>>>> +	 *
>>>>>> +	 * Upon receiving the event, the application should not invoke any
>>>>>> +	 * control and data path API until receiving
>>>>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>> +	 * event.
>>>>>> +	 *
>>>>>> +	 * Once this event is reported, the PMD will set the data path pointers
>>>>>> +	 * to dummy functions, and re-set the data path pointers to valid
>>>>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>> Why do we need to set the data path pointers to dummy functions if the
>>>>> application is restricted from invoking any control and data path
>>>>> APIs till the recovery process is completed?
>>>>
>>>> You are right, in theory it is not mandatory.
>>>> Though it helps to flag a problem if user will still try to call them
>>>> while recovery is in progress.
>>> Ok, may be in debug mode.
>>> I mean, we have already set an expectation to the application that it should
>>> not call and the application has implemented a method to do the same. Why
>>> do we need to complicate this?
>>> If the application calls the APIs, it is a programming error.
>>
>>
>> My preference would be to keep it this way for both debug and non-debug mode.
>> It doesn't cost us anything in terms of performance, but helps to catch
>> problems with a misbehaving app.
> 
> This is also causing a synchronization problem. i.e. if this has to be done correctly, we need to use correct synchronization mechanisms.
> We cannot set the function pointers and assume that data will be visible to other threads/cores in the correct order.
> A possible mechanism (though I see some problems with this) could be to use a guard variable, which indicates when it is safe to use the function pointers on the data plane threads. This would require a load-acquire in the data plane threads.
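The guard-variable approach mentioned above can be sketched with C11 atomics. This is a minimal, hypothetical illustration: the `fp_slot` structure and its helpers are not DPDK code. The control thread writes the burst pointer and queue, then sets `ready` with a store-release. The data-plane thread load-acquires `ready` before using them.

The sketch also shows the likely limitation behind the reservation above. The guard orders publication, but it does not tell the control thread when a reader that already saw `ready == true` has left the burst function. Retiring or replacing the pointers therefore still needs a separate quiescence wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical burst-function type; a stand-in, not DPDK's eth_rx_burst_t. */
typedef uint16_t (*burst_fn_t)(void *queue, void **pkts, uint16_t n);

/* Hypothetical per-port fast-path slot whose pointers are guarded by "ready". */
struct fp_slot {
	burst_fn_t rx_burst;	/* only valid while ready == true */
	void *rxq;
	atomic_bool ready;	/* guard variable */
};

/* Stand-in burst function that pretends n packets were received. */
static uint16_t demo_rx_burst(void *queue, void **pkts, uint16_t n)
{
	(void)queue;
	(void)pkts;
	return n;
}

/* Control thread: mark the pointers unusable before recovery starts.
 * A reader that already passed the guard may still be inside rx_burst. */
static void fp_slot_retire(struct fp_slot *s)
{
	atomic_store_explicit(&s->ready, false, memory_order_release);
}

/* Control thread: write the pointers first, then set the guard with a
 * store-release so a reader that load-acquires ready == true also sees the
 * new rx_burst/rxq. Only safe once no reader can still use the old ones. */
static void fp_slot_publish(struct fp_slot *s, burst_fn_t fn, void *rxq)
{
	assert(fn != NULL);
	s->rx_burst = fn;
	s->rxq = rxq;
	atomic_store_explicit(&s->ready, true, memory_order_release);
}

/* Data-plane thread: load-acquire the guard before using the pointers. */
static uint16_t fp_slot_rx(struct fp_slot *s, void **pkts, uint16_t n)
{
	if (!atomic_load_explicit(&s->ready, memory_order_acquire))
		return 0;	/* behaves like a dummy burst during recovery */
	return s->rx_burst(s->rxq, pkts, n);
}
```

The acquire load is the per-burst cost Honnappa refers to. On strongly ordered CPUs it is usually cheap, but it is not free on every architecture.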
> 
>>
>>>
>>>> Again, same as we doing in dev_stop().
>>>
>>>>
>>>>>
>>>>>> +	 *
>>>>>>    	 * @note Before the PMD reports the recovery result,
>>>>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>>>>>    	 * because a larger error may occur during the recovery.
>>>>>>    	 */
>>>>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
>>>>> I understand this is not a change in this patch. But, just
>>>>> wondering, what is the purpose of this? How is the application supposed to use this?
>>>>>
>>>>>>    	/** Port recovers successfully from the error.
>>>>>> -	 * The PMD already re-configured the port,
>>>>>> -	 * and the effect is the same as a restart operation.
>>>>>> +	 *
>>>>>> +	 * The PMD already re-configured the port:
>>>>>>    	 * a) The following operation will be retained: (alphabetically)
>>>>>>    	 *    - DCB configuration
>>>>>>    	 *    - FEC configuration
>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>    	 * c) Any other configuration will not be stored
>>>>>>    	 *    and will need to be re-configured.
>>>>>> +	 *
>>>>>> +	 * The application should restore some additional configuration
>>>>>> +	 * (see above case b/c), and then enable data path API invocation.
>>>>>>    	 */
>>>>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>    	/** Port recovery failed.
>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>>>> index 357d1a88c0..c273e0bdae 100644
>>>>>> --- a/lib/ethdev/version.map
>>>>>> +++ b/lib/ethdev/version.map
>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>    	rte_eth_devices;
>>>>>>    	rte_eth_dma_zone_free;
>>>>>>    	rte_eth_dma_zone_reserve;
>>>>>> +	rte_eth_fp_ops_setup;
>>>>>>    	rte_eth_hairpin_queue_peer_bind;
>>>>>>    	rte_eth_hairpin_queue_peer_unbind;
>>>>>>    	rte_eth_hairpin_queue_peer_update;
>>>>>> --
>>>>>> 2.17.1
>>>>>
>>>
>>> Is there any reason not to design this in the same way as
>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>
>> I suppose it is a question for the authors of original patch...
> Appreciate if the authors could comment on this.

The main cause is a hardware implementation limitation; I will try to
explain it from the hns3 PMD's point of view.
For a global reset, all functions need to respond within a certain period
of time, otherwise the reset will fail. The reset also requires a few
steps, all of which may take a long time.

When multiple functions are used in one DPDK process and a global reset is
triggered, rte_eth_dev_reset does not cover this scenario:
1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
2. The application callback is then invoked for each port in that same
   thread. Because each port's recovery takes a long time, the later
   ports fail to reset.
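The timing argument above can be made concrete with a simple model. The numbers are illustrative, not measured hns3 timings. The model makes three assumptions:

- The reset events of all ports are handled serially in one interrupt thread.
- Each port takes the same time to handle.
- Each port must complete within a fixed window after the global reset.

The function name is hypothetical.

```c
#include <assert.h>

/* Illustrative model only: n_ports reset events are handled one after
 * another in a single thread, each taking per_port_ms, and (assumption)
 * each port must finish within deadline_ms of the global reset.
 * Port i (0-based) finishes at (i + 1) * per_port_ms.
 * Returns how many ports finish within the deadline. */
static unsigned int ports_within_deadline(unsigned int n_ports,
					   unsigned int per_port_ms,
					   unsigned int deadline_ms)
{
	assert(per_port_ms > 0);
	/* (i + 1) * per_port_ms <= deadline_ms  <=>  i + 1 <= deadline_ms / per_port_ms */
	unsigned int fit = deadline_ms / per_port_ms;

	return fit < n_ports ? fit : n_ports;
}
```

For example, `ports_within_deadline(4, 300, 1000)` returns 3: the fourth port would finish at 1200 ms, past the 1000 ms window. Under the same idealized model, handling the ports concurrently would let all four finish at 300 ms.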

> 
>>
>>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
>>> functionality.
>>
>> I suppose such approach is also possible.
>> Personally I am fine with both ways: either existing one or what you propose,
>> as long as we'll fix existing race-condition.
>> What is good with what you suggest - that way we probably don't need to
>> worry how to allow user to enable/disable auto-recovery inside PMD.
>>
>> Konstantin
>>
>
  
Konstantin Ananyev March 7, 2023, 9:52 a.m. UTC | #19
> >
> >>>>>>>>> In the proactive error handling mode, the PMD will set the data path
> >>>>>>>>> pointers to dummy functions and then try recovery, in this period the
> >>>>>>>>> application may still invoking data path API. This will introduce a
> >>>>>>>>> race-condition with data path which may lead to crash [1].
> >>>>>>>>>
> >>>>>>>>> Although the PMD added delay after setting data path pointers to cover
> >>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
> >>>>>>>>> solve the problem.
> >>>>>>>>>
> >>>>>>>>> To solve the race-condition problem fundamentally, the following
> >>>>>>>>> requirements are added:
> >>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>>>>> 2. The application should stop data path API invocation when process
> >>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>> 4. The application should enable data path API invocation when process
> >>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
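The PMD-side ordering implied by requirements 1 to 4 can be sketched with simplified stand-in types. `fake_port`, `pmd_recover()` and the event enum below are hypothetical, not DPDK definitions.

- The ERR_RECOVERING event is reported before the dummy burst function is installed.
- The valid burst function is restored before success is reported.

In the patch, that restore step is presumably where a PMD would call the new `rte_eth_fp_ops_setup()` helper. The dummy function only matters if the application keeps calling the data path despite the event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins; these are not the DPDK types. */
typedef uint16_t (*rx_burst_t)(void *rxq, void **pkts, uint16_t n);

enum recovery_event { EV_ERR_RECOVERING, EV_RECOVERY_SUCCESS, EV_RECOVERY_FAILED };

struct fake_port {
	rx_burst_t rx_burst;		/* pointer the application's calls go through */
	rx_burst_t valid_rx_burst;	/* the PMD's working burst function */
	void (*notify)(enum recovery_event ev, void *arg);	/* app callback */
	void *cb_arg;
};

static uint16_t dummy_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts; (void)n;
	return 0;	/* receives nothing; only reached if the app misbehaves */
}

static uint16_t demo_rx_burst(void *rxq, void **pkts, uint16_t n)
{
	(void)rxq; (void)pkts;
	return n;	/* stand-in for a working PMD burst */
}

/* Req. 1: report ERR_RECOVERING before installing the dummy burst, so the
 * application can stop its data path inside the callback (req. 2).
 * Req. 3: restore the valid burst before reporting SUCCESS, so the
 * application can resume as soon as it sees the event (req. 4). */
static int pmd_recover(struct fake_port *p, int (*do_hw_recovery)(void))
{
	p->notify(EV_ERR_RECOVERING, p->cb_arg);
	p->rx_burst = dummy_rx_burst;
	if (do_hw_recovery() != 0) {
		p->notify(EV_RECOVERY_FAILED, p->cb_arg);
		return -1;
	}
	p->rx_burst = p->valid_rx_burst;
	p->notify(EV_RECOVERY_SUCCESS, p->cb_arg);
	return 0;
}

/* Demo callback: records each event and whether the dummy burst was
 * installed at the moment the event was reported. */
struct event_log {
	const struct fake_port *port;
	enum recovery_event ev[4];
	int dummy_installed[4];
	int n;
};

static void log_event(enum recovery_event ev, void *arg)
{
	struct event_log *elog = arg;

	assert(elog->n < 4);
	elog->ev[elog->n] = ev;
	elog->dummy_installed[elog->n] = (elog->port->rx_burst == dummy_rx_burst);
	elog->n++;
}

static int hw_recovery_ok(void) { return 0; }
```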
> >>>>>>>>>
> >>>>>>>
> >>>>>>> How is this solving the race condition? By pushing the responsibility to
> >>>>>>> stop the data path to the application?
> >>>>>>
> >>>>>> Exactly, it becomes application responsibility to make sure data-path is
> >>>>>> stopped/suspended before recovery will continue.
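One way an application could meet this responsibility is a per-worker busy flag combined with a global pause request, in the style of a Dekker handshake. This is a sketch under these assumptions:

- The `dp_gate` names are hypothetical.
- Workers wrap each burst in `dp_enter()`/`dp_leave()`.
- The pause is requested from a thread that is not itself inside a burst, for example the EAL interrupt thread.

If the event callback were invoked from a worker inside a burst, `dp_pause()` would wait for itself. DPDK's RCU library (`rte_rcu_qsbr`), which tracks per-thread quiescent states, could likely serve the same purpose.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 16

/* Hypothetical application-side gate; not a DPDK API. */
struct dp_gate {
	atomic_bool pause_req;		/* set by the recovery event callback */
	atomic_bool busy[MAX_WORKERS];	/* true while worker i may be in a burst */
	unsigned int n_workers;
};

/* Worker: call before each Rx/Tx burst. Returns false if the data path is
 * paused, in which case the worker must not call any burst function.
 * All accesses are seq_cst, so either the worker sees pause_req or the
 * pausing thread sees busy[id] (Dekker-style handshake). */
static bool dp_enter(struct dp_gate *g, unsigned int id)
{
	atomic_store(&g->busy[id], true);
	if (atomic_load(&g->pause_req)) {
		atomic_store(&g->busy[id], false);
		return false;
	}
	return true;
}

/* Worker: call after the burst returns. */
static void dp_leave(struct dp_gate *g, unsigned int id)
{
	atomic_store(&g->busy[id], false);
}

/* Event callback (e.g. on RTE_ETH_EVENT_ERR_RECOVERING): request a pause,
 * then wait until no worker is inside a burst. Must not run on a worker
 * that is itself inside a burst, or it would wait for itself forever. */
static void dp_pause(struct dp_gate *g)
{
	assert(g->n_workers <= MAX_WORKERS);
	atomic_store(&g->pause_req, true);
	for (unsigned int i = 0; i < g->n_workers; i++)
		while (atomic_load(&g->busy[i]))
			sched_yield();
}

/* Event callback (e.g. on RTE_ETH_EVENT_RECOVERY_SUCCESS): let workers resume. */
static void dp_resume(struct dp_gate *g)
{
	atomic_store(&g->pause_req, false);
}
```

After `dp_pause()` returns, the callback can let the PMD continue: no worker is inside a burst, and any worker that tries to start one will see the pause request.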
> >>>>>>
> >>>>>
> >>>>> From documentation of the feature:
> >>>>>
> >>>>> ``
> >>>>> Because the PMD recovers automatically,
> >>>>> the application can only sense that the data flow is disconnected for a
> >>>>> while and the control API returns an error in this period.
> >>>>>
> >>>>> In order to sense the error happening/recovering, as well as to restore
> >>>>> some additional configuration, three events are available:
> >>>>> ``
> >>>>>
> >>>>> It looks like initial design is to use events mainly inform application
> >>>>> about what happened and mainly for re-configuration.
> >>>>>
> >>>>> Although I don't disagree with involving the application, I am not sure
> >>>>> that is part of the current design.
> >>>>
> >>>> I thought we all agreed that the initial design contains some fallacies that
> >>>> need to be fixed, no?
> >>>> Statement that with current rte_ethdev design error recovery can be done
> >>>> without interaction with the app (to stop/suspend data/control path)
> >>>> is the main one I think.
> >>>> It needs some interaction with app layer, one way or another.
> >>>>
> >>>>>>>
> >>>>>>> What if application is not interested in recovery modes at all and not
> >>>>>>> registered any callback for the recovery?
> >>>>>>
> >>>>>>
> >>>>>> Are you saying there is no way for the application to disable
> >>>>>> automatic recovery in the PMD if it is not interested
> >>>>>> (or can't fulfil the prerequisites for it)?
> >>>>>> If so, then yes it is a problem and we need to fix it.
> >>>>>> I assumed that such mechanism to disable unwanted events already exists,
> >>>>>> but I can't find anything.
> >>>>>> Wonder what would be the easiest way here - can PMD make a decision
> >>>>>> based on callback return value, or do we need a new API to
> >>>>>> enable/disable callbacks, or ...?
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> As far as I can see automatic recovery is not configurable by app.
> >>>>>
> >>>>> But that is not all: the PMD sends events to the application but can't know
> >>>>> whether the application is handling them, so with the current design the PMD
> >>>>> can't rely on the app.
> >>>>
> >>>> Well, PMD invokes user provided callback.
> >>>> One way to fix that problem - if there is no callback provided,
> >>>> or callback returns an error code - PMD can assume that recovery
> >>>> should not be done.
> >>>> That is probably not the best design choice, but at least it will allow
> >>>> to fix the problem without too many changes and introducing new API.
> >>>> That could be sort of a 'quick fix'.
> >>>> In a meanwhile we can think about new/better approach for that.
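The "quick fix" proposed above could look roughly like this: the PMD skips proactive recovery unless a callback is registered and reports success. The callback type and the `port_cbs` structure are assumptions for illustration. The return-value convention is taken from the suggestion above and is not documented ethdev behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the suggested "quick fix"; not a DPDK API. */
typedef int (*recovery_cb_t)(void *arg);

struct port_cbs {
	recovery_cb_t on_err_recovering;	/* NULL if the app registered nothing */
	void *arg;
};

/* Proceed with proactive recovery only if a callback exists and returned 0,
 * meaning (by convention in this sketch) that the application has stopped
 * its data path. Otherwise the PMD leaves recovery to the application. */
static bool pmd_may_auto_recover(const struct port_cbs *cbs)
{
	assert(cbs != NULL);
	if (cbs->on_err_recovering == NULL)
		return false;
	return cbs->on_err_recovering(cbs->arg) == 0;
}

/* Demo callbacks: one application that is ready, one that refuses. */
static int app_ready(void *arg) { (void)arg; return 0; }
static int app_refuses(void *arg) { (void)arg; return -1; }
```

As noted in the thread, this puts a new meaning on the callback return value. A dedicated enable/disable API would make the opt-in explicit.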
> >>>>
> >>>
> >>> -rc2 for 23.03 is a few days away.
> >>>
> >>> What do you think to have 'quick fix' as modifying how driver updates
> >>> burst ops to prevent the race condition, for this release?
> 
> The 'quick fix': do you mean only updating the function pointers (without the rxq setting)?
> The PMDs that announced support for "proactive error handling mode" already
> do this.

Really sorry guys, I was too fast on the keyboard and didn't properly read what Ferruh suggested.
Reading it once again: no, I do not agree with that.
It wouldn't fix anything; it would just add extra mess to the code.
Sorry again for the wrong reply.
Konstantin


> 
> >>>
> >>> And plan a design update for the next release?
> >> +1 on the overall approach.
> >
> > Yep, agree.
> 
> Hoping for a better solution.
> Also, I notice that only openvswitch (of all the open-source software based on DPDK)
> registers an RTE_ETH_EVENT_INTR_RESET callback.
> 
> Therefore, I hope we can build a recovery framework at the DPDK SDK level that is compatible
> with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> 
> >
> >>
> >>>
> >>>
> >>>>>
> >>>>>>> I think driver should not rely on application for this, unless
> >>>>>>> application explicitly says (to driver) that it is handling recovery,
> >>>>>>> right now there is no way for driver to know this.
> >>>>>>
> >>>>>> I think it is vice versa:
> >>>>>> the application should not enable auto-recovery if it can't meet
> >>>>>> the prerequisites for it (provide an appropriate callback).
> >>>>>>
> >>>>>
> >>>>> I agree on above, we are saying similar thing in different perspective.
> >>>>
> >>>> Ok, that's good we are on the same page.
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>>> Also, this patch introduce a driver internal function
> >>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >>>>>>>>>
> >>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >>>>>>>>> Cc: stable@dpdk.org
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >>>>>>>>> ---
> >>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> >>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> >>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> >>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> >>>>>>>>>   lib/ethdev/version.map                  |  1 +
> >>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>> index c145a9066c..e380ff135a 100644
> >>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> >>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> >>>>>>>>>   and only a small amount of work is required for the application.
> >>>>>>>>>
> >>>>>>>>> -During error detection and automatic recovery,
> >>>>>>>>> -the PMD sets the data path pointers to dummy functions
> >>>>>>>>> -(which will prevent the crash),
> >>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> >>>>>>>>> -
> >>>>>>>>> -Because the PMD recovers automatically,
> >>>>>>>>> -the application can only sense that the data flow is disconnected for a while
> >>>>>>>>> -and the control API returns an error in this period.
> >>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
> >>>>>>>>> +pointers to dummy functions and also make sure the control path operations
> >>>>>>>>> +failed with a return code ``-EBUSY``.
> >>>>>>>>>
> >>>>>>>>>   In order to sense the error happening/recovering,
> >>>>>>>>>   as well as to restore some additional configuration,
> >>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> >>>>>>>>>
> >>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> >>>>>>>>>      Notify the application that an error is detected
> >>>>>>>>> -   and the recovery is being started.
> >>>>>>>>> +   and the recovery is about to start.
> >>>>>>>>>      Upon receiving the event, the application should not invoke
> >>>>>>>>> -   any control path function until receiving
> >>>>>>>>> +   any control and data path API until receiving
> >>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> >>>>>>>>>
> >>>>>>>>>   .. note::
> >>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> >>>>>>>>>
> >>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> >>>>>>>>>      Notify the application that the recovery from error is successful,
> >>>>>>>>> -   the PMD already re-configures the port,
> >>>>>>>>> -   and the effect is the same as a restart operation.
> >>>>>>>>> +   the PMD already re-configures the port.
> >>>>>>>>> +   The application should restore some additional configuration, and then
> >>>>>>>>> +   enable data path API invocation.
> >>>>>>>>>
> >>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> >>>>>>>>>      Notify the application that the recovery from error failed,
> >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> >>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> >>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> >>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>>>>>>       return rc;
> >>>>>>>>>   }
> >>>>>>>>>
> >>>>>>>>> +void
> >>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> >>>>>>>>> +{
> >>>>>>>>> +    if (dev == NULL)
> >>>>>>>>> +        return;
> >>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>   const struct rte_memzone *
> >>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> >>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>>>>> @@ -1621,6 +1621,16 @@ int
> >>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> >>>>>>>>>            uint16_t queue_id);
> >>>>>>>>>
> >>>>>>>>> +/**
> >>>>>>>>> + * @internal
> >>>>>>>>> + * Setup eth fast-path API to ethdev values.
> >>>>>>>>> + *
> >>>>>>>>> + * @param dev
> >>>>>>>>> + *  Pointer to struct rte_eth_dev.
> >>>>>>>>> + */
> >>>>>>>>> +__rte_internal
> >>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> >>>>>>>>> +
> >>>>>>>>>   /**
> >>>>>>>>>    * @internal
> >>>>>>>>>    * Atomically set the link status for the specific device.
> >>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >>>>>>>>> index 049641d57c..44ee7229c1 100644
> >>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> >>>>>>>>>        */
> >>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> >>>>>>>>>       /** Port recovering from a hardware or firmware error.
> >>>>>>>>> -     * If PMD supports proactive error recovery,
> >>>>>>>>> -     * it should trigger this event to notify application
> >>>>>>>>> -     * that it detected an error and the recovery is being started.
> >>>>>>>>> -     * Upon receiving the event, the application should not invoke any control path API
> >>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> >>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> >>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
> >>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
> >>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>> -     * It means that the application cannot send or receive any packets
> >>>>>>>>> -     * during this period.
> >>>>>>>>> +     *
> >>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger this
> >>>>>>>>> +     * event to notify application that it detected an error and the
> >>>>>>>>> +     * recovery is about to start.
> >>>>>>>>> +     *
> >>>>>>>>> +     * Upon receiving the event, the application should not invoke any
> >>>>>>>>> +     * control and data path API until receiving
> >>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> >>>>>>>>> +     * event.
> >>>>>>>>> +     *
> >>>>>>>>> +     * Once this event is reported, the PMD will set the data path pointers
> >>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> >>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>> +     *
> >>>>>>>>>        * @note Before the PMD reports the recovery result,
> >>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> >>>>>>>>>        * because a larger error may occur during the recovery.
> >>>>>>>>>        */
> >>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> >>>>>>>>>       /** Port recovers successfully from the error.
> >>>>>>>>> -     * The PMD already re-configured the port,
> >>>>>>>>> -     * and the effect is the same as a restart operation.
> >>>>>>>>> +     *
> >>>>>>>>> +     * The PMD already re-configured the port:
> >>>>>>>>>        * a) The following operation will be retained: (alphabetically)
> >>>>>>>>>        *    - DCB configuration
> >>>>>>>>>        *    - FEC configuration
> >>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> >>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> >>>>>>>>>        * c) Any other configuration will not be stored
> >>>>>>>>>        *    and will need to be re-configured.
> >>>>>>>>> +     *
> >>>>>>>>> +     * The application should restore some additional configuration
> >>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
> >>>>>>>>>        */
> >>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> >>>>>>>>>       /** Port recovery failed.
> >>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> >>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> >>>>>>>>> --- a/lib/ethdev/version.map
> >>>>>>>>> +++ b/lib/ethdev/version.map
> >>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> >>>>>>>>>       rte_eth_devices;
> >>>>>>>>>       rte_eth_dma_zone_free;
> >>>>>>>>>       rte_eth_dma_zone_reserve;
> >>>>>>>>> +    rte_eth_fp_ops_setup;
> >>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
> >>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
> >>>>>>>>>       rte_eth_hairpin_queue_peer_update;
> >>>>>>>>> --
> >>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>>>>>>
> >>>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
  
Konstantin Ananyev March 7, 2023, 9:56 a.m. UTC | #20
> >
> >
> > >>>>
> > >>>> In the proactive error handling mode, the PMD will set the data
> > >>>> path pointers to dummy functions and then try recovery, in this
> > >>>> period the application may still invoking data path API. This will
> > >>>> introduce a race-condition with data path which may lead to crash [1].
> > >>>>
> > >>>> Although the PMD added delay after setting data path pointers to
> > >>>> cover the above race-condition, it reduces the probability, but it
> > >>>> doesn't solve the problem.
> > >>>>
> > >>>> To solve the race-condition problem fundamentally, the following
> > >>>> requirements are added:
> > >>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>      report RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Do you mean to say, PMD should set the data path pointers after calling the
> > >>> call back function?
> > >>> The PMD is running in the context of multiple EAL threads. How do these
> > >>> threads synchronize such that only one thread sets these data pointers?
> > >>
> > >> As I understand this event callback supposed to be called in the
> > >> context of EAL interrupt thread (whoever is more familiar with
> > >> original idea, feel free to correct me if I missed something).
> > > I could not figure this out. It looks to be called from the data plane thread context.
> > > I also have a thought on alternate design at the end, appreciate if you can take a look.
> > >
> > >> How it is going to signal data-path threads that they need to
> > >> stop/suspend calling data-path API - that's I suppose is left to application to decide...
> > >> Same as right now it is application responsibility to stop data-path
> > >> threads before doing dev_stop()/dev/_config()/etc.
> > > Ok, good, this expectation is not new. The application must have a mechanism already.
> > >
> > >>
> > >>
> > >>>
> > >>>> 2. The application should stop data path API invocation when process
> > >>>>      the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>> Any thoughts on how an application can do this?
> > > We can ignore this question as there is already similar expectation set for earlier functionalities.
> > >
> > >>>
> > >>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>      report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>> 4. The application should enable data path API invocation when process
> > >>>>      the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>> Do you mean to say that the application should not call the datapath
> > >>> APIs while the PMD is running the recovery process?
> > >>
> > >> Yes, I believe that's the intention.
> > > Ok, this is good and makes sense.
> > >
> > >>
> > >>>>
> > >>>> Also, this patch introduce a driver internal function
> > >>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> > >>>>
> > >>>> [1] http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >>>>
> > >>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>> Cc: stable@dpdk.org
> > >>>>
> > >>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > >>>> ---
> > >>>>    doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>    lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>    lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>    lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>    lib/ethdev/version.map                  |  1 +
> > >>>>    5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>
> > >>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> index c145a9066c..e380ff135a 100644
> > >>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> > >>>>    the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>    and only a small amount of work is required for the application.
> > >>>>
> > >>>> -During error detection and automatic recovery,
> > >>>> -the PMD sets the data path pointers to dummy functions
> > >>>> -(which will prevent the crash),
> > >>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> > >>>> -
> > >>>> -Because the PMD recovers automatically,
> > >>>> -the application can only sense that the data flow is disconnected for a while
> > >>>> -and the control API returns an error in this period.
> > >>>> +During error detection and automatic recovery, the PMD sets the
> > >>>> +data path pointers to dummy functions and also make sure the
> > >>>> +control path operations failed with a return code ``-EBUSY``.
> > >>>>
> > >>>>    In order to sense the error happening/recovering,
> > >>>>    as well as to restore some additional configuration,
> > >>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>       Notify the application that an error is detected
> > >>>> -   and the recovery is being started.
> > >>>> +   and the recovery is about to start.
> > >>>>       Upon receiving the event, the application should not invoke
> > >>>> -   any control path function until receiving
> > >>>> +   any control and data path API until receiving
> > >>>>       ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>
> > >>>>    .. note::
> > >>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>       Notify the application that the recovery from error is successful,
> > >>>> -   the PMD already re-configures the port,
> > >>>> -   and the effect is the same as a restart operation.
> > >>>> +   the PMD already re-configures the port.
> > >>>> +   The application should restore some additional configuration, and then
> > >>> What is the additional configuration? Is this specific to each NIC/PMD?
> > >>> I thought, this is an auto recovery process and the application does
> > >>> not require to reconfigure anything. If the application has to restore the
> > >>> configuration, how does auto recovery differ from typical recovery process?
> > >>>
> > >>>> +   enable data path API invocation.
> > >>>>
> > >>>>    ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>       Notify the application that the recovery from error failed,
> > >>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>> index 0be1e8ca04..f994653fe9 100644
> > >>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>    	return rc;
> > >>>>    }
> > >>>>
> > >>>> +void
> > >>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>> +{
> > >>>> +	if (dev == NULL)
> > >>>> +		return;
> > >>>> +	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>> +}
> > >>>> +
> > >>>>    const struct rte_memzone *
> > >>>>    rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>    			 uint16_t queue_id, size_t size, unsigned int align,
> > >>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>> @@ -1621,6 +1621,16 @@ int
> > >>>>    rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> > >>>>    		 uint16_t queue_id);
> > >>>>
> > >>>> +/**
> > >>>> + * @internal
> > >>>> + * Setup eth fast-path API to ethdev values.
> > >>>> + *
> > >>>> + * @param dev
> > >>>> + *  Pointer to struct rte_eth_dev.
> > >>>> + */
> > >>>> +__rte_internal
> > >>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>> +
> > >>>>    /**
> > >>>>     * @internal
> > >>>>     * Atomically set the link status for the specific device.
> > >>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>> index 049641d57c..44ee7229c1 100644
> > >>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>    	/** Port recovering from a hardware or firmware error.
> > >>>> -	 * If PMD supports proactive error recovery,
> > >>>> -	 * it should trigger this event to notify application
> > >>>> -	 * that it detected an error and the recovery is being started.
> > >>>> -	 * Upon receiving the event, the application should not invoke any control path API
> > >>>> -	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> > >>>> -	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>> -	 * The PMD will set the data path pointers to dummy functions,
> > >>>> -	 * and re-set the data path pointers to non-dummy functions
> > >>>> -	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>> -	 * It means that the application cannot send or receive any packets
> > >>>> -	 * during this period.
> > >>>> +	 *
> > >>>> +	 * If PMD supports proactive error recovery, it should trigger this
> > >>>> +	 * event to notify application that it detected an error and the
> > >>>> +	 * recovery is about to start.
> > >>>> +	 *
> > >>>> +	 * Upon receiving the event, the application should not invoke any
> > >>>> +	 * control and data path API until receiving
> > >>>> +	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>> +	 * event.
> > >>>> +	 *
> > >>>> +	 * Once this event is reported, the PMD will set the data path pointers
> > >>>> +	 * to dummy functions, and re-set the data path pointers to valid
> > >>>> +	 * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>> Why do we need to set the data path pointers to dummy functions if
> > >>> the application is restricted from invoking any control and data path
> > >>> APIs till the recovery process is completed?
> > >>
> > >> You are right, in theory it is not mandatory.
> > >> Though it helps to flag a problem if user will still try to call them
> > >> while recovery is in progress.
> > > Ok, may be in debug mode.
> > > I mean, we have already set an expectation to the application that it should not call and the application has implemented a method to do the same. Why do we need to complicate this?
> > > If the application calls the APIs, it is a programming error.
> >
> >
> > My preference would be to keep it this way for both debug and non-debug mode.
> > It doesn't cost us anything in terms of performance, but helps to catch
> > problems with a misbehaving app.
> 
> This is also causing a synchronization problem. i.e. if this has to be done correctly, we need to use correct synchronization
> mechanisms.
> We cannot set the function pointers and assume that data will be visible to other threads/cores in the correct order.
> A possible mechanism (though I see some problems with this) could be to use a guard variable, which indicates when it is safe to use
> the function pointers on the data plane threads. This would require a load-acquire in the data plane threads.

I do realize that it doesn't provide any synchronization by itself.
It is just a best-effort approach (no guarantee) to flag a possible problem to the app developer/maintainer, nothing more.
But it has already proved useful: as I remember, we caught a few bugs with it for dev_stop, etc.
Plus it costs us nothing in terms of performance, so why not have it.
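A minimal sketch of the guard-variable idea quoted above, written as plain C11 so it compiles without DPDK. All names (struct dp_gate, dp_worker_poll(), dp_gate_close(), dp_gate_open()) are hypothetical and not part of ethdev, and the burst call is replaced by a counter. One assumption goes beyond the thread: a release/acquire pair on the flag alone does not tell the callback when a worker has left a burst it already entered, so each worker also publishes a busy flag and both sides use sequentially consistent atomics (a Dekker-style handshake). It also assumes the callback does not run on one of the gated worker threads; if it did, dp_gate_close() would wait on itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_WORKERS 4

/* Hypothetical per-port gate shared by the event callback and the workers. */
struct dp_gate {
	atomic_bool enabled;           /* data path may be entered */
	atomic_bool busy[MAX_WORKERS]; /* worker i is inside the data path */
};

static void
dp_gate_init(struct dp_gate *g)
{
	atomic_init(&g->enabled, true);
	for (unsigned int i = 0; i < MAX_WORKERS; i++)
		atomic_init(&g->busy[i], false);
}

/*
 * Worker side, one polling iteration. Announce entry first, then check
 * the gate; both are seq_cst so that either this worker sees the gate
 * closed, or dp_gate_close() sees busy[id] set and waits for it.
 */
static bool
dp_worker_poll(struct dp_gate *g, unsigned int id, unsigned long *bursts)
{
	assert(id < MAX_WORKERS);
	atomic_store(&g->busy[id], true);
	if (!atomic_load(&g->enabled)) {
		atomic_store(&g->busy[id], false);
		return false;
	}
	(*bursts)++; /* placeholder for rte_eth_rx_burst()/rte_eth_tx_burst() */
	atomic_store(&g->busy[id], false);
	return true;
}

/* RTE_ETH_EVENT_ERR_RECOVERING handler: close the gate, then wait
 * until every worker has left the data path. */
static void
dp_gate_close(struct dp_gate *g, unsigned int nb_workers)
{
	assert(nb_workers <= MAX_WORKERS);
	atomic_store(&g->enabled, false);
	for (unsigned int i = 0; i < nb_workers; i++)
		while (atomic_load(&g->busy[i]))
			; /* a DPDK app would call rte_pause() here */
}

/* RTE_ETH_EVENT_RECOVERY_SUCCESS handler, after extra config is restored. */
static void
dp_gate_open(struct dp_gate *g)
{
	atomic_store(&g->enabled, true);
}
```

With this shape, the dummy burst functions installed by the PMD remain only the best-effort safety net described above; the gate is what actually keeps workers out while recovery runs.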

> >
> > >
> > >> Again, same as we doing in dev_stop().
> > >
> > >>
> > >>>
> > >>>> +	 *
> > >>>>    	 * @note Before the PMD reports the recovery result,
> > >>>>    	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> > >>>>    	 * because a larger error may occur during the recovery.
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_ERR_RECOVERING,
> > >>> I understand this is not a change in this patch. But, just
> > >>> wondering, what is the purpose of this? How is the application supposed to use this?
> > >>>
> > >>>>    	/** Port recovers successfully from the error.
> > >>>> -	 * The PMD already re-configured the port,
> > >>>> -	 * and the effect is the same as a restart operation.
> > >>>> +	 *
> > >>>> +	 * The PMD already re-configured the port:
> > >>>>    	 * a) The following operation will be retained: (alphabetically)
> > >>>>    	 *    - DCB configuration
> > >>>>    	 *    - FEC configuration
> > >>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>    	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>    	 * c) Any other configuration will not be stored
> > >>>>    	 *    and will need to be re-configured.
> > >>>> +	 *
> > >>>> +	 * The application should restore some additional configuration
> > >>>> +	 * (see above case b/c), and then enable data path API invocation.
> > >>>>    	 */
> > >>>>    	RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>    	/** Port recovery failed.
> > >>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>> index 357d1a88c0..c273e0bdae 100644
> > >>>> --- a/lib/ethdev/version.map
> > >>>> +++ b/lib/ethdev/version.map
> > >>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>    	rte_eth_devices;
> > >>>>    	rte_eth_dma_zone_free;
> > >>>>    	rte_eth_dma_zone_reserve;
> > >>>> +	rte_eth_fp_ops_setup;
> > >>>>    	rte_eth_hairpin_queue_peer_bind;
> > >>>>    	rte_eth_hairpin_queue_peer_unbind;
> > >>>>    	rte_eth_hairpin_queue_peer_update;
> > >>>> --
> > >>>> 2.17.1
> > >>>
> > >
> > > Is there any reason not to design this in the same way as 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >
> > I suppose it is a question for the authors of original patch...
> Appreciate if the authors could comment on this.
> 
> >
> > > We could have a similar API 'rte_eth_dev_recover' to do the recovery functionality.
> >
> > I suppose such approach is also possible.
> > Personally I am fine with both ways: either existing one or what you propose,
> > as long as we'll fix existing race-condition.
> > What is good with what you suggest - that way we probably don't need to
> > worry how to allow user to enable/disable auto-recovery inside PMD.
> >
> > Konstantin
> >
  
Konstantin Ananyev March 7, 2023, 10:11 a.m. UTC | #21
> > >>>>>>>>> In the proactive error handling mode, the PMD will set the data path
> > >>>>>>>>> pointers to dummy functions and then try recovery, in this period the
> > >>>>>>>>> application may still invoking data path API. This will introduce a
> > >>>>>>>>> race-condition with data path which may lead to crash [1].
> > >>>>>>>>>
> > >>>>>>>>> Although the PMD added delay after setting data path pointers to cover
> > >>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
> > >>>>>>>>> solve the problem.
> > >>>>>>>>>
> > >>>>>>>>> To solve the race-condition problem fundamentally, the following
> > >>>>>>>>> requirements are added:
> > >>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>> 2. The application should stop data path API invocation when process
> > >>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> 4. The application should enable data path API invocation when process
> > >>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>> How this is solving the race-condition, by pushing responsibility to
> > >>>>>>> stop data path to application?
> > >>>>>>
> > >>>>>> Exactly, it becomes application responsibility to make sure data-path is
> > >>>>>> stopped/suspended before recovery will continue.
> > >>>>>>
> > >>>>>
> > >>>>> From documentation of the feature:
> > >>>>>
> > >>>>> ``
> > >>>>> Because the PMD recovers automatically,
> > >>>>> the application can only sense that the data flow is disconnected for a
> > >>>>> while and the control API returns an error in this period.
> > >>>>>
> > >>>>> In order to sense the error happening/recovering, as well as to restore
> > >>>>> some additional configuration, three events are available:
> > >>>>> ``
> > >>>>>
> > >>>>> It looks like initial design is to use events mainly inform application
> > >>>>> about what happened and mainly for re-configuration.
> > >>>>>
> > >>>>> Although I don't disagree with involving the application, I am not sure
> > >>>>> that is part of current design.
> > >>>>
> > >>>> I thought we all agreed that the initial design contains some fallacies that
> > >>>> need to be fixed, no?
> > >>>> Statement that with current rte_ethdev design error recovery can be done
> > >>>> without interaction with the app (to stop/suspend data/control path)
> > >>>> is the main one I think.
> > >>>> It needs some interaction with app layer, one way or another.
> > >>>>
> > >>>>>>>
> > >>>>>>> What if application is not interested in recovery modes at all and not
> > >>>>>>> registered any callback for the recovery?
> > >>>>>>
> > >>>>>>
> > >>>>>> Are you saying there is no way for application to disable
> > >>>>>> automatic recovery in PMD if it is not interested
> > >>>>>>> (or can't fulfill prerequisites for it)?
> > >>>>>> If so, then yes it is a problem and we need to fix it.
> > >>>>>> I assumed that such mechanism to disable unwanted events already exists,
> > >>>>>> but I can't find anything.
> > >>>>>> Wonder what would be the easiest way here - can PMD make a decision
> > >>>>>> based on callback return value, or do we need a new API to
> > >>>>>> enable/disable callbacks, or ...?
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>> As far as I can see automatic recovery is not configurable by app.
> > >>>>>
> > >>>>> But that is not all, PMD sends events to application but PMD can't know
> > >>>>> if application is handling them or not, so with current design PMD can't
> > >>>>> rely on to app.
> > >>>>
> > >>>> Well, PMD invokes user provided callback.
> > >>>> One way to fix that problem - if there is no callback provided,
> > >>>> or callback returns an error code - PMD can assume that recovery
> > >>>> should not be done.
> > >>>> That is probably not the best design choice, but at least it will allow
> > >>>> to fix the problem without too many changes and introducing new API.
> > >>>> That could be sort of a 'quick fix'.
> > >>>> In a meanwhile we can think about new/better approach for that.
> > >>>>
> > >>>
> > >>> -rc2 for 23.03 is a few days away.
> > >>>
> > >>> What do you think to have 'quick fix' as modifying how driver updates
> > >>> burst ops to prevent the race condition, for this release?
> >
> > The 'quick fix', do you mean only update function pointer (without rxq setting) ?
> > Currently the PMDs which announced support "proactive error handling mode" already
> > do this.
> 
> Really sorry guys, I was too fast on the keyboard, and didn't read properly what Ferruh suggested.
> Reading it once again: no, I do not agree with that.
> It wouldn't fix anything, but will just add extra mess into the code.
> Sorry again for the wrong reply.
> Konstantin
> 

Thinking about the 'quick fix' once again: I think the patches Fengchengwen already provided:
https://patchwork.dpdk.org/project/dpdk/list/?series=27201
are a much better approach.
I believe they should stop the race condition (and crashing) with a properly written callback.
If we still have time for it, I'd suggest one extra change in the PMD:
check that the recovery callback is installed and, if not, simply do not start recovery at all.
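A minimal sketch of that extra PMD-side check, also covering the earlier idea of skipping recovery when the callback returns an error code. It is plain C with hypothetical names (enum evt, struct port_cbs, pmd_may_start_recovery()) and one callback slot per event; a real PMD would go through ethdev's own callback machinery, which keeps a list. Requiring a RECOVERY_SUCCESS handler as well is an assumption of this sketch, since without one the application would have no point at which to re-enable its data path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three recovery events in this thread. */
enum evt { EVT_ERR_RECOVERING, EVT_RECOVERY_SUCCESS, EVT_RECOVERY_FAILED, EVT_MAX };

typedef int (*evt_cb_t)(enum evt event, void *arg);

/* Hypothetical per-port table, one callback per event. */
struct port_cbs {
	evt_cb_t cb[EVT_MAX];
	void *arg[EVT_MAX];
};

/* Example application callback: counts calls and accepts recovery. */
static int
app_cb_accept(enum evt event, void *arg)
{
	(void)event;
	assert(arg != NULL);
	(*(int *)arg)++;
	return 0;
}

/* Example application callback that refuses (e.g. it cannot quiesce). */
static int
app_cb_refuse(enum evt event, void *arg)
{
	(void)event;
	(void)arg;
	return -1;
}

/*
 * PMD side: start proactive recovery only if the application handles
 * both the start and the successful end of recovery, and its
 * ERR_RECOVERING callback returned 0 (data path quiesced).
 */
static bool
pmd_may_start_recovery(const struct port_cbs *p)
{
	evt_cb_t start = p->cb[EVT_ERR_RECOVERING];

	if (start == NULL || p->cb[EVT_RECOVERY_SUCCESS] == NULL)
		return false;
	return start(EVT_ERR_RECOVERING, p->arg[EVT_ERR_RECOVERING]) == 0;
}
```

When the check fails, the PMD would presumably fall back to reporting the error (for example via the existing passive/reset path) instead of swapping data path pointers underneath an application that never agreed to it.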

> >
> > >>>
> > >>> And plan a design update for the next release?
> > >> +1 on the overall approach.
> > >
> > > Yep, agree.
> >
> > Hoping for a better solution.
> > And also, I notice that only openvswitch (of all the open-source software based on DPDK)
> > registers an RTE_ETH_EVENT_INTR_RESET callback.
> >
> > Therefore, I hope we build a recovery framework at the DPDK SDK level that is compatible
> > with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> >
> > >
> > >>
> > >>>
> > >>>
> > >>>>>
> > >>>>>>> I think driver should not rely on application for this, unless
> > >>>>>>> application explicitly says (to driver) that it is handling recovery,
> > >>>>>>> right now there is no way for driver to know this.
> > >>>>>>
> > >>>>>> I think it is vice versa:
> > >>>>>> application should not enable auto-recovery if it can't meet
> > >>>>>> prerequisites for it (provide appropriate callback).
> > >>>>>>
> > >>>>>
> > >>>>> I agree on above, we are saying similar thing in different perspective.
> > >>>>
> > >>>> Ok, that's good we are on the same page.
> > >>>>
> > >>>>
> > >>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>> Also, this patch introduce a driver internal function
> > >>>>>>>>> rte_eth_fp_ops_setup which used as an help function for PMD.
> > >>>>>>>>>
> > >>>>>>>>> [1]
> > >>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >>>>>>>>>
> > >>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>>>>>>> Cc: stable@dpdk.org
> > >>>>>>>>>
> > >>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > >>>>>>>>> ---
> > >>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>>>>>>   lib/ethdev/version.map                  |  1 +
> > >>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> index c145a9066c..e380ff135a 100644
> > >>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> > >>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>>>>>>   and only a small amount of work is required for the application.
> > >>>>>>>>>
> > >>>>>>>>> -During error detection and automatic recovery,
> > >>>>>>>>> -the PMD sets the data path pointers to dummy functions
> > >>>>>>>>> -(which will prevent the crash),
> > >>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> > >>>>>>>>> -
> > >>>>>>>>> -Because the PMD recovers automatically,
> > >>>>>>>>> -the application can only sense that the data flow is disconnected for a while
> > >>>>>>>>> -and the control API returns an error in this period.
> > >>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
> > >>>>>>>>> +pointers to dummy functions and also make sure the control path operations
> > >>>>>>>>> +failed with a return code ``-EBUSY``.
> > >>>>>>>>>
> > >>>>>>>>>   In order to sense the error happening/recovering,
> > >>>>>>>>>   as well as to restore some additional configuration,
> > >>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>>>>>>
> > >>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>>>>>>      Notify the application that an error is detected
> > >>>>>>>>> -   and the recovery is being started.
> > >>>>>>>>> +   and the recovery is about to start.
> > >>>>>>>>>      Upon receiving the event, the application should not invoke
> > >>>>>>>>> -   any control path function until receiving
> > >>>>>>>>> +   any control and data path API until receiving
> > >>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>>>>>>
> > >>>>>>>>>   .. note::
> > >>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>>>>>>
> > >>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>>>>>>      Notify the application that the recovery from error is successful,
> > >>>>>>>>> -   the PMD already re-configures the port,
> > >>>>>>>>> -   and the effect is the same as a restart operation.
> > >>>>>>>>> +   the PMD already re-configures the port.
> > >>>>>>>>> +   The application should restore some additional configuration, and then
> > >>>>>>>>> +   enable data path API invocation.
> > >>>>>>>>>
> > >>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>>>>>>      Notify the application that the recovery from error failed,
> > >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> > >>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>       return rc;
> > >>>>>>>>>   }
> > >>>>>>>>>
> > >>>>>>>>> +void
> > >>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>>>>>>> +{
> > >>>>>>>>> +    if (dev == NULL)
> > >>>>>>>>> +        return;
> > >>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>   const struct rte_memzone *
> > >>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> > >>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>> @@ -1621,6 +1621,16 @@ int
> > >>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> > >>>>>>>>>            uint16_t queue_id);
> > >>>>>>>>>
> > >>>>>>>>> +/**
> > >>>>>>>>> + * @internal
> > >>>>>>>>> + * Setup eth fast-path API to ethdev values.
> > >>>>>>>>> + *
> > >>>>>>>>> + * @param dev
> > >>>>>>>>> + *  Pointer to struct rte_eth_dev.
> > >>>>>>>>> + */
> > >>>>>>>>> +__rte_internal
> > >>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>>>>>>> +
> > >>>>>>>>>   /**
> > >>>>>>>>>    * @internal
> > >>>>>>>>>    * Atomically set the link status for the specific device.
> > >>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> index 049641d57c..44ee7229c1 100644
> > >>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>>>>>>        */
> > >>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>>>>>>       /** Port recovering from a hardware or firmware error.
> > >>>>>>>>> -     * If PMD supports proactive error recovery,
> > >>>>>>>>> -     * it should trigger this event to notify application
> > >>>>>>>>> -     * that it detected an error and the recovery is being started.
> > >>>>>>>>> -     * Upon receiving the event, the application should not invoke any control path API
> > >>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> > >>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
> > >>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
> > >>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> -     * It means that the application cannot send or receive any packets
> > >>>>>>>>> -     * during this period.
> > >>>>>>>>> +     *
> > >>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger this
> > >>>>>>>>> +     * event to notify application that it detected an error and the
> > >>>>>>>>> +     * recovery is about to start.
> > >>>>>>>>> +     *
> > >>>>>>>>> +     * Upon receiving the event, the application should not invoke any
> > >>>>>>>>> +     * control and data path API until receiving
> > >>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>>>>>>> +     * event.
> > >>>>>>>>> +     *
> > >>>>>>>>> +     * Once this event is reported, the PMD will set the data path pointers
> > >>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> > >>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>> +     *
> > >>>>>>>>>        * @note Before the PMD reports the recovery result,
> > >>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> > >>>>>>>>>        * because a larger error may occur during the recovery.
> > >>>>>>>>>        */
> > >>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> > >>>>>>>>>       /** Port recovers successfully from the error.
> > >>>>>>>>> -     * The PMD already re-configured the port,
> > >>>>>>>>> -     * and the effect is the same as a restart operation.
> > >>>>>>>>> +     *
> > >>>>>>>>> +     * The PMD already re-configured the port:
> > >>>>>>>>>        * a) The following operation will be retained: (alphabetically)
> > >>>>>>>>>        *    - DCB configuration
> > >>>>>>>>>        *    - FEC configuration
> > >>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>>>>>>        * c) Any other configuration will not be stored
> > >>>>>>>>>        *    and will need to be re-configured.
> > >>>>>>>>> +     *
> > >>>>>>>>> +     * The application should restore some additional configuration
> > >>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
> > >>>>>>>>>        */
> > >>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>>>>>>       /** Port recovery failed.
> > >>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> > >>>>>>>>> --- a/lib/ethdev/version.map
> > >>>>>>>>> +++ b/lib/ethdev/version.map
> > >>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>>>>>>       rte_eth_devices;
> > >>>>>>>>>       rte_eth_dma_zone_free;
> > >>>>>>>>>       rte_eth_dma_zone_reserve;
> > >>>>>>>>> +    rte_eth_fp_ops_setup;
> > >>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
> > >>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
> > >>>>>>>>>       rte_eth_hairpin_queue_peer_update;
> > >>>>>>>>> --
> > >>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > >>>>>>>>
> > >>>>>>>>> 2.17.1
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>
  
Ferruh Yigit March 7, 2023, 12:07 p.m. UTC | #22
On 3/7/2023 8:25 AM, fengchengwen wrote:
> 
> 
> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>
>>
>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the
>>>>>>>>>> application may still invoking data path API. This will introduce a
>>>>>>>>>> race-condition with data path which may lead to crash [1].
>>>>>>>>>>
>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover
>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
>>>>>>>>>> solve the problem.
>>>>>>>>>>
>>>>>>>>>> To solve the race-condition problem fundamentally, the following
>>>>>>>>>> requirements are added:
>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
>>>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>> 2. The application should stop data path API invocation when process
>>>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
>>>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>> 4. The application should enable data path API invocation when process
>>>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> How this is solving the race-condition, by pushing responsibility to
>>>>>>>> stop data path to application?
>>>>>>>
>>>>>>> Exactly, it becomes application responsibility to make sure data-path is
>>>>>>> stopped/suspended before recovery will continue.
>>>>>>>
>>>>>>
>>>>>> From documentation of the feature:
>>>>>>
>>>>>> ``
>>>>>> Because the PMD recovers automatically,
>>>>>> the application can only sense that the data flow is disconnected for a
>>>>>> while and the control API returns an error in this period.
>>>>>>
>>>>>> In order to sense the error happening/recovering, as well as to restore
>>>>>> some additional configuration, three events are available:
>>>>>> ``
>>>>>>
>>>>>> It looks like initial design is to use events mainly inform application
>>>>>> about what happened and mainly for re-configuration.
>>>>>>
>>>>>> Although I don't disagree with involving the application, I am not sure
>>>>>> that is part of current design.
>>>>>
>>>>> I thought we all agreed that the initial design contains some fallacies that
>>>>> need to be fixed, no?
>>>>> Statement that with current rte_ethdev design error recovery can be done
>>>>> without interaction with the app (to stop/suspend data/control path)
>>>>> is the main one I think.
>>>>> It needs some interaction with app layer, one way or another.
>>>>>
>>>>>>>>
>>>>>>>> What if the application is not interested in recovery modes at all and has not
>>>>>>>> registered any callback for the recovery?
>>>>>>>
>>>>>>>
>>>>>>> Are you saying there is no way for the application to disable
>>>>>>> automatic recovery in the PMD if it is not interested
>>>>>>> (or can't fulfill the prerequisites for it)?
>>>>>>> If so, then yes, it is a problem and we need to fix it.
>>>>>>> I assumed that such a mechanism to disable unwanted events already exists,
>>>>>>> but I can't find anything.
>>>>>>> I wonder what the easiest way would be here: can the PMD make a decision
>>>>>>> based on the callback return value, or do we need a new API to
>>>>>>> enable/disable callbacks, or ...?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> As far as I can see, automatic recovery is not configurable by the app.
>>>>>>
>>>>>> But that is not all: the PMD sends events to the application, but the PMD can't know
>>>>>> whether the application is handling them or not, so with the current design the PMD
>>>>>> can't rely on the app.
>>>>>
>>>>> Well, the PMD invokes a user-provided callback.
>>>>> One way to fix that problem: if no callback is provided,
>>>>> or the callback returns an error code, the PMD can assume that recovery
>>>>> should not be done.
>>>>> That is probably not the best design choice, but at least it will allow us
>>>>> to fix the problem without too many changes and without introducing a new API.
>>>>> That could be sort of a 'quick fix'.
>>>>> In the meantime we can think about a new/better approach for that.
>>>>>
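The 'quick fix' proposed above can be sketched from the PMD side, assuming the driver can see how many callbacks are registered for the recovering event and what each of them returns. This is a self-contained simulation, not the ethdev callback machinery: `struct demo_port`, `demo_cb_fn` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type with the same parameter shape as rte_eth_dev_cb_fn. */
typedef int (*demo_cb_fn)(uint16_t port_id, int event, void *cb_arg,
			  void *ret_param);

#define DEMO_MAX_CBS 4

/* Hypothetical per-port view of the callbacks registered for one event. */
struct demo_port {
	uint16_t port_id;
	size_t nb_cbs;
	demo_cb_fn cbs[DEMO_MAX_CBS];
};

/*
 * Proposed rule: start proactive recovery only if at least one callback is
 * registered and every callback returns 0. Otherwise assume the application
 * cannot stop its data path and skip recovery.
 */
static bool
demo_should_recover(const struct demo_port *port, int event)
{
	if (port->nb_cbs == 0)
		return false; /* nobody will stop the data path */

	for (size_t i = 0; i < port->nb_cbs; i++) {
		if (port->cbs[i](port->port_id, event, NULL, NULL) != 0)
			return false; /* the application declined */
	}
	return true;
}

/* Example callbacks: one that accepts recovery, one that refuses it. */
static int
demo_cb_accept(uint16_t port_id, int event, void *cb_arg, void *ret_param)
{
	(void)port_id; (void)event; (void)cb_arg; (void)ret_param;
	return 0;
}

static int
demo_cb_refuse(uint16_t port_id, int event, void *cb_arg, void *ret_param)
{
	(void)port_id; (void)event; (void)cb_arg; (void)ret_param;
	return -1;
}
```

The weakness acknowledged in the thread is visible here: the callback return value doubles as a consent signal, which is why a dedicated API is suggested as the longer-term approach.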
>>>>
>>>> -rc2 for 23.03 is a few days away.
>>>>
>>>> What do you think about a 'quick fix' that modifies how the driver updates
>>>> burst ops to prevent the race condition, for this release?
> 
> By 'quick fix', do you mean only updating the function pointers (without the rxq setting)?
> The PMDs which currently announce support for "proactive error handling mode" already
> do this.
> 

Yes.
I checked hns3; it does as you said. hns3_eth_dev_fp_ops_config()
updates all fields in 'rte_eth_fp_ops', but only the function pointers seem to be
changed in the driver, so in effect only the function pointers are updated.
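Why updating only the function pointer avoids the crash can be illustrated with a reduced model of one `rte_eth_fp_ops` entry. The struct below is a hypothetical stand-in holding just an Rx burst pointer and the per-queue data, and the sketch assumes an aligned pointer-sized store is not torn. An lcore racing with the switch either loads the old real function together with still-valid queue data, or loads the dummy function, which never dereferences the queue. The remaining race (the real function running while the hardware is being reset) is not addressed by this model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced model of one rte_eth_fp_ops entry. */
typedef uint16_t (*demo_rx_burst_t)(void *rxq, void **pkts, uint16_t nb);

struct demo_fp_ops {
	demo_rx_burst_t rx_pkt_burst;
	void **rxq_data; /* per-queue driver data, left untouched on recovery */
};

/* Dummy burst: receives nothing and never touches rxq. */
static uint16_t
demo_dummy_rx(void *rxq, void **pkts, uint16_t nb)
{
	(void)rxq;
	(void)pkts;
	(void)nb;
	return 0;
}

/* "Real" burst: dereferences rxq, so a NULL or freed queue would crash. */
static uint16_t
demo_real_rx(void *rxq, void **pkts, uint16_t nb)
{
	const uint16_t *avail = rxq;

	(void)pkts;
	return *avail < nb ? *avail : nb;
}

/* Fast path, in the style of an inline burst wrapper: load fn, load queue. */
static uint16_t
demo_rx_burst(const struct demo_fp_ops *ops, uint16_t qid,
	      void **pkts, uint16_t nb)
{
	return ops->rx_pkt_burst(ops->rxq_data[qid], pkts, nb);
}

/* Recovery start/end: swap only the function pointer, keep rxq_data. */
static void
demo_fp_set_dummy(struct demo_fp_ops *ops)
{
	ops->rx_pkt_burst = demo_dummy_rx;
}

static void
demo_fp_set_real(struct demo_fp_ops *ops)
{
	ops->rx_pkt_burst = demo_real_rx;
}
```

If the recovery code had also replaced `rxq_data` with placeholder values, a reader holding the old `demo_real_rx` pointer could pair it with the placeholder queue, which is the kind of mismatch that leads to a crash.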

The discussion about the race condition started with patch [1], which
mentions a crash caused by a race condition. Later in the discussion, the
recovery event was given as an example of where the race can occur; that is
why we are here.

But given the above info, although there is a race condition and a bigger
update (one that needs application involvement) is required for the recovery
mechanism, there is no crash and NO 'quick fix' is required for recovery.

@Konstantin, @Chengwen, can you please confirm that the above understanding is
correct?



[1]
https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/

>>>>
>>>> And plan a design update for the next release?
>>> +1 on the overall approach.
>>
>> Yep, agree.
> 
> I hope for a better solution.
> Also, I notice that only Open vSwitch (among all open-source software based on DPDK)
> registers an RTE_ETH_EVENT_INTR_RESET callback.
> 
> Therefore, I hope we build a recovery framework at the DPDK SDK level that is compatible
> with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> 
>>  
>>>
>>>>
>>>>
>>>>>>
>>>>>>>> I think the driver should not rely on the application for this unless the
>>>>>>>> application explicitly tells the driver that it is handling recovery;
>>>>>>>> right now there is no way for the driver to know this.
>>>>>>>
>>>>>>> I think it is vice versa:
>>>>>>> the application should not enable auto-recovery if it can't meet the
>>>>>>> prerequisites for it (provide an appropriate callback).
>>>>>>>
>>>>>>
>>>>>> I agree with the above; we are saying a similar thing from different perspectives.
>>>>>
>>>>> OK, good that we are on the same page.
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>> Also, this patch introduces a driver-internal function
>>>>>>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
>>>>>>>>>>
>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
>>>>>>>>>> Cc: stable@dpdk.org
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>>>>>>>>>> ---
>>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
>>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
>>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
>>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
>>>>>>>>>>   lib/ethdev/version.map                  |  1 +
>>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> index c145a9066c..e380ff135a 100644
>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
>>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
>>>>>>>>>>   and only a small amount of work is required for the application.
>>>>>>>>>>
>>>>>>>>>> -During error detection and automatic recovery,
>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
>>>>>>>>>> -(which will prevent the crash),
>>>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
>>>>>>>>>> -
>>>>>>>>>> -Because the PMD recovers automatically,
>>>>>>>>>> -the application can only sense that the data flow is disconnected for a while
>>>>>>>>>> -and the control API returns an error in this period.
>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
>>>>>>>>>> +pointers to dummy functions and also make sure the control path operations
>>>>>>>>>> +failed with a return code ``-EBUSY``.
>>>>>>>>>>
>>>>>>>>>>   In order to sense the error happening/recovering,
>>>>>>>>>>   as well as to restore some additional configuration,
>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
>>>>>>>>>>
>>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
>>>>>>>>>>      Notify the application that an error is detected
>>>>>>>>>> -   and the recovery is being started.
>>>>>>>>>> +   and the recovery is about to start.
>>>>>>>>>>      Upon receiving the event, the application should not invoke
>>>>>>>>>> -   any control path function until receiving
>>>>>>>>>> +   any control and data path API until receiving
>>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
>>>>>>>>>>
>>>>>>>>>>   .. note::
>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
>>>>>>>>>>
>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
>>>>>>>>>>      Notify the application that the recovery from error is successful,
>>>>>>>>>> -   the PMD already re-configures the port,
>>>>>>>>>> -   and the effect is the same as a restart operation.
>>>>>>>>>> +   the PMD already re-configures the port.
>>>>>>>>>> +   The application should restore some additional configuration, and then
>>>>>>>>>> +   enable data path API invocation.
>>>>>>>>>>
>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
>>>>>>>>>>      Notify the application that the recovery from error failed,
>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>>>>>       return rc;
>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>> +void
>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
>>>>>>>>>> +{
>>>>>>>>>> +    if (dev == NULL)
>>>>>>>>>> +        return;
>>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>>   const struct rte_memzone *
>>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
>>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
>>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
>>>>>>>>>>            uint16_t queue_id);
>>>>>>>>>>
>>>>>>>>>> +/**
>>>>>>>>>> + * @internal
>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
>>>>>>>>>> + *
>>>>>>>>>> + * @param dev
>>>>>>>>>> + *  Pointer to struct rte_eth_dev.
>>>>>>>>>> + */
>>>>>>>>>> +__rte_internal
>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
>>>>>>>>>> +
>>>>>>>>>>   /**
>>>>>>>>>>    * @internal
>>>>>>>>>>    * Atomically set the link status for the specific device.
>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>>>>>>>>>> index 049641d57c..44ee7229c1 100644
>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
>>>>>>>>>>        */
>>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
>>>>>>>>>>       /** Port recovering from a hardware or firmware error.
>>>>>>>>>> -     * If PMD supports proactive error recovery,
>>>>>>>>>> -     * it should trigger this event to notify application
>>>>>>>>>> -     * that it detected an error and the recovery is being started.
>>>>>>>>>> -     * Upon receiving the event, the application should not invoke any control path API
>>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
>>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
>>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
>>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
>>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>> -     * It means that the application cannot send or receive any packets
>>>>>>>>>> -     * during this period.
>>>>>>>>>> +     *
>>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger this
>>>>>>>>>> +     * event to notify application that it detected an error and the
>>>>>>>>>> +     * recovery is about to start.
>>>>>>>>>> +     *
>>>>>>>>>> +     * Upon receiving the event, the application should not invoke any
>>>>>>>>>> +     * control and data path API until receiving
>>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
>>>>>>>>>> +     * event.
>>>>>>>>>> +     *
>>>>>>>>>> +     * Once this event is reported, the PMD will set the data path pointers
>>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
>>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
>>>>>>>>>> +     *
>>>>>>>>>>        * @note Before the PMD reports the recovery result,
>>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
>>>>>>>>>>        * because a larger error may occur during the recovery.
>>>>>>>>>>        */
>>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
>>>>>>>>>>       /** Port recovers successfully from the error.
>>>>>>>>>> -     * The PMD already re-configured the port,
>>>>>>>>>> -     * and the effect is the same as a restart operation.
>>>>>>>>>> +     *
>>>>>>>>>> +     * The PMD already re-configured the port:
>>>>>>>>>>        * a) The following operation will be retained: (alphabetically)
>>>>>>>>>>        *    - DCB configuration
>>>>>>>>>>        *    - FEC configuration
>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
>>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
>>>>>>>>>>        * c) Any other configuration will not be stored
>>>>>>>>>>        *    and will need to be re-configured.
>>>>>>>>>> +     *
>>>>>>>>>> +     * The application should restore some additional configuration
>>>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
>>>>>>>>>>        */
>>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
>>>>>>>>>>       /** Port recovery failed.
>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
>>>>>>>>>> --- a/lib/ethdev/version.map
>>>>>>>>>> +++ b/lib/ethdev/version.map
>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
>>>>>>>>>>       rte_eth_devices;
>>>>>>>>>>       rte_eth_dma_zone_free;
>>>>>>>>>>       rte_eth_dma_zone_reserve;
>>>>>>>>>> +    rte_eth_fp_ops_setup;
>>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
>>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
>>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
>>>>>>>>>> --
>>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
>>>>>>>>>
>>>>>>>>>> 2.17.1
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
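The ordering that requirements 1 and 3 impose on the PMD, together with the new `rte_eth_fp_ops_setup()` helper, can be modelled as below. This is a hypothetical simulation rather than driver code: `struct demo_dev`, the `demo_*` helpers and the boolean `fp_is_dummy` (standing in for the contents of `rte_eth_fp_ops`) are all assumptions. The example callback records whether the dummy functions were installed at the moment each event was reported.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the recovery event types. */
enum demo_pmd_event {
	DEMO_PMD_EV_ERR_RECOVERING,
	DEMO_PMD_EV_RECOVERY_SUCCESS,
	DEMO_PMD_EV_RECOVERY_FAILED,
};

struct demo_dev;
typedef void (*demo_event_cb)(struct demo_dev *dev, enum demo_pmd_event ev);
typedef bool (*demo_hw_reset_fn)(struct demo_dev *dev);

struct demo_dev {
	bool fp_is_dummy;          /* models the state of rte_eth_fp_ops */
	demo_event_cb report;      /* models the application callback */
	demo_hw_reset_fn hw_reset; /* models the device-specific recovery */
	/* Event log filled in by demo_record() for inspection. */
	enum demo_pmd_event ev[4];
	bool dummy_when_reported[4];
	int nb_ev;
};

static void demo_fp_set_dummy(struct demo_dev *dev) { dev->fp_is_dummy = true; }
/* Plays the role of rte_eth_fp_ops_setup(): restore valid burst functions. */
static void demo_fp_setup(struct demo_dev *dev) { dev->fp_is_dummy = false; }

static void
demo_pmd_recover(struct demo_dev *dev)
{
	/* Requirement 1: report first, so the application can stop its data
	 * path, and only then install the dummy burst functions. */
	dev->report(dev, DEMO_PMD_EV_ERR_RECOVERING);
	demo_fp_set_dummy(dev);

	if (!dev->hw_reset(dev)) {
		/* The data path stays on dummy functions after a failed recovery. */
		dev->report(dev, DEMO_PMD_EV_RECOVERY_FAILED);
		return;
	}

	/* Requirement 3: valid burst functions before reporting success. */
	demo_fp_setup(dev);
	dev->report(dev, DEMO_PMD_EV_RECOVERY_SUCCESS);
}

/* Example callback: log each event and the fast-path state at that moment. */
static void
demo_record(struct demo_dev *dev, enum demo_pmd_event ev)
{
	if (dev->nb_ev < 4) {
		dev->ev[dev->nb_ev] = ev;
		dev->dummy_when_reported[dev->nb_ev] = dev->fp_is_dummy;
		dev->nb_ev++;
	}
}

static bool demo_reset_ok(struct demo_dev *dev) { (void)dev; return true; }
static bool demo_reset_fail(struct demo_dev *dev) { (void)dev; return false; }
```

With a successful reset, both events are reported while the real burst functions are installed; with a failed reset, `RECOVERY_FAILED` is reported with the dummy functions still in place.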
  
Chengwen Feng March 7, 2023, 12:26 p.m. UTC | #23
On 2023/3/7 20:07, Ferruh Yigit wrote:
> On 3/7/2023 8:25 AM, fengchengwen wrote:
>>
>>
>> On 2023/3/6 19:13, Konstantin Ananyev wrote:
>>>
>>>
>>>>>
>>>>> -rc2 for 23.03 is a few days away.
>>>>>
>>>>> What do you think about a 'quick fix' that modifies how the driver updates
>>>>> burst ops to prevent the race condition, for this release?
>>
>> By 'quick fix', do you mean only updating the function pointers (without the rxq setting)?
>> The PMDs which currently announce support for "proactive error handling mode" already
>> do this.
>>
> 
> Yes.
> I checked hns3; it does as you said. hns3_eth_dev_fp_ops_config()
> updates all fields in 'rte_eth_fp_ops', but only the function pointers seem to be
> changed in the driver, so in effect only the function pointers are updated.
> 
> The discussion about the race condition started with patch [1], which
> mentions a crash caused by a race condition. Later in the discussion, the
> recovery event was given as an example of where the race can occur; that is
> why we are here.
> 
> But given the above info, although there is a race condition and a bigger
> update (one that needs application involvement) is required for the recovery
> mechanism, there is no crash and NO 'quick fix' is required for recovery.
> 
> @Konstantin, @Chengwen, can you please confirm that the above understanding is
> correct?

Yes, that's right.

> 
> 
> 
> [1]
> https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> 
  
Konstantin Ananyev March 7, 2023, 12:39 p.m. UTC | #24
> >>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
> >>>>>>>>>>> pointers to dummy functions and then try recovery, in this period the
> >>>>>>>>>>> application may still invoking data path API. This will introduce a
> >>>>>>>>>>> race-condition with data path which may lead to crash [1].
> >>>>>>>>>>>
> >>>>>>>>>>> Although the PMD added delay after setting data path pointers to cover
> >>>>>>>>>>> the above race-condition, it reduces the probability, but it doesn't
> >>>>>>>>>>> solve the problem.
> >>>>>>>>>>>
> >>>>>>>>>>> To solve the race-condition problem fundamentally, the following
> >>>>>>>>>>> requirements are added:
> >>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
> >>>>>>>>>>>     report RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>>>>>>> 2. The application should stop data path API invocation when process
> >>>>>>>>>>>     the RTE_ETH_EVENT_ERR_RECOVERING event.
> >>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
> >>>>>>>>>>>     report RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>>>> 4. The application should enable data path API invocation when process
> >>>>>>>>>>>     the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> How this is solving the race-condition, by pushing responsibility to
> >>>>>>>>> stop data path to application?
> >>>>>>>>
> >>>>>>>> Exactly, it becomes application responsibility to make sure data-path is
> >>>>>>>> stopped/suspended before recovery will continue.
> >>>>>>>>
> >>>>>>>
> >>>>>>> From documentation of the feature:
> >>>>>>>
> >>>>>>> ``
> >>>>>>> Because the PMD recovers automatically,
> >>>>>>> the application can only sense that the data flow is disconnected for a
> >>>>>>> while and the control API returns an error in this period.
> >>>>>>>
> >>>>>>> In order to sense the error happening/recovering, as well as to restore
> >>>>>>> some additional configuration, three events are available:
> >>>>>>> ``
> >>>>>>>
> >>>>>>> It looks like initial design is to use events mainly inform application
> >>>>>>> about what happened and mainly for re-configuration.
> >>>>>>>
> >>>>>>> Although I am don't disagree to involve the application, I am not sure
> >>>>>>> that is part of current design.
> >>>>>>
> >>>>>> I thought we all agreed that the initial design contains some fallacies
> >>>>>> that need to be fixed, no?
> >>>>>> The main one, I think, is the statement that with the current rte_ethdev
> >>>>>> design error recovery can be done without interaction with the app (to
> >>>>>> stop/suspend the data/control path).
> >>>>>> It needs some interaction with the app layer, one way or another.
> >>>>>>
> >>>>>>>>>
> >>>>>>>>> What if the application is not interested in recovery modes at all and has
> >>>>>>>>> not registered any callback for the recovery?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Are you saying there is no way for the application to disable
> >>>>>>>> automatic recovery in the PMD if it is not interested
> >>>>>>>> (or can't fulfill the prerequisites for it)?
> >>>>>>>> If so, then yes, it is a problem and we need to fix it.
> >>>>>>>> I assumed that such a mechanism to disable unwanted events already exists,
> >>>>>>>> but I can't find anything.
> >>>>>>>> I wonder what would be the easiest way here: can the PMD make a decision
> >>>>>>>> based on the callback return value, or do we need a new API to
> >>>>>>>> enable/disable callbacks, or ...?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> As far as I can see, automatic recovery is not configurable by the app.
> >>>>>>>
> >>>>>>> But that is not all: the PMD sends events to the application but can't know
> >>>>>>> whether the application is handling them, so with the current design the
> >>>>>>> PMD can't rely on the app.
> >>>>>>
> >>>>>> Well, the PMD invokes the user-provided callback.
> >>>>>> One way to fix that problem: if no callback is provided,
> >>>>>> or the callback returns an error code, the PMD can assume that recovery
> >>>>>> should not be done.
> >>>>>> That is probably not the best design choice, but at least it will allow us
> >>>>>> to fix the problem without too many changes and without introducing a new API.
> >>>>>> That could be a sort of 'quick fix'.
> >>>>>> In the meantime we can think about a new/better approach for that.
> >>>>>>
> >>>>>
> >>>>> -rc2 for 23.03 is a few days away.
> >>>>>
> >>>>> What do you think about having, for this release, a 'quick fix' that modifies
> >>>>> how the driver updates burst ops to prevent the race condition?
> >>
> >> By the 'quick fix', do you mean only updating the function pointers (without the rxq setting)?
> >> The PMDs that currently announce support for "proactive error handling mode" already
> >> do this.
> >>
> >
> > Yes.
> > I checked hns3 and it does as you said: hns3_eth_dev_fp_ops_config()
> > updates all fields in 'rte_eth_fp_ops', but only the function pointers seem
> > to change in the driver, so effectively only the function pointers are updated.
> >
> > The discussion about the race condition started with patch [1], which
> > mentions a crash caused by a race condition. Later in the discussion, the
> > recovery event was given as an example of where the race can occur; that is
> > why we are here.
> >
> > But given the above info, although there is a race condition and a bigger
> > update (one that needs application involvement) is required for the recovery
> > mechanism, there is no crash and NO 'quick fix' is required for recovery.
> >
> > @Konstantin, @Chengwen, can you please confirm that the above understanding is
> > correct?
> 
> Yes, that's right.

Yes, I think with Chengwen's patch the race-condition problem should be fixed,
though for that the user has to provide a properly implemented callback.
What is not currently addressed: the user cannot disable this auto-recovery procedure at will.
So if the user does not provide a proper callback, the recovery can still proceed and the race can happen.
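A "properly implemented callback" in the sense above has to do more than set a flag: before it returns from RTE_ETH_EVENT_ERR_RECOVERING, it must know that no worker is still inside a burst call, otherwise the PMD can swap the fast-path pointers underneath that worker. The sketch below shows one way to do this, under stated assumptions: `struct dp_gate`, `dp_enter()`/`dp_exit()` and `on_recovery_event()` are hypothetical application-side helpers (not ethdev APIs), and the `EV_*` values are local stand-ins for the `rte_eth_event_type` values. In a real application, `on_recovery_event()` would be called from a callback registered with `rte_eth_dev_callback_register()` for each of the three recovery events.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Local stand-ins for RTE_ETH_EVENT_ERR_RECOVERING,
 * RTE_ETH_EVENT_RECOVERY_SUCCESS and RTE_ETH_EVENT_RECOVERY_FAILED. */
enum recovery_event {
	EV_ERR_RECOVERING,
	EV_RECOVERY_SUCCESS,
	EV_RECOVERY_FAILED,
};

/* Hypothetical per-port gate shared by the control thread and workers. */
struct dp_gate {
	atomic_bool stopped;   /* true while recovery is in progress */
	atomic_uint in_flight; /* workers currently inside a burst call */
};

/* Worker side: call before an Rx/Tx burst. Returns false if the data
 * path is stopped; on true, call dp_exit() once the burst returns. */
static bool
dp_enter(struct dp_gate *g)
{
	/* seq_cst increment, then seq_cst load. Paired with the store/load
	 * order in on_recovery_event(), either the worker sees 'stopped'
	 * or the control thread sees the worker in flight. */
	atomic_fetch_add(&g->in_flight, 1);
	if (atomic_load(&g->stopped)) {
		atomic_fetch_sub(&g->in_flight, 1);
		return false;
	}
	return true;
}

static void
dp_exit(struct dp_gate *g)
{
	atomic_fetch_sub(&g->in_flight, 1);
}

/* Control side: body of the event callback. Returning from the
 * ERR_RECOVERING case only after in_flight drops to zero is what lets
 * the PMD swap the fast-path pointers afterwards without a race. */
static int
on_recovery_event(struct dp_gate *g, enum recovery_event ev)
{
	switch (ev) {
	case EV_ERR_RECOVERING:
		/* The event may be reported again before the result;
		 * closing the gate twice is harmless. */
		atomic_store(&g->stopped, true);
		while (atomic_load(&g->in_flight) != 0)
			; /* busy-wait; a DPDK app would typically call rte_pause() */
		return 0;
	case EV_RECOVERY_SUCCESS:
		/* Restore any configuration the PMD did not keep, then reopen. */
		atomic_store(&g->stopped, false);
		return 0;
	case EV_RECOVERY_FAILED:
		/* Keep the data path stopped; the port needs explicit handling. */
		return 0;
	}
	assert(!"unknown recovery event");
	return -1;
}
```

The busy-wait runs on whichever thread delivers the event (for ethdev callbacks, typically the interrupt thread), so a worker must never block on that thread while it holds the gate.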

> 
> >
> >
> >
> > [1]
> > https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >
> >>>>>
> >>>>> And plan a design update for the next release?
> >>>> +1 on the overall approach.
> >>>
> >>> Yep, agree.
> >>
> >> I hope for a better solution.
> >> Also, I notice that only openvswitch (among all open-source software based on DPDK)
> >> registers an RTE_ETH_EVENT_INTR_RESET callback.
> >>
> >> Therefore, I hope we build a recovery framework at the DPDK SDK level that is compatible
> >> with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> >>
> >>>
> >>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>>> I think the driver should not rely on the application for this unless the
> >>>>>>>>> application explicitly tells the driver that it is handling recovery;
> >>>>>>>>> right now there is no way for the driver to know this.
> >>>>>>>>
> >>>>>>>> I think it is vice versa:
> >>>>>>>> the application should not enable auto-recovery if it can't meet the
> >>>>>>>> prerequisites for it (provide an appropriate callback).
> >>>>>>>>
> >>>>>>>
> >>>>>>> I agree with the above; we are saying the same thing from different perspectives.
> >>>>>>
> >>>>>> Ok, that's good we are on the same page.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>> Also, this patch introduces an internal driver function,
> >>>>>>>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> >>>>>>>>>>>
> >>>>>>>>>>> [1]
> >>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> >>>>>>>>>>>
> >>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> >>>>>>>>>>> Cc: stable@dpdk.org
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> >>>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> >>>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> >>>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> >>>>>>>>>>>   lib/ethdev/version.map                  |  1 +
> >>>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>>>> index c145a9066c..e380ff135a 100644
> >>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> >>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> >>>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> >>>>>>>>>>>   and only a small amount of work is required for the application.
> >>>>>>>>>>>
> >>>>>>>>>>> -During error detection and automatic recovery,
> >>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
> >>>>>>>>>>> -(which will prevent the crash),
> >>>>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> >>>>>>>>>>> -
> >>>>>>>>>>> -Because the PMD recovers automatically,
> >>>>>>>>>>> -the application can only sense that the data flow is disconnected for a while
> >>>>>>>>>>> -and the control API returns an error in this period.
> >>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
> >>>>>>>>>>> +pointers to dummy functions and also make sure the control path operations
> >>>>>>>>>>> +failed with a return code ``-EBUSY``.
> >>>>>>>>>>>
> >>>>>>>>>>>   In order to sense the error happening/recovering,
> >>>>>>>>>>>   as well as to restore some additional configuration,
> >>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> >>>>>>>>>>>
> >>>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> >>>>>>>>>>>      Notify the application that an error is detected
> >>>>>>>>>>> -   and the recovery is being started.
> >>>>>>>>>>> +   and the recovery is about to start.
> >>>>>>>>>>>      Upon receiving the event, the application should not invoke
> >>>>>>>>>>> -   any control path function until receiving
> >>>>>>>>>>> +   any control and data path API until receiving
> >>>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> >>>>>>>>>>>
> >>>>>>>>>>>   .. note::
> >>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> >>>>>>>>>>>
> >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> >>>>>>>>>>>      Notify the application that the recovery from error is successful,
> >>>>>>>>>>> -   the PMD already re-configures the port,
> >>>>>>>>>>> -   and the effect is the same as a restart operation.
> >>>>>>>>>>> +   the PMD already re-configures the port.
> >>>>>>>>>>> +   The application should restore some additional configuration, and then
> >>>>>>>>>>> +   enable data path API invocation.
> >>>>>>>>>>>
> >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> >>>>>>>>>>>      Notify the application that the recovery from error failed,
> >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> >>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> >>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>>>>>>>>       return rc;
> >>>>>>>>>>>   }
> >>>>>>>>>>>
> >>>>>>>>>>> +void
> >>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    if (dev == NULL)
> >>>>>>>>>>> +        return;
> >>>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>   const struct rte_memzone *
> >>>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> >>>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> >>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> >>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
> >>>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> >>>>>>>>>>>            uint16_t queue_id);
> >>>>>>>>>>>
> >>>>>>>>>>> +/**
> >>>>>>>>>>> + * @internal
> >>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * @param dev
> >>>>>>>>>>> + *  Pointer to struct rte_eth_dev.
> >>>>>>>>>>> + */
> >>>>>>>>>>> +__rte_internal
> >>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> >>>>>>>>>>> +
> >>>>>>>>>>>   /**
> >>>>>>>>>>>    * @internal
> >>>>>>>>>>>    * Atomically set the link status for the specific device.
> >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> >>>>>>>>>>> index 049641d57c..44ee7229c1 100644
> >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> >>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> >>>>>>>>>>>        */
> >>>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> >>>>>>>>>>>       /** Port recovering from a hardware or firmware error.
> >>>>>>>>>>> -     * If PMD supports proactive error recovery,
> >>>>>>>>>>> -     * it should trigger this event to notify application
> >>>>>>>>>>> -     * that it detected an error and the recovery is being started.
> >>>>>>>>>>> -     * Upon receiving the event, the application should not invoke any control path API
> >>>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
> >>>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
> >>>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
> >>>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
> >>>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>>>> -     * It means that the application cannot send or receive any packets
> >>>>>>>>>>> -     * during this period.
> >>>>>>>>>>> +     *
> >>>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger this
> >>>>>>>>>>> +     * event to notify application that it detected an error and the
> >>>>>>>>>>> +     * recovery is about to start.
> >>>>>>>>>>> +     *
> >>>>>>>>>>> +     * Upon receiving the event, the application should not invoke any
> >>>>>>>>>>> +     * control and data path API until receiving
> >>>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> >>>>>>>>>>> +     * event.
> >>>>>>>>>>> +     *
> >>>>>>>>>>> +     * Once this event is reported, the PMD will set the data path pointers
> >>>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> >>>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> >>>>>>>>>>> +     *
> >>>>>>>>>>>        * @note Before the PMD reports the recovery result,
> >>>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
> >>>>>>>>>>>        * because a larger error may occur during the recovery.
> >>>>>>>>>>>        */
> >>>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> >>>>>>>>>>>       /** Port recovers successfully from the error.
> >>>>>>>>>>> -     * The PMD already re-configured the port,
> >>>>>>>>>>> -     * and the effect is the same as a restart operation.
> >>>>>>>>>>> +     *
> >>>>>>>>>>> +     * The PMD already re-configured the port:
> >>>>>>>>>>>        * a) The following operation will be retained: (alphabetically)
> >>>>>>>>>>>        *    - DCB configuration
> >>>>>>>>>>>        *    - FEC configuration
> >>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> >>>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> >>>>>>>>>>>        * c) Any other configuration will not be stored
> >>>>>>>>>>>        *    and will need to be re-configured.
> >>>>>>>>>>> +     *
> >>>>>>>>>>> +     * The application should restore some additional configuration
> >>>>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
> >>>>>>>>>>>        */
> >>>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> >>>>>>>>>>>       /** Port recovery failed.
> >>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> >>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> >>>>>>>>>>> --- a/lib/ethdev/version.map
> >>>>>>>>>>> +++ b/lib/ethdev/version.map
> >>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> >>>>>>>>>>>       rte_eth_devices;
> >>>>>>>>>>>       rte_eth_dma_zone_free;
> >>>>>>>>>>>       rte_eth_dma_zone_reserve;
> >>>>>>>>>>> +    rte_eth_fp_ops_setup;
> >>>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
> >>>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
> >>>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
> >>>>>>>>>>> --
> >>>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> >>>>>>>>>>
> >>>>>>>>>>> 2.17.1
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >
> > .
> >
  
Honnappa Nagarahalli March 8, 2023, 1:09 a.m. UTC
<snip>

> >>>>>
> >>>
> >>> Is there any reason not to design this in the same way as
> >>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>
> >> I suppose it is a question for the authors of the original patch...
> > I would appreciate it if the authors could comment on this.
> 
> The main cause is a hardware implementation limitation; I will try to explain
> from the hns3 PMD's point of view.
> For a global reset, all the functions need to respond within a certain period of
> time, otherwise the reset will fail. The reset also requires several steps (all of
> which may take a long time).
>
> When there are multiple functions in one DPDK process and a global reset is
> triggered, rte_eth_dev_reset will not cover this scenario:
> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
> 2. The application callback is then invoked in that same thread, and because
>    each port's recovery takes a long time, the later ports fail to reset.
If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and rte_eth_dev_recover, what problems do you see?

> 
> >
> >>
> >>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
> >>> functionality.
> >>
> >> I suppose such an approach is also possible.
> >> Personally I am fine with either way, the existing one or what you
> >> propose, as long as we fix the existing race condition.
> >> What is good about your suggestion is that we probably don't need
> >> to worry about how to allow the user to enable/disable auto-recovery inside the PMD.
> >>
> >> Konstantin
> >>
> >
  
Chengwen Feng March 9, 2023, 12:59 a.m. UTC
On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> <snip>
> 
>>>>>>>
>>>>>
>>>>> Is there any reason not to design this in the same way as
>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>>>
>>>> I suppose it is a question for the authors of the original patch...
>>> I would appreciate it if the authors could comment on this.
>>
>> The main cause is a hardware implementation limitation; I will try to explain
>> from the hns3 PMD's point of view.
>> For a global reset, all the functions need to respond within a certain period of
>> time, otherwise the reset will fail. The reset also requires several steps (all of
>> which may take a long time).
>>
>> When there are multiple functions in one DPDK process and a global reset is
>> triggered, rte_eth_dev_reset will not cover this scenario:
>> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
>> 2. The application callback is then invoked in that same thread, and because
>>    each port's recovery takes a long time, the later ports fail to reset.
> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and rte_eth_dev_recover, what problems do you see?

I see no difference between 'RTE_ETH_EVENT_INTR_RECOVER and rte_eth_dev_recover'
and the RTE_ETH_EVENT_INTR_RESET mechanism.
Could you elaborate?
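The timing constraint behind this answer (quoted above) can be made concrete with a small calculation. Assuming, hypothetically, that every port's recovery takes the same time and that all recoveries run one after another on the single interrupt thread, port i (counting from 0) finishes at (i + 1) times the per-port time, so only the ports that finish before the global-reset deadline succeed. The function name and the numbers below are illustrative only, not measured hns3 values.

```c
#include <assert.h>

/* Count the ports whose recovery completes within the global-reset
 * deadline when all ports are recovered sequentially on one thread.
 * Port i (0-based) completes at (i + 1) * per_port_ms. */
static unsigned int
ports_within_deadline(unsigned int nb_ports, unsigned int per_port_ms,
		      unsigned int deadline_ms)
{
	unsigned int ok = 0;

	assert(per_port_ms > 0);
	for (unsigned int i = 0; i < nb_ports; i++) {
		/* Widen before multiplying to avoid unsigned overflow. */
		if ((unsigned long long)(i + 1) * per_port_ms <= deadline_ms)
			ok++;
	}
	return ok;
}
```

For example, with 4 ports, 300 ms per port and a 1000 ms deadline, `ports_within_deadline(4, 300, 1000)` returns 3: the fourth port would only finish at 1200 ms. If each port could recover concurrently, all four would finish at 300 ms.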

> 
>>
>>>
>>>>
>>>>> We could have a similar API 'rte_eth_dev_recover' to do the recovery
>>>>> functionality.
>>>>
>>>> I suppose such an approach is also possible.
>>>> Personally I am fine with either way, the existing one or what you
>>>> propose, as long as we fix the existing race condition.
>>>> What is good about your suggestion is that we probably don't need
>>>> to worry about how to allow the user to enable/disable auto-recovery inside the PMD.
>>>>
>>>> Konstantin
>>>>
>>>
  
Ajit Khaparde March 9, 2023, 2:05 a.m. UTC
On Tue, Mar 7, 2023 at 4:40 AM Konstantin Ananyev
<konstantin.ananyev@huawei.com> wrote:
>
>
>
> > >>>>>>>>>>> In the proactive error handling mode, the PMD will set the data path
> > >>>>>>>>>>> pointers to dummy functions and then try recovery; during this period
> > >>>>>>>>>>> the application may still be invoking the data path API. This introduces
> > >>>>>>>>>>> a race condition with the data path which may lead to a crash [1].
> > >>>>>>>>>>>
> > >>>>>>>>>>> Although the PMD adds a delay after setting the data path pointers to
> > >>>>>>>>>>> cover the above race condition, this only reduces the probability; it
> > >>>>>>>>>>> doesn't solve the problem.
> > >>>>>>>>>>>
> > >>>>>>>>>>> To solve the race-condition problem fundamentally, the following
> > >>>>>>>>>>> requirements are added:
> > >>>>>>>>>>> 1. The PMD should set the data path pointers to dummy functions after
> > >>>>>>>>>>>     reporting the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>>>> 2. The application should stop invoking the data path API when
> > >>>>>>>>>>>     processing the RTE_ETH_EVENT_ERR_RECOVERING event.
> > >>>>>>>>>>> 3. The PMD should set the data path pointers to valid functions before
> > >>>>>>>>>>>     reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>> 4. The application should resume invoking the data path API when
> > >>>>>>>>>>>     processing the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> How does this solve the race condition? By pushing the responsibility
> > >>>>>>>>> to stop the data path to the application?
> > >>>>>>>>
> > >>>>>>>> Exactly, it becomes the application's responsibility to make sure the data
> > >>>>>>>> path is stopped/suspended before recovery continues.
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> From the documentation of the feature:
> > >>>>>>>
> > >>>>>>> ``
> > >>>>>>> Because the PMD recovers automatically,
> > >>>>>>> the application can only sense that the data flow is disconnected for a
> > >>>>>>> while and the control API returns an error in this period.
> > >>>>>>>
> > >>>>>>> In order to sense the error happening/recovering, as well as to restore
> > >>>>>>> some additional configuration, three events are available:
> > >>>>>>> ``
> > >>>>>>>
> > >>>>>>> It looks like the initial design uses events mainly to inform the application
> > >>>>>>> about what happened, and mainly for re-configuration.
> > >>>>>>>
> > >>>>>>> Although I don't disagree with involving the application, I am not sure
> > >>>>>>> that is part of the current design.
> > >>>>>>
> > >>>>>> I thought we all agreed that the initial design contains some fallacies
> > >>>>>> that need to be fixed, no?
> > >>>>>> The main one, I think, is the statement that with the current rte_ethdev
> > >>>>>> design error recovery can be done without interaction with the app (to
> > >>>>>> stop/suspend the data/control path).
> > >>>>>> It needs some interaction with the app layer, one way or another.
> > >>>>>>
> > >>>>>>>>>
> > >>>>>>>>> What if the application is not interested in recovery modes at all and has
> > >>>>>>>>> not registered any callback for the recovery?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Are you saying there is no way for the application to disable
> > >>>>>>>> automatic recovery in the PMD if it is not interested
> > >>>>>>>> (or can't fulfill the prerequisites for it)?
> > >>>>>>>> If so, then yes, it is a problem and we need to fix it.
> > >>>>>>>> I assumed that such a mechanism to disable unwanted events already exists,
> > >>>>>>>> but I can't find anything.
> > >>>>>>>> I wonder what would be the easiest way here: can the PMD make a decision
> > >>>>>>>> based on the callback return value, or do we need a new API to
> > >>>>>>>> enable/disable callbacks, or ...?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> As far as I can see, automatic recovery is not configurable by the app.
> > >>>>>>>
> > >>>>>>> But that is not all: the PMD sends events to the application but can't know
> > >>>>>>> whether the application is handling them, so with the current design the
> > >>>>>>> PMD can't rely on the app.
> > >>>>>>
> > >>>>>> Well, the PMD invokes the user-provided callback.
> > >>>>>> One way to fix that problem: if no callback is provided,
> > >>>>>> or the callback returns an error code, the PMD can assume that recovery
> > >>>>>> should not be done.
> > >>>>>> That is probably not the best design choice, but at least it will allow us
> > >>>>>> to fix the problem without too many changes and without introducing a new API.
> > >>>>>> That could be a sort of 'quick fix'.
> > >>>>>> In the meantime we can think about a new/better approach for that.
> > >>>>>>
> > >>>>>
> > >>>>> -rc2 for 23.03 is a few days away.
> > >>>>>
> > >>>>> What do you think about having, for this release, a 'quick fix' that modifies
> > >>>>> how the driver updates burst ops to prevent the race condition?
> > >>
> > >> By the 'quick fix', do you mean only updating the function pointers (without the rxq setting)?
> > >> The PMDs that currently announce support for "proactive error handling mode" already
> > >> do this.
> > >>
> > >
> > > Yes.
> > > I checked hns3 and it does as you said: hns3_eth_dev_fp_ops_config()
> > > updates all fields in 'rte_eth_fp_ops', but only the function pointers seem
> > > to change in the driver, so effectively only the function pointers are updated.
> > >
> > > The discussion about the race condition started with patch [1], which
> > > mentions a crash caused by a race condition. Later in the discussion, the
> > > recovery event was given as an example of where the race can occur; that is
> > > why we are here.
> > >
> > > But given the above info, although there is a race condition and a bigger
> > > update (one that needs application involvement) is required for the recovery
> > > mechanism, there is no crash and NO 'quick fix' is required for recovery.
> > >
> > > @Konstantin, @Chengwen, can you please confirm that the above understanding is
> > > correct?
> >
> > Yes, that's right.
>
> Yes, I think with Chengwen's patch the race-condition problem should be fixed,
> though for that the user has to provide a properly implemented callback.
> What is not currently addressed: the user cannot disable this auto-recovery procedure at will.
> So if the user does not provide a proper callback, the recovery can still proceed and the race can happen.
Ideally, the user or the application should participate in the recovery
to prevent more catastrophic results that may require a system reboot.
Not all scenarios are recoverable, but depending on the implementation,
that could be a very small percentage.
Nevertheless, application awareness and participation is a good idea
as an end goal.
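The PMD-side ordering from requirements 1 and 3 of the patch, combined with the opt-in rule suggested earlier in the thread (no callback, or a callback returning an error, means no automatic recovery), can be sketched as follows. Everything here is a simplified, hypothetical model: `struct fp_ops` stands in for one `rte_eth_fp_ops` entry with only the Rx burst pointer, and `pmd_proactive_recover()`, the `EV_*`/`RECOVER_*` values and the `example_*` stubs are illustrative names, not ethdev or hns3 code. In ethdev, the "restore valid pointers" step is where the `rte_eth_fp_ops_setup()` helper added by this patch would be used. What a PMD should do when recovery is declined is not settled in the thread and is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum recovery_event { EV_ERR_RECOVERING, EV_RECOVERY_SUCCESS, EV_RECOVERY_FAILED };

#define RECOVER_OK        0
#define RECOVER_DECLINED (-1)
#define RECOVER_FAILED   (-2)

typedef uint16_t (*rx_burst_t)(void *rxq, uint16_t nb_pkts);

/* Simplified stand-in for one rte_eth_fp_ops entry. */
struct fp_ops {
	rx_burst_t rx_pkt_burst;
};

/* Application callback as seen by the PMD; returns 0 to accept. */
typedef int (*recovery_cb_t)(enum recovery_event ev, void *arg);

static uint16_t
dummy_rx(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	(void)nb_pkts;
	return 0; /* receives nothing while recovering */
}

static uint16_t
real_rx(void *rxq, uint16_t nb_pkts)
{
	(void)rxq;
	return nb_pkts; /* placeholder for the driver's real burst function */
}

static int
pmd_proactive_recover(struct fp_ops *ops, recovery_cb_t cb, void *arg,
		      int (*hw_recover)(void))
{
	/* Opt-in: no callback, or a callback error, means no auto-recovery. */
	if (cb == NULL || cb(EV_ERR_RECOVERING, arg) != 0)
		return RECOVER_DECLINED;

	/* Requirement 1: switch to dummy functions only after the event has
	 * been reported and the callback has returned (data path quiesced). */
	ops->rx_pkt_burst = dummy_rx;

	if (hw_recover() != 0) {
		/* Pointers stay dummy; the application must handle the port. */
		cb(EV_RECOVERY_FAILED, arg);
		return RECOVER_FAILED;
	}

	/* Requirement 3: restore valid functions before reporting success. */
	ops->rx_pkt_burst = real_rx;
	cb(EV_RECOVERY_SUCCESS, arg);
	return RECOVER_OK;
}

/* Example stubs: an accepting callback that records the last event, and
 * hardware-recovery stubs that succeed or fail. */
static int
example_app_cb(enum recovery_event ev, void *arg)
{
	*(int *)arg = (int)ev;
	return 0;
}

static int example_hw_ok(void) { return 0; }
static int example_hw_fail(void) { return -1; }
```

This ordering only closes the race if the application's callback really waits for its data path to drain before returning; the opt-in check is what keeps the PMD from recovering underneath an application that never registered a callback.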

>
> >
> > >
> > >
> > >
> > > [1]
> > > https://patches.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >
> > >>>>>
> > >>>>> And plan a design update for the next release?
> > >>>> +1 on the overall approach.
> > >>>
> > >>> Yep, agree.
> > >>
> > >> I hope for a better solution.
> > >> Also, I notice that only openvswitch (among all open-source software based on DPDK)
> > >> registers an RTE_ETH_EVENT_INTR_RESET callback.
> > >>
> > >> Therefore, I hope we build a recovery framework at the DPDK SDK level that is compatible
> > >> with both the RTE_ETH_EVENT_INTR_RESET and RTE_ETH_EVENT_ERR_RECOVERING mechanisms.
> > >>
> > >>>
> > >>>>
> > >>>>>
> > >>>>>
> > >>>>>>>
> > >>>>>>>>> I think the driver should not rely on the application for this unless the
> > >>>>>>>>> application explicitly tells the driver that it is handling recovery;
> > >>>>>>>>> right now there is no way for the driver to know this.
> > >>>>>>>>
> > >>>>>>>> I think it is vice versa:
> > >>>>>>>> the application should not enable auto-recovery if it can't meet the
> > >>>>>>>> prerequisites for it (provide an appropriate callback).
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> I agree with the above; we are saying the same thing from different perspectives.
> > >>>>>>
> > >>>>>> Ok, that's good we are on the same page.
> > >>>>>>
> > >>>>>>
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>> Also, this patch introduces an internal driver function,
> > >>>>>>>>>>> rte_eth_fp_ops_setup, which is used as a helper function for PMDs.
> > >>>>>>>>>>>
> > >>>>>>>>>>> [1]
> > >>>>>>>>>>> http://patchwork.dpdk.org/project/dpdk/patch/20230220060839.1267349-2-ashok.k.kaladi@intel.com/
> > >>>>>>>>>>>
> > >>>>>>>>>>> Fixes: eb0d471a8941 ("ethdev: add proactive error handling mode")
> > >>>>>>>>>>> Cc: stable@dpdk.org
> > >>>>>>>>>>>
> > >>>>>>>>>>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>   doc/guides/prog_guide/poll_mode_drv.rst | 20 +++++++---------
> > >>>>>>>>>>>   lib/ethdev/ethdev_driver.c              |  8 +++++++
> > >>>>>>>>>>>   lib/ethdev/ethdev_driver.h              | 10 ++++++++
> > >>>>>>>>>>>   lib/ethdev/rte_ethdev.h                 | 32 +++++++++++++++----------
> > >>>>>>>>>>>   lib/ethdev/version.map                  |  1 +
> > >>>>>>>>>>>   5 files changed, 46 insertions(+), 25 deletions(-)
> > >>>>>>>>>>>
> > >>>>>>>>>>> diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> index c145a9066c..e380ff135a 100644
> > >>>>>>>>>>> --- a/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> +++ b/doc/guides/prog_guide/poll_mode_drv.rst
> > >>>>>>>>>>> @@ -638,14 +638,9 @@ different from the application invokes recovery in PASSIVE mode,
> > >>>>>>>>>>>   the PMD automatically recovers from error in PROACTIVE mode,
> > >>>>>>>>>>>   and only a small amount of work is required for the application.
> > >>>>>>>>>>>
> > >>>>>>>>>>> -During error detection and automatic recovery,
> > >>>>>>>>>>> -the PMD sets the data path pointers to dummy functions
> > >>>>>>>>>>> -(which will prevent the crash),
> > >>>>>>>>>>> -and also make sure the control path operations fail with a return code ``-EBUSY``.
> > >>>>>>>>>>> -
> > >>>>>>>>>>> -Because the PMD recovers automatically,
> > >>>>>>>>>>> -the application can only sense that the data flow is disconnected for a while
> > >>>>>>>>>>> -and the control API returns an error in this period.
> > >>>>>>>>>>> +During error detection and automatic recovery, the PMD sets the data path
> > >>>>>>>>>>> +pointers to dummy functions and also make sure the control path operations
> > >>>>>>>>>>> +failed with a return code ``-EBUSY``.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   In order to sense the error happening/recovering,
> > >>>>>>>>>>>   as well as to restore some additional configuration,
> > >>>>>>>>>>> @@ -653,9 +648,9 @@ three events are available:
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_ERR_RECOVERING``
> > >>>>>>>>>>>      Notify the application that an error is detected
> > >>>>>>>>>>> -   and the recovery is being started.
> > >>>>>>>>>>> +   and the recovery is about to start.
> > >>>>>>>>>>>      Upon receiving the event, the application should not invoke
> > >>>>>>>>>>> -   any control path function until receiving
> > >>>>>>>>>>> +   any control and data path API until receiving
> > >>>>>>>>>>>      ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   .. note::
> > >>>>>>>>>>> @@ -666,8 +661,9 @@ three events are available:
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
> > >>>>>>>>>>>      Notify the application that the recovery from error is successful,
> > >>>>>>>>>>> -   the PMD already re-configures the port,
> > >>>>>>>>>>> -   and the effect is the same as a restart operation.
> > >>>>>>>>>>> +   the PMD already re-configures the port.
> > >>>>>>>>>>> +   The application should restore some additional configuration, and then
> > >>>>>>>>>>> +   enable data path API invocation.
> > >>>>>>>>>>>
> > >>>>>>>>>>>   ``RTE_ETH_EVENT_RECOVERY_FAILED``
> > >>>>>>>>>>>      Notify the application that the recovery from error failed,
> > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> index 0be1e8ca04..f994653fe9 100644
> > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.c
> > >>>>>>>>>>> @@ -515,6 +515,14 @@ rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>>>       return rc;
> > >>>>>>>>>>>   }
> > >>>>>>>>>>>
> > >>>>>>>>>>> +void
> > >>>>>>>>>>> +rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
> > >>>>>>>>>>> +{
> > >>>>>>>>>>> +    if (dev == NULL)
> > >>>>>>>>>>> +        return;
> > >>>>>>>>>>> +    eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
> > >>>>>>>>>>> +}
> > >>>>>>>>>>> +
> > >>>>>>>>>>>   const struct rte_memzone *
> > >>>>>>>>>>>   rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
> > >>>>>>>>>>>                uint16_t queue_id, size_t size, unsigned int align,
> > >>>>>>>>>>> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> index 2c9d615fb5..0d964d1f67 100644
> > >>>>>>>>>>> --- a/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> +++ b/lib/ethdev/ethdev_driver.h
> > >>>>>>>>>>> @@ -1621,6 +1621,16 @@ int
> > >>>>>>>>>>>   rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
> > >>>>>>>>>>>            uint16_t queue_id);
> > >>>>>>>>>>>
> > >>>>>>>>>>> +/**
> > >>>>>>>>>>> + * @internal
> > >>>>>>>>>>> + * Setup eth fast-path API to ethdev values.
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * @param dev
> > >>>>>>>>>>> + *  Pointer to struct rte_eth_dev.
> > >>>>>>>>>>> + */
> > >>>>>>>>>>> +__rte_internal
> > >>>>>>>>>>> +void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
> > >>>>>>>>>>> +
> > >>>>>>>>>>>   /**
> > >>>>>>>>>>>    * @internal
> > >>>>>>>>>>>    * Atomically set the link status for the specific device.
> > >>>>>>>>>>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> index 049641d57c..44ee7229c1 100644
> > >>>>>>>>>>> --- a/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> +++ b/lib/ethdev/rte_ethdev.h
> > >>>>>>>>>>> @@ -3944,25 +3944,28 @@ enum rte_eth_event_type {
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_RX_AVAIL_THRESH,
> > >>>>>>>>>>>       /** Port recovering from a hardware or firmware error.
> > >>>>>>>>>>> -     * If PMD supports proactive error recovery,
> > >>>>>>>>>>> -     * it should trigger this event to notify application
> > >>>>>>>>>>> -     * that it detected an error and the recovery is being started.
> > >>>>>>>>>>> -     * Upon receiving the event, the application should not invoke
> > >>>>>>>>>>> any control path API
> > >>>>>>>>>>> -     * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until
> > >>>>>>>>>>> receiving
> > >>>>>>>>>>> -     * RTE_ETH_EVENT_RECOVERY_SUCCESS or
> > >>>>>>>>>>> RTE_ETH_EVENT_RECOVERY_FAILED event.
> > >>>>>>>>>>> -     * The PMD will set the data path pointers to dummy functions,
> > >>>>>>>>>>> -     * and re-set the data path pointers to non-dummy functions
> > >>>>>>>>>>> -     * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
> > >>>>>>>>>>> -     * It means that the application cannot send or receive any
> > >>>>>>>>>>> packets
> > >>>>>>>>>>> -     * during this period.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * If PMD supports proactive error recovery, it should trigger
> > >>>>>>>>>>> this
> > >>>>>>>>>>> +     * event to notify application that it detected an error and the
> > >>>>>>>>>>> +     * recovery is about to start.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * Upon receiving the event, the application should not invoke any
> > >>>>>>>>>>> +     * control and data path API until receiving
> > >>>>>>>>>>> +     * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
> > >>>>>>>>>>> +     * event.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * Once this event is reported, the PMD will set the data path
> > >>>>>>>>>>> pointers
> > >>>>>>>>>>> +     * to dummy functions, and re-set the data path pointers to valid
> > >>>>>>>>>>> +     * functions before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS
> > >>>>>>>>>>> event.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>>        * @note Before the PMD reports the recovery result,
> > >>>>>>>>>>>        * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event
> > >>>>>>>>>>> again,
> > >>>>>>>>>>>        * because a larger error may occur during the recovery.
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_ERR_RECOVERING,
> > >>>>>>>>>>>       /** Port recovers successfully from the error.
> > >>>>>>>>>>> -     * The PMD already re-configured the port,
> > >>>>>>>>>>> -     * and the effect is the same as a restart operation.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * The PMD already re-configured the port:
> > >>>>>>>>>>>        * a) The following operation will be retained: (alphabetically)
> > >>>>>>>>>>>        *    - DCB configuration
> > >>>>>>>>>>>        *    - FEC configuration
> > >>>>>>>>>>> @@ -3989,6 +3992,9 @@ enum rte_eth_event_type {
> > >>>>>>>>>>>        *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
> > >>>>>>>>>>>        * c) Any other configuration will not be stored
> > >>>>>>>>>>>        *    and will need to be re-configured.
> > >>>>>>>>>>> +     *
> > >>>>>>>>>>> +     * The application should restore some additional configuration
> > >>>>>>>>>>> +     * (see above case b/c), and then enable data path API invocation.
> > >>>>>>>>>>>        */
> > >>>>>>>>>>>       RTE_ETH_EVENT_RECOVERY_SUCCESS,
> > >>>>>>>>>>>       /** Port recovery failed.
> > >>>>>>>>>>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> > >>>>>>>>>>> index 357d1a88c0..c273e0bdae 100644
> > >>>>>>>>>>> --- a/lib/ethdev/version.map
> > >>>>>>>>>>> +++ b/lib/ethdev/version.map
> > >>>>>>>>>>> @@ -320,6 +320,7 @@ INTERNAL {
> > >>>>>>>>>>>       rte_eth_devices;
> > >>>>>>>>>>>       rte_eth_dma_zone_free;
> > >>>>>>>>>>>       rte_eth_dma_zone_reserve;
> > >>>>>>>>>>> +    rte_eth_fp_ops_setup;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_bind;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_unbind;
> > >>>>>>>>>>>       rte_eth_hairpin_queue_peer_update;
> > >>>>>>>>>>> --
> > >>>>>>>>>>   Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> > >>>>>>>>>>
> > >>>>>>>>>>> 2.17.1
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>
> > >
> > > .
> > >
  
Honnappa Nagarahalli March 9, 2023, 3:03 a.m. UTC | #28
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Wednesday, March 8, 2023 7:00 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.v.ananyev@yandex.ru>; dev@dpdk.org;
> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
> anakkur.purayil@broadcom.com>; Ajit Khaparde
> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> Cc: nd <nd@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
> mode
> 
> 
> 
> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>>>>>>
> >>>>>
> >>>>> Is there any reason not to design this in the same way as
> >>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>>>
> >>>> I suppose it is a question for the authors of original patch...
> >>> Appreciate if the authors could comment on this.
> >>
> >> The main cause is a hardware implementation limit; I will try
> >> to explain from the hns3 PMD's view.
> >> For a global reset, all functions need to respond within a certain
> >> period of time, otherwise the reset will fail. The reset also
> >> requires a few steps (which together may take a long time).
> >>
> >> When multiple functions are driven by one DPDK process and a global reset
> >> is triggered, the rte_eth_dev_reset mechanism does not cover this scenario:
> >> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
> >> 2. The application callbacks are then invoked in that same thread; because
> >>     each port's recovery takes a long time, the resets of later ports fail.
I am reading this again. What you are saying is, a single thread running the recovery process in sequence for multiple ports will not meet the required time limits. Hence, the recovery process needs to run in multiple threads simultaneously. This way each thread could run the recovery for a different port. Do I understand this correctly?

(Assuming my understanding is correct) The current implementation is running the recovery process in the context of data plane threads and not in the interrupt thread. Is this correct?

> > If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
> rte_eth_dev_recover, what problems do you see?
> 
> I don't see any difference between 'RTE_ETH_EVENT_INTR_RECOVER and
> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
> Could you give more detail?
> 
> >
> >>
> >>>
> >>>>
> >>>>> We could have a similar API 'rte_eth_dev_recover' to do the
> >>>>> recovery
> >>>> functionality.
> >>>>
> >>>> I suppose such approach is also possible.
> >>>> Personally I am fine with both ways: either existing one or what
> >>>> you propose, as long as we'll fix existing race-condition.
> >>>> What is good with what you suggest - that way we probably don't
> >>>> need to worry how to allow user to enable/disable auto-recovery inside
> PMD.
> >>>>
> >>>> Konstantin
> >>>>
> >>>
  
Chengwen Feng March 9, 2023, 11:30 a.m. UTC | #29
On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
> 
> 
>> -----Original Message-----
>> From: fengchengwen <fengchengwen@huawei.com>
>> Sent: Wednesday, March 8, 2023 7:00 PM
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
>> Ananyev <konstantin.v.ananyev@yandex.ru>; dev@dpdk.org;
>> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
>> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
>> anakkur.purayil@broadcom.com>; Ajit Khaparde
>> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
>> Cc: nd <nd@arm.com>
>> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
>> mode
>>
>>
>>
>> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>>>>>>
>>>>>>>
>>>>>>> Is there any reason not to design this in the same way as
>>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
>>>>>>
>>>>>> I suppose it is a question for the authors of original patch...
>>>>> Appreciate if the authors could comment on this.
>>>>
>>>> The main cause is a hardware implementation limit; I will try
>>>> to explain from the hns3 PMD's view.
>>>> For a global reset, all functions need to respond within a certain
>>>> period of time, otherwise the reset will fail. The reset also
>>>> requires a few steps (which together may take a long time).
>>>>
>>>> When multiple functions are driven by one DPDK process and a global reset
>>>> is triggered, the rte_eth_dev_reset mechanism does not cover this scenario:
>>>> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
>>>> 2. The application callbacks are then invoked in that same thread; because
>>>>     each port's recovery takes a long time, the resets of later ports fail.
> I am reading this again. What you are saying is, a single thread running the recovery process in sequence for multiple ports will not meet the required time limits. Hence, the recovery process needs to run in multiple threads simultaneously. This way each thread could run the recovery for a different port. Do I understand this correctly?

No
It's not realistic to have threads on every port.

> 
> (Assuming my understanding is correct) The current implementation is running the recovery process in the context of data plane threads and not in the interrupt thread. Is this correct?

No, the recovery process is running in the interrupt thread.

> 
>>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
>> rte_eth_dev_recover, what problems do you see?
>>
>> I don't see any difference between 'RTE_ETH_EVENT_INTR_RECOVER and
>> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
>> Could you give more detail?
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the
>>>>>>> recovery
>>>>>> functionality.
>>>>>>
>>>>>> I suppose such approach is also possible.
>>>>>> Personally I am fine with both ways: either existing one or what
>>>>>> you propose, as long as we'll fix existing race-condition.
>>>>>> What is good with what you suggest - that way we probably don't
>>>>>> need to worry how to allow user to enable/disable auto-recovery inside
>> PMD.
>>>>>>
>>>>>> Konstantin
>>>>>>
>>>>>
  
Honnappa Nagarahalli March 10, 2023, 3:25 a.m. UTC | #30
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Thursday, March 9, 2023 5:31 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Konstantin
> Ananyev <konstantin.v.ananyev@yandex.ru>; dev@dpdk.org;
> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
> anakkur.purayil@broadcom.com>; Ajit Khaparde
> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> Cc: nd <nd@arm.com>
> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive error handling
> mode
> 
> 
> 
> On 2023/3/9 11:03, Honnappa Nagarahalli wrote:
> >
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengchengwen@huawei.com>
> >> Sent: Wednesday, March 8, 2023 7:00 PM
> >> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> Konstantin
> >> Ananyev <konstantin.v.ananyev@yandex.ru>; dev@dpdk.org;
> >> thomas@monjalon.net; Ferruh Yigit <ferruh.yigit@amd.com>; Andrew
> >> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Kalesh AP <kalesh-
> >> anakkur.purayil@broadcom.com>; Ajit Khaparde
> >> (ajit.khaparde@broadcom.com) <ajit.khaparde@broadcom.com>
> >> Cc: nd <nd@arm.com>
> >> Subject: Re: [PATCH 1/5] ethdev: fix race-condition of proactive
> >> error handling mode
> >>
> >>
> >>
> >> On 2023/3/8 9:09, Honnappa Nagarahalli wrote:
> >>> <snip>
> >>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>> Is there any reason not to design this in the same way as
> >>>>>> 'rte_eth_dev_reset'? Why does the PMD have to recover by itself?
> >>>>>>
> >>>>>> I suppose it is a question for the authors of original patch...
> >>>>> Appreciate if the authors could comment on this.
> >>>>
> >>>> The main cause is a hardware implementation limit; I will
> >>>> try to explain from the hns3 PMD's view.
> >>>> For a global reset, all functions need to respond within a
> >>>> certain period of time, otherwise the reset will fail. The reset
> >>>> also requires a few steps (which together may take a long time).
> >>>>
> >>>> When multiple functions are driven by one DPDK process and a global
> >>>> reset is triggered, the rte_eth_dev_reset mechanism does not cover
> >>>> this scenario:
> >>>> 1. Each port reports RTE_ETH_EVENT_INTR_RESET in the interrupt thread.
> >>>> 2. The application callbacks are then invoked in that same thread;
> >>>>     because each port's recovery takes a long time, the resets of
> >>>>     later ports fail.
> > I am reading this again. What you are saying is, a single thread running the
> recovery process in sequence for multiple ports will not meet the required
> time limits. Hence, the recovery process needs to run in multiple threads
> simultaneously. This way each thread could run the recovery for a different
> port. Do I understand this correctly?
> 
> No
> It's not realistic to have threads on every port.
> 
> >
> > (Assuming my understanding is correct) The current implementation is
> running the recovery process in the context of data plane threads and not in
> the interrupt thread. Is this correct?
> 
> No, the recovery process is running in the interrupt thread.
Ok.

> 
> >
> >>> If the design were to introduce RTE_ETH_EVENT_INTR_RECOVER and
> >> rte_eth_dev_recover, what problems do you see?
> >>
> >> I don't see any difference between 'RTE_ETH_EVENT_INTR_RECOVER and
> >> rte_eth_dev_recover' and the RTE_ETH_EVENT_INTR_RESET mechanism.
> >> Could you give more detail?
They are similar, i.e., we would use RTE_ETH_EVENT_INTR_RECOVER to indicate a recovery interrupt (not a reset event), and the recovery process would be invoked through a new rte_eth_dev_recover API. What problems do you see with it?
I am unable to understand the problems you have described above.

> >>
> >>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> We could have a similar API 'rte_eth_dev_recover' to do the
> >>>>>>> recovery
> >>>>>> functionality.
> >>>>>>
> >>>>>> I suppose such approach is also possible.
> >>>>>> Personally I am fine with both ways: either existing one or what
> >>>>>> you propose, as long as we'll fix existing race-condition.
> >>>>>> What is good with what you suggest - that way we probably don't
> >>>>>> need to worry how to allow user to enable/disable auto-recovery
> >>>>>> inside
> >> PMD.
> >>>>>>
> >>>>>> Konstantin
> >>>>>>
> >>>>>
  

Patch

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index c145a9066c..e380ff135a 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -638,14 +638,9 @@  different from the application invokes recovery in PASSIVE mode,
 the PMD automatically recovers from error in PROACTIVE mode,
 and only a small amount of work is required for the application.
 
-During error detection and automatic recovery,
-the PMD sets the data path pointers to dummy functions
-(which will prevent the crash),
-and also make sure the control path operations fail with a return code ``-EBUSY``.
-
-Because the PMD recovers automatically,
-the application can only sense that the data flow is disconnected for a while
-and the control API returns an error in this period.
+During error detection and automatic recovery, the PMD sets the data path
+pointers to dummy functions and also makes sure the control path operations
+fail with a return code ``-EBUSY``.
 
 In order to sense the error happening/recovering,
 as well as to restore some additional configuration,
@@ -653,9 +648,9 @@  three events are available:
 
 ``RTE_ETH_EVENT_ERR_RECOVERING``
    Notify the application that an error is detected
-   and the recovery is being started.
+   and the recovery is about to start.
    Upon receiving the event, the application should not invoke
-   any control path function until receiving
+   any control path or data path API until receiving
    ``RTE_ETH_EVENT_RECOVERY_SUCCESS`` or ``RTE_ETH_EVENT_RECOVERY_FAILED`` event.
 
 .. note::
@@ -666,8 +661,9 @@  three events are available:
 
 ``RTE_ETH_EVENT_RECOVERY_SUCCESS``
    Notify the application that the recovery from error is successful,
-   the PMD already re-configures the port,
-   and the effect is the same as a restart operation.
+   and the PMD has already re-configured the port.
+   The application should restore the configuration the PMD did not retain,
+   and then re-enable data path API invocation.
 
 ``RTE_ETH_EVENT_RECOVERY_FAILED``
    Notify the application that the recovery from error failed,
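The application-side rules documented above (stop calling the data path on RTE_ETH_EVENT_ERR_RECOVERING; restore configuration and then resume on RTE_ETH_EVENT_RECOVERY_SUCCESS) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the patch: struct app_port_gate, app_dp_enter(), app_dp_exit(), app_restore_config() and enum app_event are hypothetical names, and the local enum stands in for enum rte_eth_event_type so the sketch compiles without DPDK. In a real application the callback would be registered with rte_eth_dev_callback_register() for each of the three events and would run in the interrupt thread. Assuming the PMD reports the event by invoking the callbacks synchronously, it only swaps in the dummy pointers after the callback returns, so the callback waits until no lcore is still inside a burst call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the recovery events (NOT enum rte_eth_event_type). */
enum app_event {
	APP_EVT_ERR_RECOVERING,
	APP_EVT_RECOVERY_SUCCESS,
	APP_EVT_RECOVERY_FAILED,
};

/* Hypothetical per-port gate shared by the callback and data-plane lcores. */
struct app_port_gate {
	atomic_bool enabled;   /* may lcores call the burst functions? */
	atomic_uint inflight;  /* lcores currently inside a burst call */
};

/* Data-plane side: announce entry first, then check the gate.
 * The caller must call app_dp_exit() whether or not this returns true. */
static bool app_dp_enter(struct app_port_gate *g)
{
	atomic_fetch_add(&g->inflight, 1);
	return atomic_load(&g->enabled);
}

static void app_dp_exit(struct app_port_gate *g)
{
	unsigned int prev = atomic_fetch_sub(&g->inflight, 1);

	assert(prev > 0);  /* catches an exit without a matching enter */
	(void)prev;
}

/* Hypothetical hook: re-apply configuration the PMD did not retain. */
static void app_restore_config(uint16_t port_id)
{
	(void)port_id;
}

/* Control side, shaped like rte_eth_dev_cb_fn; runs in the interrupt thread. */
static int app_recovery_cb(uint16_t port_id, enum app_event event,
			   void *cb_arg, void *ret_param)
{
	struct app_port_gate *g = cb_arg;

	(void)ret_param;
	switch (event) {
	case APP_EVT_ERR_RECOVERING:
		/* Close the gate, then wait for lcores already inside a burst.
		 * Idempotent, so a repeated ERR_RECOVERING event is harmless.
		 * A DPDK application would call rte_pause() in this loop. */
		atomic_store(&g->enabled, false);
		while (atomic_load(&g->inflight) != 0)
			;
		break;
	case APP_EVT_RECOVERY_SUCCESS:
		app_restore_config(port_id);      /* restore first ... */
		atomic_store(&g->enabled, true);  /* ... then resume the data path */
		break;
	case APP_EVT_RECOVERY_FAILED:
		break;  /* this sketch simply keeps the gate closed */
	}
	return 0;
}
```

Each data-plane lcore would bracket its burst calls, for example: if (app_dp_enter(&gate)) nb = rte_eth_rx_burst(port_id, queue_id, pkts, nb_pkts); app_dp_exit(&gate);. The default sequentially consistent atomics are what make "announce, then check" safe: either the callback observes the lcore's increment and waits for it, or the lcore observes the already-closed gate and skips the burst.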
diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index 0be1e8ca04..f994653fe9 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -515,6 +515,14 @@  rte_eth_dma_zone_free(const struct rte_eth_dev *dev, const char *ring_name,
 	return rc;
 }
 
+void
+rte_eth_fp_ops_setup(struct rte_eth_dev *dev)
+{
+	if (dev == NULL)
+		return;
+	eth_dev_fp_ops_setup(rte_eth_fp_ops + dev->data->port_id, dev);
+}
+
 const struct rte_memzone *
 rte_eth_dma_zone_reserve(const struct rte_eth_dev *dev, const char *ring_name,
 			 uint16_t queue_id, size_t size, unsigned int align,
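To illustrate where the new rte_eth_fp_ops_setup() helper fits, here is a self-contained model of a PMD's proactive recovery in the order this patch requires: report ERR_RECOVERING, then install dummy functions; restore valid functions, then report RECOVERY_SUCCESS. It is a sketch under assumptions, not driver code: model_fp_ops, struct model_dev, model_fp_ops_setup() and the other model_* names are hypothetical stand-ins for rte_eth_fp_ops, struct rte_eth_dev and rte_eth_fp_ops_setup(). The two event reports are recorded in a trace instead of calling rte_eth_dev_callback_process(), and hardware re-initialization always succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ethdev internals; not DPDK definitions. */
#define MODEL_MAX_PORTS 4

typedef uint16_t (*model_burst_fn)(void *queue, void **pkts, uint16_t nb_pkts);

struct model_fp_ops {            /* role of struct rte_eth_fp_ops */
	model_burst_fn rx_pkt_burst;
};

struct model_dev {               /* role of struct rte_eth_dev */
	uint16_t port_id;
	model_burst_fn rx_pkt_burst; /* the PMD's real burst function */
	int recovering;              /* control path returns -EBUSY while set */
};

/* Steps of one recovery, recorded so the ordering can be inspected. */
enum model_step {
	STEP_REPORT_ERR_RECOVERING,
	STEP_SET_DUMMY_OPS,
	STEP_HW_REINIT,
	STEP_SET_VALID_OPS,
	STEP_REPORT_RECOVERY_SUCCESS,
};

static struct model_fp_ops model_fp_ops[MODEL_MAX_PORTS];
static enum model_step model_trace[8];
static int model_trace_len;

static void model_record(enum model_step s)
{
	if (model_trace_len < 8)
		model_trace[model_trace_len++] = s;
}

/* Stand-in for the PMD's real Rx burst: pretends every request is filled. */
static uint16_t model_pmd_rx_burst(void *queue, void **pkts, uint16_t nb_pkts)
{
	(void)queue; (void)pkts;
	return nb_pkts;
}

/* Dummy burst, analogous to rte_eth_pkt_burst_dummy(): receives nothing. */
static uint16_t model_burst_dummy(void *queue, void **pkts, uint16_t nb_pkts)
{
	(void)queue; (void)pkts; (void)nb_pkts;
	return 0;
}

/* Analogue of rte_eth_fp_ops_setup(): publish the device's real pointers. */
static void model_fp_ops_setup(struct model_dev *dev)
{
	if (dev == NULL)
		return;
	assert(dev->port_id < MODEL_MAX_PORTS);
	model_fp_ops[dev->port_id].rx_pkt_burst = dev->rx_pkt_burst;
}

/* Example control-path operation: refused while recovery is in progress. */
static int model_dev_stop(const struct model_dev *dev)
{
	return dev->recovering ? -EBUSY : 0;
}

/* Proactive recovery in the order the patch requires. */
static int model_pmd_recover(struct model_dev *dev)
{
	dev->recovering = 1;
	/* 1. Notify first; the application's callback stops its data path. */
	model_record(STEP_REPORT_ERR_RECOVERING);
	/* 2. Only then swap in the dummy functions. */
	model_fp_ops[dev->port_id].rx_pkt_burst = model_burst_dummy;
	model_record(STEP_SET_DUMMY_OPS);
	/* 3. Re-initialize the hardware (always succeeds in this model). */
	model_record(STEP_HW_REINIT);
	/* 4. Restore valid pointers BEFORE announcing success. */
	model_fp_ops_setup(dev);
	model_record(STEP_SET_VALID_OPS);
	dev->recovering = 0;
	/* 5. Announce success; the application may resume the data path. */
	model_record(STEP_REPORT_RECOVERY_SUCCESS);
	return 0;
}
```

The ordering is the point of the model: swapping the pointers before the first report, or reporting success before the pointers are restored, would reopen the window in which an lcore can call into a half-recovered port.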
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 2c9d615fb5..0d964d1f67 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1621,6 +1621,16 @@  int
 rte_eth_dma_zone_free(const struct rte_eth_dev *eth_dev, const char *name,
 		 uint16_t queue_id);
 
+/**
+ * @internal
+ * Set up the port's fast-path API entry (rte_eth_fp_ops) from the ethdev values.
+ *
+ * @param dev
+ *  Pointer to struct rte_eth_dev.
+ */
+__rte_internal
+void rte_eth_fp_ops_setup(struct rte_eth_dev *dev);
+
 /**
  * @internal
  * Atomically set the link status for the specific device.
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 049641d57c..44ee7229c1 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -3944,25 +3944,28 @@  enum rte_eth_event_type {
 	 */
 	RTE_ETH_EVENT_RX_AVAIL_THRESH,
 	/** Port recovering from a hardware or firmware error.
-	 * If PMD supports proactive error recovery,
-	 * it should trigger this event to notify application
-	 * that it detected an error and the recovery is being started.
-	 * Upon receiving the event, the application should not invoke any control path API
-	 * (such as rte_eth_dev_configure/rte_eth_dev_stop...) until receiving
-	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED event.
-	 * The PMD will set the data path pointers to dummy functions,
-	 * and re-set the data path pointers to non-dummy functions
-	 * before reporting RTE_ETH_EVENT_RECOVERY_SUCCESS event.
-	 * It means that the application cannot send or receive any packets
-	 * during this period.
+	 *
+	 * If the PMD supports proactive error recovery, it should trigger this
+	 * event to notify the application that it has detected an error and
+	 * that recovery is about to start.
+	 *
+	 * Upon receiving the event, the application should not invoke any
+	 * control path or data path API until receiving
+	 * RTE_ETH_EVENT_RECOVERY_SUCCESS or RTE_ETH_EVENT_RECOVERY_FAILED
+	 * event.
+	 *
+	 * After reporting this event, the PMD will set the data path pointers
+	 * to dummy functions, and will restore valid functions before
+	 * reporting the RTE_ETH_EVENT_RECOVERY_SUCCESS event.
+	 *
 	 * @note Before the PMD reports the recovery result,
 	 * the PMD may report the RTE_ETH_EVENT_ERR_RECOVERING event again,
 	 * because a larger error may occur during the recovery.
 	 */
 	RTE_ETH_EVENT_ERR_RECOVERING,
 	/** Port recovers successfully from the error.
-	 * The PMD already re-configured the port,
-	 * and the effect is the same as a restart operation.
+	 *
+	 * The PMD has already re-configured the port:
 	 * a) The following operation will be retained: (alphabetically)
 	 *    - DCB configuration
 	 *    - FEC configuration
@@ -3989,6 +3992,9 @@  enum rte_eth_event_type {
 	 *      (@see RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP)
 	 * c) Any other configuration will not be stored
 	 *    and will need to be re-configured.
+	 *
+	 * The application should restore the configuration that was not
+	 * retained (see cases b and c above), and then re-enable data path API invocation.
 	 */
 	RTE_ETH_EVENT_RECOVERY_SUCCESS,
 	/** Port recovery failed.
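The RTE_ETH_EVENT_RECOVERY_SUCCESS comment asks the application to restore what the PMD did not keep (cases b and c). A minimal sketch of that decision, assuming the application reads the capability bits from dev_info.dev_capa as filled in by rte_eth_dev_info_get(): MODEL_CAPA_* are local stand-ins with arbitrary values for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP and RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP, and struct restore_plan and plan_restore() are hypothetical names. Only the shared-object capability is visible in the hunk above; the flow-rule part of case b is assumed to sit in the lines the hunk omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins (arbitrary bit values) for RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP
 * and RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP from dev_info.dev_capa. */
#define MODEL_CAPA_FLOW_RULE_KEEP          (UINT64_C(1) << 0)
#define MODEL_CAPA_FLOW_SHARED_OBJECT_KEEP (UINT64_C(1) << 1)

static_assert((MODEL_CAPA_FLOW_RULE_KEEP & MODEL_CAPA_FLOW_SHARED_OBJECT_KEEP) == 0,
	      "capability bits must not overlap");

/* Hypothetical summary of what the application must re-apply. */
struct restore_plan {
	bool shared_objects; /* case b): only if the PMD did not keep them */
	bool flow_rules;     /* case b): only if the PMD did not keep them */
	bool other_config;   /* case c): never stored by the PMD */
};

static struct restore_plan plan_restore(uint64_t dev_capa)
{
	struct restore_plan p = {
		.shared_objects = (dev_capa & MODEL_CAPA_FLOW_SHARED_OBJECT_KEEP) == 0,
		.flow_rules = (dev_capa & MODEL_CAPA_FLOW_RULE_KEEP) == 0,
		.other_config = true,
	};
	return p;
}
```

On RTE_ETH_EVENT_RECOVERY_SUCCESS the application would compute the plan, re-create whatever it marks (shared objects before the flow rules that may reference them), and only then re-enable its data path.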
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 357d1a88c0..c273e0bdae 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -320,6 +320,7 @@  INTERNAL {
 	rte_eth_devices;
 	rte_eth_dma_zone_free;
 	rte_eth_dma_zone_reserve;
+	rte_eth_fp_ops_setup;
 	rte_eth_hairpin_queue_peer_bind;
 	rte_eth_hairpin_queue_peer_unbind;
 	rte_eth_hairpin_queue_peer_update;