[RFC,v3] ethdev: claim device reset as async

Message ID 20180920045626.106109-1-qi.z.zhang@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Headers
Series [RFC,v3] ethdev: claim device reset as async |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail Compilation issues

Commit Message

Qi Zhang Sept. 20, 2018, 4:56 a.m. UTC
  Device reset should be implemented in an async way since it is
possible to be invoked in interrupt thread and sometimes to reset a
device need to wait for some dependency, for example, a VF expects for
PF ready or a NIC function as part of a SOC wait for the whole system
reset complete, and all these time-consuming tasks will block the
interrupt thread.
The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
rework the implementation. It will spawn a new thread which will call
ops->dev_reset, and when finished it will raise the event
RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
this event before it continues to configure and restart the device.

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/testpmd.c         | 50 +++++++++++++++++++++++++++++++++++++++++-
 lib/librte_ethdev/rte_ethdev.c | 50 +++++++++++++++++++++++++++++++++++++++---
 lib/librte_ethdev/rte_ethdev.h | 48 ++++++++++++++++++++++++----------------
 3 files changed, 125 insertions(+), 23 deletions(-)
  

Comments

Ferruh Yigit Oct. 4, 2018, 11:30 a.m. UTC | #1
On 9/20/2018 5:56 AM, Qi Zhang wrote:
> Device reset should be implemented in an async way since it is
> possible to be invoked in interrupt thread and sometimes to reset a
> device need to wait for some dependency, for example, a VF expects for
> PF ready or a NIC function as part of a SOC wait for the whole system
> reset complete, and all these time-consuming tasks will block the
> interrupt thread.
> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
> rework the implementation. It will spawn a new thread which will call
> ops->dev_reset, and when finished it will raise the event
> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
> this event before it continues to configure and restart the device.
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>

<...>

> @@ -1385,10 +1413,26 @@ rte_eth_dev_reset(uint16_t port_id)
>  
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
>  
> +	/* already on resetting */
> +	if (dev->state == RTE_ETH_DEV_RESETTING)
> +		return 0;
> +
> +	args = calloc(1, sizeof(struct dev_reset_args));
> +	if (!args)
> +		return -ENOMEM;
> +
>  	rte_eth_dev_stop(port_id);
> -	ret = dev->dev_ops->dev_reset(dev);
>  
> -	return eth_err(port_id, ret);
> +	/* store previous device state temporary */
> +	args->pre_state = dev->state;
> +
> +	dev->state = RTE_ETH_DEV_RESETTING;

Do we need to update the state, I think this will break rte_eth_dev_count() and
friends, like during device reset app will think it has one less device in system.

<...>

> @@ -1814,21 +1816,29 @@ void rte_eth_dev_close(uint16_t port_id);
>   * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it to start
>   * a port reset in other circumstances.
>   *
> - * When this function is called, it first stops the port and then calls the
> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to initial
> - * state, in which no Tx and Rx queues are setup, as if the port has been
> - * reset and not started. The port keeps the port id it had before the
> - * function call.
> - *
> - * After calling rte_eth_dev_reset( ), the application should use
> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
> - * to reconfigure the device as appropriate.
> - *
> - * Note: To avoid unexpected behavior, the application should stop calling
> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For thread
> - * safety, all these controlling functions should be called from the same
> - * thread.
> + * @note
> + * Device reset may have the dependency, for example, a VF reset expects
> + * PF ready, or a NIC function as a part of a SOC need to wait for other
> + * parts of the system be ready, these are time-consuming tasks and will
> + * block current thread.
> + *
> + * As the name, rte_eth_dev_reset_async is an async API, it will spwan a
> + * new thread to call ops->dev_reset, once it is finished, it will raise
> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.  That makes
> + * things easy for an application that what to reset the device from the
> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET handler is
> + * invoked in interrupt thread.

thread calls dev_ops->dev_reset(dev) and wait for it, so it means
dev_ops->dev_reset is synchronous, perhaps it would be good to highlight this in
"dev_reset" comment to help PMD developers.

of dev_ops->dev_reset() is synchronous, means existing rte_eth_dev_reset() is
synchronous, so what do you thinks keep rte_eth_dev_reset() as it is and add new
rte_eth_dev_reset_async() API? Than we will have both sync and async solution.

> + *
> + * Application should not assume device reset is finished after
> + * rte_eth_dev_reset_async return, it should always wait for a
> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
> + * If reset success, application should call rte_eth_dev_configure( ),
> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
> + *
> + * @Note
> + * To avoid unexpected behavior, the application should stop calling
> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
>   *
>   * @param port_id
>   *   The port identifier of the Ethernet device.
> @@ -1837,12 +1847,10 @@ void rte_eth_dev_close(uint16_t port_id);
>   *   - (0) if successful.
>   *   - (-EINVAL) if port identifier is invalid.
>   *   - (-ENOTSUP) if hardware doesn't support this function.
> - *   - (-EPERM) if not ran from the primary process.
> - *   - (-EIO) if re-initialisation failed or device is removed.
>   *   - (-ENOMEM) if the reset failed due to OOM.
> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
> + *   - (<0) other errors from low level driver.
>   */
> -int rte_eth_dev_reset(uint16_t port_id);
> +int rte_eth_dev_reset_async(uint16_t port_id);
>  
>  /**
>   * Enable receipt in promiscuous mode for an Ethernet device.
> @@ -2574,6 +2582,8 @@ enum rte_eth_event_type {
>  				/**< queue state event (enabled/disabled) */
>  	RTE_ETH_EVENT_INTR_RESET,
>  			/**< reset interrupt event, sent to VF on PF reset */
> +	RTE_ETH_EVENT_RESET_COMPLETE,
> +			/**< inform application that reset is completed */
>  	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF */
>  	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
>  	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
>
  
Qi Zhang Oct. 4, 2018, 3:58 p.m. UTC | #2
> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Thursday, October 4, 2018 7:30 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty,
> Declan <declan.doherty@intel.com>
> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [RFC v3] ethdev: claim device reset as async
> 
> On 9/20/2018 5:56 AM, Qi Zhang wrote:
> > Device reset should be implemented in an async way since it is
> > possible to be invoked in interrupt thread and sometimes to reset a
> > device need to wait for some dependency, for example, a VF expects for
> > PF ready or a NIC function as part of a SOC wait for the whole system
> > reset complete, and all these time-consuming tasks will block the
> > interrupt thread.
> > The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
> > rework the implementation. It will spawn a new thread which will call
> > ops->dev_reset, and when finished it will raise the event
> > RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
> > this event before it continues to configure and restart the device.
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> 
> <...>
> 
> > @@ -1385,10 +1413,26 @@ rte_eth_dev_reset(uint16_t port_id)
> >
> >  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
> >
> > +	/* already on resetting */
> > +	if (dev->state == RTE_ETH_DEV_RESETTING)
> > +		return 0;
> > +
> > +	args = calloc(1, sizeof(struct dev_reset_args));
> > +	if (!args)
> > +		return -ENOMEM;
> > +
> >  	rte_eth_dev_stop(port_id);
> > -	ret = dev->dev_ops->dev_reset(dev);
> >
> > -	return eth_err(port_id, ret);
> > +	/* store previous device state temporary */
> > +	args->pre_state = dev->state;
> > +
> > +	dev->state = RTE_ETH_DEV_RESETTING;
> 
> Do we need to update the state, I think this will break rte_eth_dev_count() and
> friends, like during device reset app will think it has one less device in system.

I'd like to have this new state which represent the situation of the device more accurate.
In this patch RTE_ETH_DEV_RESETTING is just to be used to prevent double reset, but in future it can also be used to prevent invalid operation during device reset.

Of cause we need to make sure it does not break exist behavior and seems add RTE_ETH_DEV_RESETTING check in rte_eth_find_next_owned_by and rte_eth_find_next is able to fix the issue you observed.

I can add this in v4 if you agree the idea.

> 
> <...>
> 
> > @@ -1814,21 +1816,29 @@ void rte_eth_dev_close(uint16_t port_id);
> >   * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it to
> start
> >   * a port reset in other circumstances.
> >   *
> > - * When this function is called, it first stops the port and then
> > calls the
> > - * PMD specific dev_uninit( ) and dev_init( ) to return the port to
> > initial
> > - * state, in which no Tx and Rx queues are setup, as if the port has
> > been
> > - * reset and not started. The port keeps the port id it had before
> > the
> > - * function call.
> > - *
> > - * After calling rte_eth_dev_reset( ), the application should use
> > - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
> > - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
> > - * to reconfigure the device as appropriate.
> > - *
> > - * Note: To avoid unexpected behavior, the application should stop
> > calling
> > - * Tx and Rx functions before calling rte_eth_dev_reset( ). For
> > thread
> > - * safety, all these controlling functions should be called from the
> > same
> > - * thread.
> > + * @note
> > + * Device reset may have the dependency, for example, a VF reset
> > + expects
> > + * PF ready, or a NIC function as a part of a SOC need to wait for
> > + other
> > + * parts of the system be ready, these are time-consuming tasks and
> > + will
> > + * block current thread.
> > + *
> > + * As the name, rte_eth_dev_reset_async is an async API, it will
> > + spwan a
> > + * new thread to call ops->dev_reset, once it is finished, it will
> > + raise
> > + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.
> > + That makes
> > + * things easy for an application that what to reset the device from
> > + the
> > + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET
> > + handler is
> > + * invoked in interrupt thread.
> 
> thread calls dev_ops->dev_reset(dev) and wait for it, so it means
> dev_ops->dev_reset is synchronous, perhaps it would be good to highlight this
> in "dev_reset" comment to help PMD developers.

OK

> 
> of dev_ops->dev_reset() is synchronous, means existing rte_eth_dev_reset() is
> synchronous, so what do you thinks keep rte_eth_dev_reset() as it is and add
> new
> rte_eth_dev_reset_async() API? Than we will have both sync and async
> solution.

Typically device reset happens when application receive RTE_ETH_EVENT_INTR_RESET and this is in interrupt thread. 
Invoke an async API in interrupt thread is the right way, is it better if we highlight this is the only way?
I may not prefer to expose the sync API right now, it's better to figure out some typical usage before we expose this, but so far I don't have.

Regards
Qi


> 
> > + *
> > + * Application should not assume device reset is finished after
> > + * rte_eth_dev_reset_async return, it should always wait for a
> > + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
> > + * If reset success, application should call rte_eth_dev_configure(
> > + ),
> > + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
> > + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
> > + *
> > + * @Note
> > + * To avoid unexpected behavior, the application should stop calling
> > + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
> >   *
> >   * @param port_id
> >   *   The port identifier of the Ethernet device.
> > @@ -1837,12 +1847,10 @@ void rte_eth_dev_close(uint16_t port_id);
> >   *   - (0) if successful.
> >   *   - (-EINVAL) if port identifier is invalid.
> >   *   - (-ENOTSUP) if hardware doesn't support this function.
> > - *   - (-EPERM) if not ran from the primary process.
> > - *   - (-EIO) if re-initialisation failed or device is removed.
> >   *   - (-ENOMEM) if the reset failed due to OOM.
> > - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
> > + *   - (<0) other errors from low level driver.
> >   */
> > -int rte_eth_dev_reset(uint16_t port_id);
> > +int rte_eth_dev_reset_async(uint16_t port_id);
> >
> >  /**
> >   * Enable receipt in promiscuous mode for an Ethernet device.
> > @@ -2574,6 +2582,8 @@ enum rte_eth_event_type {
> >  				/**< queue state event (enabled/disabled) */
> >  	RTE_ETH_EVENT_INTR_RESET,
> >  			/**< reset interrupt event, sent to VF on PF reset */
> > +	RTE_ETH_EVENT_RESET_COMPLETE,
> > +			/**< inform application that reset is completed */
> >  	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF
> */
> >  	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
> >  	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
> >
  
Ferruh Yigit March 19, 2019, 1:13 p.m. UTC | #3
On 10/4/2018 4:58 PM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Thursday, October 4, 2018 7:30 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty,
>> Declan <declan.doherty@intel.com>
>> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>> <narender.vangati@intel.com>
>> Subject: Re: [RFC v3] ethdev: claim device reset as async
>>
>> On 9/20/2018 5:56 AM, Qi Zhang wrote:
>>> Device reset should be implemented in an async way since it is
>>> possible to be invoked in interrupt thread and sometimes to reset a
>>> device need to wait for some dependency, for example, a VF expects for
>>> PF ready or a NIC function as part of a SOC wait for the whole system
>>> reset complete, and all these time-consuming tasks will block the
>>> interrupt thread.
>>> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
>>> rework the implementation. It will spawn a new thread which will call
>>> ops->dev_reset, and when finished it will raise the event
>>> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
>>> this event before it continues to configure and restart the device.
>>>
>>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>>
>> <...>
>>
>>> @@ -1385,10 +1413,26 @@ rte_eth_dev_reset(uint16_t port_id)
>>>
>>>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
>>>
>>> +	/* already on resetting */
>>> +	if (dev->state == RTE_ETH_DEV_RESETTING)
>>> +		return 0;
>>> +
>>> +	args = calloc(1, sizeof(struct dev_reset_args));
>>> +	if (!args)
>>> +		return -ENOMEM;
>>> +
>>>  	rte_eth_dev_stop(port_id);
>>> -	ret = dev->dev_ops->dev_reset(dev);
>>>
>>> -	return eth_err(port_id, ret);
>>> +	/* store previous device state temporary */
>>> +	args->pre_state = dev->state;
>>> +
>>> +	dev->state = RTE_ETH_DEV_RESETTING;
>>
>> Do we need to update the state, I think this will break rte_eth_dev_count() and
>> friends, like during device reset app will think it has one less device in system.
> 
> I'd like to have this new state which represent the situation of the device more accurate.
> In this patch RTE_ETH_DEV_RESETTING is just to be used to prevent double reset, but in future it can also be used to prevent invalid operation during device reset.
> 
> Of cause we need to make sure it does not break exist behavior and seems add RTE_ETH_DEV_RESETTING check in rte_eth_find_next_owned_by and rte_eth_find_next is able to fix the issue you observed.
> 
> I can add this in v4 if you agree the idea.
> 
>>
>> <...>
>>
>>> @@ -1814,21 +1816,29 @@ void rte_eth_dev_close(uint16_t port_id);
>>>   * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it to
>> start
>>>   * a port reset in other circumstances.
>>>   *
>>> - * When this function is called, it first stops the port and then
>>> calls the
>>> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to
>>> initial
>>> - * state, in which no Tx and Rx queues are setup, as if the port has
>>> been
>>> - * reset and not started. The port keeps the port id it had before
>>> the
>>> - * function call.
>>> - *
>>> - * After calling rte_eth_dev_reset( ), the application should use
>>> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
>>> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
>>> - * to reconfigure the device as appropriate.
>>> - *
>>> - * Note: To avoid unexpected behavior, the application should stop
>>> calling
>>> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For
>>> thread
>>> - * safety, all these controlling functions should be called from the
>>> same
>>> - * thread.
>>> + * @note
>>> + * Device reset may have the dependency, for example, a VF reset
>>> + expects
>>> + * PF ready, or a NIC function as a part of a SOC need to wait for
>>> + other
>>> + * parts of the system be ready, these are time-consuming tasks and
>>> + will
>>> + * block current thread.
>>> + *
>>> + * As the name, rte_eth_dev_reset_async is an async API, it will
>>> + spwan a
>>> + * new thread to call ops->dev_reset, once it is finished, it will
>>> + raise
>>> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.
>>> + That makes
>>> + * things easy for an application that what to reset the device from
>>> + the
>>> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET
>>> + handler is
>>> + * invoked in interrupt thread.
>>
>> thread calls dev_ops->dev_reset(dev) and wait for it, so it means
>> dev_ops->dev_reset is synchronous, perhaps it would be good to highlight this
>> in "dev_reset" comment to help PMD developers.
> 
> OK
> 
>>
>> of dev_ops->dev_reset() is synchronous, means existing rte_eth_dev_reset() is
>> synchronous, so what do you thinks keep rte_eth_dev_reset() as it is and add
>> new
>> rte_eth_dev_reset_async() API? Than we will have both sync and async
>> solution.
> 
> Typically device reset happens when application receive RTE_ETH_EVENT_INTR_RESET and this is in interrupt thread. 
> Invoke an async API in interrupt thread is the right way, is it better if we highlight this is the only way?
> I may not prefer to expose the sync API right now, it's better to figure out some typical usage before we expose this, but so far I don't have.

Hi Qi,

Is the 'rte_eth_dev_reset_async()' still required? Is there any update planned
to this RFC?

> 
> Regards
> Qi
> 
> 
>>
>>> + *
>>> + * Application should not assume device reset is finished after
>>> + * rte_eth_dev_reset_async return, it should always wait for a
>>> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
>>> + * If reset success, application should call rte_eth_dev_configure(
>>> + ),
>>> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
>>> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
>>> + *
>>> + * @Note
>>> + * To avoid unexpected behavior, the application should stop calling
>>> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
>>>   *
>>>   * @param port_id
>>>   *   The port identifier of the Ethernet device.
>>> @@ -1837,12 +1847,10 @@ void rte_eth_dev_close(uint16_t port_id);
>>>   *   - (0) if successful.
>>>   *   - (-EINVAL) if port identifier is invalid.
>>>   *   - (-ENOTSUP) if hardware doesn't support this function.
>>> - *   - (-EPERM) if not ran from the primary process.
>>> - *   - (-EIO) if re-initialisation failed or device is removed.
>>>   *   - (-ENOMEM) if the reset failed due to OOM.
>>> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
>>> + *   - (<0) other errors from low level driver.
>>>   */
>>> -int rte_eth_dev_reset(uint16_t port_id);
>>> +int rte_eth_dev_reset_async(uint16_t port_id);
>>>
>>>  /**
>>>   * Enable receipt in promiscuous mode for an Ethernet device.
>>> @@ -2574,6 +2582,8 @@ enum rte_eth_event_type {
>>>  				/**< queue state event (enabled/disabled) */
>>>  	RTE_ETH_EVENT_INTR_RESET,
>>>  			/**< reset interrupt event, sent to VF on PF reset */
>>> +	RTE_ETH_EVENT_RESET_COMPLETE,
>>> +			/**< inform application that reset is completed */
>>>  	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF
>> */
>>>  	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
>>>  	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
>>>
>
  
Qi Zhang March 19, 2019, 1:40 p.m. UTC | #4
> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Tuesday, March 19, 2019 9:14 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty, Declan
> <declan.doherty@intel.com>
> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
> <benjamin.h.shelton@intel.com>; Vangati, Narender
> <narender.vangati@intel.com>
> Subject: Re: [RFC v3] ethdev: claim device reset as async
> 
> On 10/4/2018 4:58 PM, Zhang, Qi Z wrote:
> >
> >
> >> -----Original Message-----
> >> From: Yigit, Ferruh
> >> Sent: Thursday, October 4, 2018 7:30 PM
> >> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty,
> >> Declan <declan.doherty@intel.com>
> >> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
> >> <benjamin.h.shelton@intel.com>; Vangati, Narender
> >> <narender.vangati@intel.com>
> >> Subject: Re: [RFC v3] ethdev: claim device reset as async
> >>
> >> On 9/20/2018 5:56 AM, Qi Zhang wrote:
> >>> Device reset should be implemented in an async way since it is
> >>> possible to be invoked in interrupt thread and sometimes to reset a
> >>> device need to wait for some dependency, for example, a VF expects
> >>> for PF ready or a NIC function as part of a SOC wait for the whole
> >>> system reset complete, and all these time-consuming tasks will block
> >>> the interrupt thread.
> >>> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
> >>> rework the implementation. It will spawn a new thread which will
> >>> call
> >>> ops->dev_reset, and when finished it will raise the event
> >>> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
> >>> this event before it continues to configure and restart the device.
> >>>
> >>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> >>
> >> <...>
> >>
> >>> @@ -1385,10 +1413,26 @@ rte_eth_dev_reset(uint16_t port_id)
> >>>
> >>>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset,
> -ENOTSUP);
> >>>
> >>> +	/* already on resetting */
> >>> +	if (dev->state == RTE_ETH_DEV_RESETTING)
> >>> +		return 0;
> >>> +
> >>> +	args = calloc(1, sizeof(struct dev_reset_args));
> >>> +	if (!args)
> >>> +		return -ENOMEM;
> >>> +
> >>>  	rte_eth_dev_stop(port_id);
> >>> -	ret = dev->dev_ops->dev_reset(dev);
> >>>
> >>> -	return eth_err(port_id, ret);
> >>> +	/* store previous device state temporary */
> >>> +	args->pre_state = dev->state;
> >>> +
> >>> +	dev->state = RTE_ETH_DEV_RESETTING;
> >>
> >> Do we need to update the state, I think this will break
> >> rte_eth_dev_count() and friends, like during device reset app will think it has
> one less device in system.
> >
> > I'd like to have this new state which represent the situation of the device more
> accurate.
> > In this patch RTE_ETH_DEV_RESETTING is just to be used to prevent double
> reset, but in future it can also be used to prevent invalid operation during device
> reset.
> >
> > Of cause we need to make sure it does not break exist behavior and seems add
> RTE_ETH_DEV_RESETTING check in rte_eth_find_next_owned_by and
> rte_eth_find_next is able to fix the issue you observed.
> >
> > I can add this in v4 if you agree the idea.
> >
> >>
> >> <...>
> >>
> >>> @@ -1814,21 +1816,29 @@ void rte_eth_dev_close(uint16_t port_id);
> >>>   * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it
> >>> to
> >> start
> >>>   * a port reset in other circumstances.
> >>>   *
> >>> - * When this function is called, it first stops the port and then
> >>> calls the
> >>> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to
> >>> initial
> >>> - * state, in which no Tx and Rx queues are setup, as if the port
> >>> has been
> >>> - * reset and not started. The port keeps the port id it had before
> >>> the
> >>> - * function call.
> >>> - *
> >>> - * After calling rte_eth_dev_reset( ), the application should use
> >>> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
> >>> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
> >>> - * to reconfigure the device as appropriate.
> >>> - *
> >>> - * Note: To avoid unexpected behavior, the application should stop
> >>> calling
> >>> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For
> >>> thread
> >>> - * safety, all these controlling functions should be called from
> >>> the same
> >>> - * thread.
> >>> + * @note
> >>> + * Device reset may have the dependency, for example, a VF reset
> >>> + expects
> >>> + * PF ready, or a NIC function as a part of a SOC need to wait for
> >>> + other
> >>> + * parts of the system be ready, these are time-consuming tasks and
> >>> + will
> >>> + * block current thread.
> >>> + *
> >>> + * As the name, rte_eth_dev_reset_async is an async API, it will
> >>> + spwan a
> >>> + * new thread to call ops->dev_reset, once it is finished, it will
> >>> + raise
> >>> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.
> >>> + That makes
> >>> + * things easy for an application that what to reset the device
> >>> + from the
> >>> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET
> >>> + handler is
> >>> + * invoked in interrupt thread.
> >>
> >> thread calls dev_ops->dev_reset(dev) and wait for it, so it means
> >> dev_ops->dev_reset is synchronous, perhaps it would be good to
> >> highlight this in "dev_reset" comment to help PMD developers.
> >
> > OK
> >
> >>
> >> of dev_ops->dev_reset() is synchronous, means existing
> >> rte_eth_dev_reset() is synchronous, so what do you thinks keep
> >> rte_eth_dev_reset() as it is and add new
> >> rte_eth_dev_reset_async() API? Than we will have both sync and async
> >> solution.
> >
> > Typically device reset happens when application receive
> RTE_ETH_EVENT_INTR_RESET and this is in interrupt thread.
> > Invoke an async API in interrupt thread is the right way, is it better if we
> highlight this is the only way?
> > I may not prefer to expose the sync API right now, it's better to figure out some
> typical usage before we expose this, but so far I don't have.
> 
> Hi Qi,
> 
> Is the 'rte_eth_dev_reset_async()' still required? Is there any update planned to
> this RFC?

Yes, I think the requirement is still there. Just don't have bandwidth work on this recently.
May I send out v1 for 19.05 in this week? since deprecation notes already be send out in 19.02 cycle


> 
> >
> > Regards
> > Qi
> >
> >
> >>
> >>> + *
> >>> + * Application should not assume device reset is finished after
> >>> + * rte_eth_dev_reset_async return, it should always wait for a
> >>> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
> >>> + * If reset success, application should call rte_eth_dev_configure(
> >>> + ),
> >>> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
> >>> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
> >>> + *
> >>> + * @Note
> >>> + * To avoid unexpected behavior, the application should stop
> >>> + calling
> >>> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
> >>>   *
> >>>   * @param port_id
> >>>   *   The port identifier of the Ethernet device.
> >>> @@ -1837,12 +1847,10 @@ void rte_eth_dev_close(uint16_t port_id);
> >>>   *   - (0) if successful.
> >>>   *   - (-EINVAL) if port identifier is invalid.
> >>>   *   - (-ENOTSUP) if hardware doesn't support this function.
> >>> - *   - (-EPERM) if not ran from the primary process.
> >>> - *   - (-EIO) if re-initialisation failed or device is removed.
> >>>   *   - (-ENOMEM) if the reset failed due to OOM.
> >>> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
> >>> + *   - (<0) other errors from low level driver.
> >>>   */
> >>> -int rte_eth_dev_reset(uint16_t port_id);
> >>> +int rte_eth_dev_reset_async(uint16_t port_id);
> >>>
> >>>  /**
> >>>   * Enable receipt in promiscuous mode for an Ethernet device.
> >>> @@ -2574,6 +2582,8 @@ enum rte_eth_event_type {
> >>>  				/**< queue state event (enabled/disabled) */
> >>>  	RTE_ETH_EVENT_INTR_RESET,
> >>>  			/**< reset interrupt event, sent to VF on PF reset */
> >>> +	RTE_ETH_EVENT_RESET_COMPLETE,
> >>> +			/**< inform application that reset is completed */
> >>>  	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by
> PF
> >> */
> >>>  	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
> >>>  	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
> >>>
> >
  
Ferruh Yigit March 19, 2019, 4:41 p.m. UTC | #5
On 3/19/2019 1:40 PM, Zhang, Qi Z wrote:
> 
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Tuesday, March 19, 2019 9:14 PM
>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty, Declan
>> <declan.doherty@intel.com>
>> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>> <narender.vangati@intel.com>
>> Subject: Re: [RFC v3] ethdev: claim device reset as async
>>
>> On 10/4/2018 4:58 PM, Zhang, Qi Z wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Yigit, Ferruh
>>>> Sent: Thursday, October 4, 2018 7:30 PM
>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; thomas@monjalon.net; Doherty,
>>>> Declan <declan.doherty@intel.com>
>>>> Cc: ktraynor@redhat.com; dev@dpdk.org; Shelton, Benjamin H
>>>> <benjamin.h.shelton@intel.com>; Vangati, Narender
>>>> <narender.vangati@intel.com>
>>>> Subject: Re: [RFC v3] ethdev: claim device reset as async
>>>>
>>>> On 9/20/2018 5:56 AM, Qi Zhang wrote:
>>>>> Device reset should be implemented in an async way since it is
>>>>> possible to be invoked in interrupt thread and sometimes to reset a
>>>>> device need to wait for some dependency, for example, a VF expects
>>>>> for PF ready or a NIC function as part of a SOC wait for the whole
>>>>> system reset complete, and all these time-consuming tasks will block
>>>>> the interrupt thread.
>>>>> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
>>>>> rework the implementation. It will spawn a new thread which will
>>>>> call
>>>>> ops->dev_reset, and when finished it will raise the event
>>>>> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
>>>>> this event before it continues to configure and restart the device.
>>>>>
>>>>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>>>>
>>>> <...>
>>>>
>>>>> @@ -1385,10 +1413,26 @@ rte_eth_dev_reset(uint16_t port_id)
>>>>>
>>>>>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset,
>> -ENOTSUP);
>>>>>
>>>>> +	/* already on resetting */
>>>>> +	if (dev->state == RTE_ETH_DEV_RESETTING)
>>>>> +		return 0;
>>>>> +
>>>>> +	args = calloc(1, sizeof(struct dev_reset_args));
>>>>> +	if (!args)
>>>>> +		return -ENOMEM;
>>>>> +
>>>>>  	rte_eth_dev_stop(port_id);
>>>>> -	ret = dev->dev_ops->dev_reset(dev);
>>>>>
>>>>> -	return eth_err(port_id, ret);
>>>>> +	/* store previous device state temporary */
>>>>> +	args->pre_state = dev->state;
>>>>> +
>>>>> +	dev->state = RTE_ETH_DEV_RESETTING;
>>>>
>>>> Do we need to update the state, I think this will break
>>>> rte_eth_dev_count() and friends, like during device reset app will think it has
>> one less device in system.
>>>
>>> I'd like to have this new state which represent the situation of the device more
>> accurate.
>>> In this patch RTE_ETH_DEV_RESETTING is just to be used to prevent double
>> reset, but in future it can also be used to prevent invalid operation during device
>> reset.
>>>
>>> Of cause we need to make sure it does not break exist behavior and seems add
>> RTE_ETH_DEV_RESETTING check in rte_eth_find_next_owned_by and
>> rte_eth_find_next is able to fix the issue you observed.
>>>
>>> I can add this in v4 if you agree the idea.
>>>
>>>>
>>>> <...>
>>>>
>>>>> @@ -1814,21 +1816,29 @@ void rte_eth_dev_close(uint16_t port_id);
>>>>>   * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it
>>>>> to
>>>> start
>>>>>   * a port reset in other circumstances.
>>>>>   *
>>>>> - * When this function is called, it first stops the port and then
>>>>> calls the
>>>>> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to
>>>>> initial
>>>>> - * state, in which no Tx and Rx queues are setup, as if the port
>>>>> has been
>>>>> - * reset and not started. The port keeps the port id it had before
>>>>> the
>>>>> - * function call.
>>>>> - *
>>>>> - * After calling rte_eth_dev_reset( ), the application should use
>>>>> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
>>>>> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
>>>>> - * to reconfigure the device as appropriate.
>>>>> - *
>>>>> - * Note: To avoid unexpected behavior, the application should stop
>>>>> calling
>>>>> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For
>>>>> thread
>>>>> - * safety, all these controlling functions should be called from
>>>>> the same
>>>>> - * thread.
>>>>> + * @note
>>>>> + * Device reset may have the dependency, for example, a VF reset
>>>>> + expects
>>>>> + * PF ready, or a NIC function as a part of a SOC need to wait for
>>>>> + other
>>>>> + * parts of the system be ready, these are time-consuming tasks and
>>>>> + will
>>>>> + * block current thread.
>>>>> + *
>>>>> + * As the name, rte_eth_dev_reset_async is an async API, it will
>>>>> + spwan a
>>>>> + * new thread to call ops->dev_reset, once it is finished, it will
>>>>> + raise
>>>>> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.
>>>>> + That makes
>>>>> + * things easy for an application that what to reset the device
>>>>> + from the
>>>>> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET
>>>>> + handler is
>>>>> + * invoked in interrupt thread.
>>>>
>>>> thread calls dev_ops->dev_reset(dev) and wait for it, so it means
>>>> dev_ops->dev_reset is synchronous, perhaps it would be good to
>>>> highlight this in "dev_reset" comment to help PMD developers.
>>>
>>> OK
>>>
>>>>
>>>> of dev_ops->dev_reset() is synchronous, means existing
>>>> rte_eth_dev_reset() is synchronous, so what do you thinks keep
>>>> rte_eth_dev_reset() as it is and add new
>>>> rte_eth_dev_reset_async() API? Than we will have both sync and async
>>>> solution.
>>>
>>> Typically device reset happens when application receive
>> RTE_ETH_EVENT_INTR_RESET and this is in interrupt thread.
>>> Invoke an async API in interrupt thread is the right way, is it better if we
>> highlight this is the only way?
>>> I may not prefer to expose the sync API right now, it's better to figure out some
>> typical usage before we expose this, but so far I don't have.
>>
>> Hi Qi,
>>
>> Is the 'rte_eth_dev_reset_async()' still required? Is there any update planned to
>> this RFC?
> 
> Yes, I think the requirement is still there. Just don't have bandwidth work on this recently.
> May I send out v1 for 19.05 in this week? since deprecation notes already be send out in 19.02 cycle

I think technically yes, since proposal is already out for a long time.
But it will give less time to review it when you send this week.

> 
> 
>>
>>>
>>> Regards
>>> Qi
>>>
>>>
>>>>
>>>>> + *
>>>>> + * Application should not assume device reset is finished after
>>>>> + * rte_eth_dev_reset_async return, it should always wait for a
>>>>> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
>>>>> + * If reset success, application should call rte_eth_dev_configure(
>>>>> + ),
>>>>> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
>>>>> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
>>>>> + *
>>>>> + * @Note
>>>>> + * To avoid unexpected behavior, the application should stop
>>>>> + calling
>>>>> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
>>>>>   *
>>>>>   * @param port_id
>>>>>   *   The port identifier of the Ethernet device.
>>>>> @@ -1837,12 +1847,10 @@ void rte_eth_dev_close(uint16_t port_id);
>>>>>   *   - (0) if successful.
>>>>>   *   - (-EINVAL) if port identifier is invalid.
>>>>>   *   - (-ENOTSUP) if hardware doesn't support this function.
>>>>> - *   - (-EPERM) if not ran from the primary process.
>>>>> - *   - (-EIO) if re-initialisation failed or device is removed.
>>>>>   *   - (-ENOMEM) if the reset failed due to OOM.
>>>>> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
>>>>> + *   - (<0) other errors from low level driver.
>>>>>   */
>>>>> -int rte_eth_dev_reset(uint16_t port_id);
>>>>> +int rte_eth_dev_reset_async(uint16_t port_id);
>>>>>
>>>>>  /**
>>>>>   * Enable receipt in promiscuous mode for an Ethernet device.
>>>>> @@ -2574,6 +2582,8 @@ enum rte_eth_event_type {
>>>>>  				/**< queue state event (enabled/disabled) */
>>>>>  	RTE_ETH_EVENT_INTR_RESET,
>>>>>  			/**< reset interrupt event, sent to VF on PF reset */
>>>>> +	RTE_ETH_EVENT_RESET_COMPLETE,
>>>>> +			/**< inform application that reset is completed */
>>>>>  	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by
>> PF
>>>> */
>>>>>  	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
>>>>>  	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */
>>>>>
>>>
>
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index ee48db2a3..ce3d9931b 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -1918,6 +1918,54 @@  close_port(portid_t pid)
 	printf("Done\n");
 }
 
+static int reset_ret;
+static int reset_done;
+
+static int
+on_reset_complete(__rte_unused uint16_t port_id,
+		__rte_unused enum rte_eth_event_type event,
+		__rte_unused void *cb_arg,
+		void *ret_param)
+{
+	RTE_ASSERT(event == RTE_ETH_EVENT_RESET_COMPLETE);
+
+	reset_ret = *(int *)ret_param;
+	reset_done = 1;
+	return 0;
+}
+
+static int
+do_dev_reset_sync(portid_t pid)
+{
+	int ret;
+
+	ret = rte_eth_dev_callback_register(pid,
+			RTE_ETH_EVENT_RESET_COMPLETE,
+			on_reset_complete, NULL);
+
+	if (ret) {
+		printf("Fail to reigster callback function\n");
+		return ret;
+	}
+
+	reset_done = 0;
+	ret = rte_eth_dev_reset_async(pid);
+	if (ret)
+		goto finish;
+
+	while (!reset_done)
+		rte_delay_ms(10);
+
+	ret = reset_ret;
+
+finish:
+	rte_eth_dev_callback_unregister(pid,
+			RTE_ETH_EVENT_RESET_COMPLETE,
+			on_reset_complete, NULL);
+
+	return ret;
+}
+
 void
 reset_port(portid_t pid)
 {
@@ -1946,7 +1994,7 @@  reset_port(portid_t pid)
 			continue;
 		}
 
-		diag = rte_eth_dev_reset(pi);
+		diag = do_dev_reset_sync(pi);
 		if (diag == 0) {
 			port = &ports[pi];
 			port->need_reconfig = 1;
diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
index 4c3202505..69279577d 100644
--- a/lib/librte_ethdev/rte_ethdev.c
+++ b/lib/librte_ethdev/rte_ethdev.c
@@ -1374,10 +1374,38 @@  rte_eth_dev_close(uint16_t port_id)
 	dev->data->tx_queues = NULL;
 }
 
+struct dev_reset_args {
+	struct rte_eth_dev *dev;
+	enum rte_eth_dev_state pre_state;
+};
+
+static void *
+do_dev_reset(void *args)
+{
+	struct dev_reset_args *reset_args = args;
+	struct rte_eth_dev *dev = reset_args->dev;
+	int ret;
+
+	ret = dev->dev_ops->dev_reset(dev);
+
+	/*restore previous device state */
+	dev->state = reset_args->pre_state;
+
+	/* notify users */
+	_rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_RESET_COMPLETE,
+					&ret);
+
+	free(args);
+	return NULL;
+}
+
 int
-rte_eth_dev_reset(uint16_t port_id)
+rte_eth_dev_reset_async(uint16_t port_id)
 {
 	struct rte_eth_dev *dev;
+	struct dev_reset_args *args;
+	char pthread_name[20];
+	pthread_t tid;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -1385,10 +1413,26 @@  rte_eth_dev_reset(uint16_t port_id)
 
 	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
 
+	/* already on resetting */
+	if (dev->state == RTE_ETH_DEV_RESETTING)
+		return 0;
+
+	args = calloc(1, sizeof(struct dev_reset_args));
+	if (!args)
+		return -ENOMEM;
+
 	rte_eth_dev_stop(port_id);
-	ret = dev->dev_ops->dev_reset(dev);
 
-	return eth_err(port_id, ret);
+	/* store previous device state temporary */
+	args->pre_state = dev->state;
+
+	dev->state = RTE_ETH_DEV_RESETTING;
+	snprintf(pthread_name, sizeof(pthread_name),
+			"do_dev_reset_%d", port_id);
+	ret = rte_ctrl_thread_create(&tid, pthread_name, NULL,
+					do_dev_reset, args);
+
+	return ret;
 }
 
 int __rte_experimental
diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
index 7070e9ab4..95fcb2931 100644
--- a/lib/librte_ethdev/rte_ethdev.h
+++ b/lib/librte_ethdev/rte_ethdev.h
@@ -1239,6 +1239,8 @@  enum rte_eth_dev_state {
 	RTE_ETH_DEV_DEFERRED,
 	/** Device is in removed state when plug-out is detected. */
 	RTE_ETH_DEV_REMOVED,
+	/** Device is on resetting when rte_eth_dev_reset_async is called. */
+	RTE_ETH_DEV_RESETTING,
 };
 
 struct rte_eth_dev_sriov {
@@ -1814,21 +1816,29 @@  void rte_eth_dev_close(uint16_t port_id);
  * RTE_ETH_EVENT_INTR_RESET event is detected, but can also use it to start
  * a port reset in other circumstances.
  *
- * When this function is called, it first stops the port and then calls the
- * PMD specific dev_uninit( ) and dev_init( ) to return the port to initial
- * state, in which no Tx and Rx queues are setup, as if the port has been
- * reset and not started. The port keeps the port id it had before the
- * function call.
- *
- * After calling rte_eth_dev_reset( ), the application should use
- * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
- * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
- * to reconfigure the device as appropriate.
- *
- * Note: To avoid unexpected behavior, the application should stop calling
- * Tx and Rx functions before calling rte_eth_dev_reset( ). For thread
- * safety, all these controlling functions should be called from the same
- * thread.
+ * @note
+ * Device reset may have the dependency, for example, a VF reset expects
+ * PF ready, or a NIC function as a part of a SOC need to wait for other
+ * parts of the system be ready, these are time-consuming tasks and will
+ * block current thread.
+ *
+ * As the name, rte_eth_dev_reset_async is an async API, it will spwan a
+ * new thread to call ops->dev_reset, once it is finished, it will raise
+ * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.  That makes
+ * things easy for an application that what to reset the device from the
+ * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET handler is
+ * invoked in interrupt thread.
+ *
+ * Application should not assume device reset is finished after
+ * rte_eth_dev_reset_async return, it should always wait for a
+ * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
+ * If reset success, application should call rte_eth_dev_configure( ),
+ * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
+ * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
+ *
+ * @Note
+ * To avoid unexpected behavior, the application should stop calling
+ * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
  *
  * @param port_id
  *   The port identifier of the Ethernet device.
@@ -1837,12 +1847,10 @@  void rte_eth_dev_close(uint16_t port_id);
  *   - (0) if successful.
  *   - (-EINVAL) if port identifier is invalid.
  *   - (-ENOTSUP) if hardware doesn't support this function.
- *   - (-EPERM) if not ran from the primary process.
- *   - (-EIO) if re-initialisation failed or device is removed.
  *   - (-ENOMEM) if the reset failed due to OOM.
- *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
+ *   - (<0) other errors from low level driver.
  */
-int rte_eth_dev_reset(uint16_t port_id);
+int rte_eth_dev_reset_async(uint16_t port_id);
 
 /**
  * Enable receipt in promiscuous mode for an Ethernet device.
@@ -2574,6 +2582,8 @@  enum rte_eth_event_type {
 				/**< queue state event (enabled/disabled) */
 	RTE_ETH_EVENT_INTR_RESET,
 			/**< reset interrupt event, sent to VF on PF reset */
+	RTE_ETH_EVENT_RESET_COMPLETE,
+			/**< inform application that reset is completed */
 	RTE_ETH_EVENT_VF_MBOX,  /**< message from the VF received by PF */
 	RTE_ETH_EVENT_MACSEC,   /**< MACsec offload related event */
 	RTE_ETH_EVENT_INTR_RMV, /**< device removal event */