[3/3] app/testpmd: fix callback issue for hot-unplug

Message ID 1541484436-91320-4-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series fix vfio hot-unplug issue |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Guo, Jia Nov. 6, 2018, 6:07 a.m. UTC
  Before detach device when device be hot-unplugged, the failure process
in user space and kernel space both need to be finished, such as eal
interrupt callback need to be inactive before the callback be unregistered
when device is being cleaned. This patch add rte alarm for device
detaching, with that it could finish interrupt callback soon and
give time to let the failure process done before detaching.

Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 app/test-pmd/testpmd.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)
  

Comments

Matan Azrad Nov. 6, 2018, 6:36 a.m. UTC | #1
Hi Jeff

 From: Jeff Guo <jia.guo@intel.com>
> Before detach device when device be hot-unplugged, the failure process in
> user space and kernel space both need to be finished, such as eal interrupt
> callback need to be inactive before the callback be unregistered when device
> is being cleaned. This patch add rte alarm for device detaching, with that it
> could finish interrupt callback soon and give time to let the failure process
> done before detaching.
> 
> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  app/test-pmd/testpmd.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 9c0edca..9c673cf 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> *device_name, enum rte_dev_event_type type,
>  				device_name);
>  			return;
>  		}
> -		rmv_event_callback((void *)(intptr_t)port_id);
> +		/*
> +		 * Before detach device, the hot-unplug failure process in
> +		 * user space and kernel space both need to be finished,
> +		 * such as eal interrupt callback need to be inactive before
> +		 * the callback be unregistered when device is being cleaned.
> +		 * So finished interrupt callback soon here and give time to
> +		 * let the work done before detaching.
> +		 */
> +		if (rte_eal_alarm_set(100000,
> +				rmv_event_callback, (void
> *)(intptr_t)port_id))
> +			RTE_LOG(ERR, EAL,
> +				"Could not set up deferred device


It looks me strange to use callback and alarm to remove a device:
Why not to use callback and that is it?

I think that it's better to let the EAL to detach the device after all the callbacks were done and not to do it by the user callback.
So the application\callback owners just need to clean its resources with understanding that after the callback the device(and the callback itself) will be detached by the EAL. 


> removal\n");
>  		break;
>  	case RTE_DEV_EVENT_ADD:
>  		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> --
> 2.7.4
  
Guo, Jia Nov. 7, 2018, 7:30 a.m. UTC | #2
matan

On 11/6/2018 2:36 PM, Matan Azrad wrote:
> Hi Jeff
>
>   From: Jeff Guo <jia.guo@intel.com>
>> Before detach device when device be hot-unplugged, the failure process in
>> user space and kernel space both need to be finished, such as eal interrupt
>> callback need to be inactive before the callback be unregistered when device
>> is being cleaned. This patch add rte alarm for device detaching, with that it
>> could finish interrupt callback soon and give time to let the failure process
>> done before detaching.
>>
>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>>   app/test-pmd/testpmd.c | 13 ++++++++++++-
>>   1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 9c0edca..9c673cf 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
>> *device_name, enum rte_dev_event_type type,
>>   				device_name);
>>   			return;
>>   		}
>> -		rmv_event_callback((void *)(intptr_t)port_id);
>> +		/*
>> +		 * Before detach device, the hot-unplug failure process in
>> +		 * user space and kernel space both need to be finished,
>> +		 * such as eal interrupt callback need to be inactive before
>> +		 * the callback be unregistered when device is being cleaned.
>> +		 * So finished interrupt callback soon here and give time to
>> +		 * let the work done before detaching.
>> +		 */
>> +		if (rte_eal_alarm_set(100000,
>> +				rmv_event_callback, (void
>> *)(intptr_t)port_id))
>> +			RTE_LOG(ERR, EAL,
>> +				"Could not set up deferred device
>
> It looks me strange to use callback and alarm to remove a device:
> Why not to use callback and that is it?
>
> I think that it's better to let the EAL to detach the device after all the callbacks were done and not to do it by the user callback.
> So the application\callback owners just need to clean its resources with understanding that after the callback the device(and the callback itself) will be detached by the EAL.


Firstly, at the currently framework and solution, such as callback for 
RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device removal, 
we tend to give the control of detaching device to the application, and 
the whole process is located on the user's callback. Notify app to 
detach device by callback but make it deferred, i think it is fine.

Secondly, the vfio is different with igb_uio for hot-unplug, it 
register/unregister hotplug interrupt callback for each device, so need 
to make  the callback done before unregister the callback.

So I think it should be considerate as an workaround here, before we 
find a better way.


>
>> removal\n");
>>   		break;
>>   	case RTE_DEV_EVENT_ADD:
>>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>> --
>> 2.7.4
  
Ananyev, Konstantin Nov. 7, 2018, 11:05 a.m. UTC | #3
> -----Original Message-----
> From: Guo, Jia
> Sent: Wednesday, November 7, 2018 7:30 AM
> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Iremonger, Bernard <bernard.iremonger@intel.com>; Wu,
> Jingjing <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>
> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for hot-unplug
> 
> matan
> 
> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> > Hi Jeff
> >
> >   From: Jeff Guo <jia.guo@intel.com>
> >> Before detach device when device be hot-unplugged, the failure process in
> >> user space and kernel space both need to be finished, such as eal interrupt
> >> callback need to be inactive before the callback be unregistered when device
> >> is being cleaned. This patch add rte alarm for device detaching, with that it
> >> could finish interrupt callback soon and give time to let the failure process
> >> done before detaching.
> >>
> >> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >> ---
> >>   app/test-pmd/testpmd.c | 13 ++++++++++++-
> >>   1 file changed, 12 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >> 9c0edca..9c673cf 100644
> >> --- a/app/test-pmd/testpmd.c
> >> +++ b/app/test-pmd/testpmd.c
> >> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> >> *device_name, enum rte_dev_event_type type,
> >>   				device_name);
> >>   			return;
> >>   		}
> >> -		rmv_event_callback((void *)(intptr_t)port_id);
> >> +		/*
> >> +		 * Before detach device, the hot-unplug failure process in
> >> +		 * user space and kernel space both need to be finished,
> >> +		 * such as eal interrupt callback need to be inactive before
> >> +		 * the callback be unregistered when device is being cleaned.
> >> +		 * So finished interrupt callback soon here and give time to
> >> +		 * let the work done before detaching.
> >> +		 */
> >> +		if (rte_eal_alarm_set(100000,
> >> +				rmv_event_callback, (void
> >> *)(intptr_t)port_id))
> >> +			RTE_LOG(ERR, EAL,
> >> +				"Could not set up deferred device
> >
> > It looks me strange to use callback and alarm to remove a device:
> > Why not to use callback and that is it?
> >
> > I think that it's better to let the EAL to detach the device after all the callbacks were done and not to do it by the user callback.
> > So the application\callback owners just need to clean its resources with understanding that after the callback the device(and the callback
> itself) will be detached by the EAL.
> 
> 
> Firstly, at the currently framework and solution, such as callback for
> RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device removal,
> we tend to give the control of detaching device to the application, and
> the whole process is located on the user's callback. Notify app to
> detach device by callback but make it deferred, i think it is fine.

It is also unclear to me my we need an alarm here.
First (probably wrong) impression we just try to hide some synchronization
Problem by introducing delay.
Konstantin

> 
> Secondly, the vfio is different with igb_uio for hot-unplug, it
> register/unregister hotplug interrupt callback for each device, so need
> to make  the callback done before unregister the callback.
> 
> So I think it should be considerate as an workaround here, before we
> find a better way.
> 
> 
> >
> >> removal\n");
> >>   		break;
> >>   	case RTE_DEV_EVENT_ADD:
> >>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> >> --
> >> 2.7.4
  
Matan Azrad Nov. 8, 2018, 7:28 a.m. UTC | #4
From: Ananyev, Konstantin
> > -----Original Message-----
> > From: Guo, Jia
> > Sent: Wednesday, November 7, 2018 7:30 AM
> > To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Burakov, Anatoly
> > <anatoly.burakov@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>;
> > Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
> > <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> > Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang, Helin
> > <helin.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>
> > Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
> > hot-unplug
> >
> > matan
> >
> > On 11/6/2018 2:36 PM, Matan Azrad wrote:
> > > Hi Jeff
> > >
> > >   From: Jeff Guo <jia.guo@intel.com>
> > >> Before detach device when device be hot-unplugged, the failure
> > >> process in user space and kernel space both need to be finished,
> > >> such as eal interrupt callback need to be inactive before the
> > >> callback be unregistered when device is being cleaned. This patch
> > >> add rte alarm for device detaching, with that it could finish
> > >> interrupt callback soon and give time to let the failure process done
> before detaching.
> > >>
> > >> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
> > >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > >> ---
> > >>   app/test-pmd/testpmd.c | 13 ++++++++++++-
> > >>   1 file changed, 12 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > >> 9c0edca..9c673cf 100644
> > >> --- a/app/test-pmd/testpmd.c
> > >> +++ b/app/test-pmd/testpmd.c
> > >> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> > >> *device_name, enum rte_dev_event_type type,
> > >>   				device_name);
> > >>   			return;
> > >>   		}
> > >> -		rmv_event_callback((void *)(intptr_t)port_id);
> > >> +		/*
> > >> +		 * Before detach device, the hot-unplug failure process in
> > >> +		 * user space and kernel space both need to be finished,
> > >> +		 * such as eal interrupt callback need to be inactive before
> > >> +		 * the callback be unregistered when device is being cleaned.
> > >> +		 * So finished interrupt callback soon here and give time to
> > >> +		 * let the work done before detaching.
> > >> +		 */
> > >> +		if (rte_eal_alarm_set(100000,
> > >> +				rmv_event_callback, (void
> > >> *)(intptr_t)port_id))
> > >> +			RTE_LOG(ERR, EAL,
> > >> +				"Could not set up deferred device
> > >
> > > It looks me strange to use callback and alarm to remove a device:
> > > Why not to use callback and that is it?
> > >
> > > I think that it's better to let the EAL to detach the device after all the
> callbacks were done and not to do it by the user callback.
> > > So the application\callback owners just need to clean its resources
> > > with understanding that after the callback the device(and the
> > > callback
> > itself) will be detached by the EAL.
> >
> >
> > Firstly, at the currently framework and solution, such as callback for
> > RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
> removal,
> > we tend to give the control of detaching device to the application,
> > and the whole process is located on the user's callback. Notify app to
> > detach device by callback but make it deferred, i think it is fine.

But the device must be detached in remove event, why not to do it in EAL?

> 
> It is also unclear to me my we need an alarm here.
> First (probably wrong) impression we just try to hide some synchronization
> Problem by introducing delay.

Looks like, the issue is that the callback function memory will be removed from the function itself (by the detach call), no?

> Konstantin
> 
> >
> > Secondly, the vfio is different with igb_uio for hot-unplug, it
> > register/unregister hotplug interrupt callback for each device, so
> > need to make  the callback done before unregister the callback.
> >
> > So I think it should be considerate as an workaround here, before we
> > find a better way.
> >
> >
> > >
> > >> removal\n");
> > >>   		break;
> > >>   	case RTE_DEV_EVENT_ADD:
> > >>   		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
> > >> --
> > >> 2.7.4
  
Guo, Jia Nov. 8, 2018, 8:49 a.m. UTC | #5
On 11/8/2018 3:28 PM, Matan Azrad wrote:
>
> From: Ananyev, Konstantin
>>> -----Original Message-----
>>> From: Guo, Jia
>>> Sent: Wednesday, November 7, 2018 7:30 AM
>>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
>>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
>>> <anatoly.burakov@intel.com>; Thomas Monjalon
>> <thomas@monjalon.net>;
>>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
>>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
>>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang, Helin
>>> <helin.zhang@intel.com>; He, Shaopeng <shaopeng.he@intel.com>
>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
>>> hot-unplug
>>>
>>> matan
>>>
>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
>>>> Hi Jeff
>>>>
>>>>    From: Jeff Guo <jia.guo@intel.com>
>>>>> Before detach device when device be hot-unplugged, the failure
>>>>> process in user space and kernel space both need to be finished,
>>>>> such as eal interrupt callback need to be inactive before the
>>>>> callback be unregistered when device is being cleaned. This patch
>>>>> add rte alarm for device detaching, with that it could finish
>>>>> interrupt callback soon and give time to let the failure process done
>> before detaching.
>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>>> ---
>>>>>    app/test-pmd/testpmd.c | 13 ++++++++++++-
>>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>>> 9c0edca..9c673cf 100644
>>>>> --- a/app/test-pmd/testpmd.c
>>>>> +++ b/app/test-pmd/testpmd.c
>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
>>>>> *device_name, enum rte_dev_event_type type,
>>>>>    				device_name);
>>>>>    			return;
>>>>>    		}
>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
>>>>> +		/*
>>>>> +		 * Before detach device, the hot-unplug failure process in
>>>>> +		 * user space and kernel space both need to be finished,
>>>>> +		 * such as eal interrupt callback need to be inactive before
>>>>> +		 * the callback be unregistered when device is being cleaned.
>>>>> +		 * So finished interrupt callback soon here and give time to
>>>>> +		 * let the work done before detaching.
>>>>> +		 */
>>>>> +		if (rte_eal_alarm_set(100000,
>>>>> +				rmv_event_callback, (void
>>>>> *)(intptr_t)port_id))
>>>>> +			RTE_LOG(ERR, EAL,
>>>>> +				"Could not set up deferred device
>>>> It looks me strange to use callback and alarm to remove a device:
>>>> Why not to use callback and that is it?
>>>>
>>>> I think that it's better to let the EAL to detach the device after all the
>> callbacks were done and not to do it by the user callback.
>>>> So the application\callback owners just need to clean its resources
>>>> with understanding that after the callback the device(and the
>>>> callback
>>> itself) will be detached by the EAL.
>>>
>>>
>>> Firstly, at the currently framework and solution, such as callback for
>>> RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
>> removal,
>>> we tend to give the control of detaching device to the application,
>>> and the whole process is located on the user's callback. Notify app to
>>> detach device by callback but make it deferred, i think it is fine.
> But the device must be detached in remove event, why not to do it in EAL?


I think it because of before detached the device, application should be 
stop the forwarding at first, then stop the device, then close

the device, finally call eal unplug API to detach device. If eal 
directly detach device at the first step, there will be mountain user 
space error need to handle, so that is one reason why need to provider 
the remove notification to app, and let app to process it.


>> It is also unclear to me my we need an alarm here.
>> First (probably wrong) impression we just try to hide some synchronization
>> Problem by introducing delay.
> Looks like, the issue is that the callback function memory will be removed from the function itself (by the detach call), no?


Answer here for both Konstantin and Matan.

Yes, i think matan is right, the interrupt callback will be destroy in 
the app callback itself, the sequence is that as below

hot-unplug interrupt -> interrupt callback -> app callback(return to 
finish interrupt callback, delay detaching) -> detach device(unregister 
interrupt callback)


>> Konstantin
>>
>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
>>> register/unregister hotplug interrupt callback for each device, so
>>> need to make  the callback done before unregister the callback.
>>>
>>> So I think it should be considerate as an workaround here, before we
>>> find a better way.
>>>
>>>
>>>>> removal\n");
>>>>>    		break;
>>>>>    	case RTE_DEV_EVENT_ADD:
>>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",
>>>>> --
>>>>> 2.7.4
  
Matan Azrad Nov. 8, 2018, 9:35 a.m. UTC | #6
From: Jeff Guo 
> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> >
> > From: Ananyev, Konstantin
> >>> -----Original Message-----
> >>> From: Guo, Jia
> >>> Sent: Wednesday, November 7, 2018 7:30 AM
> >>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
> >>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
> >>> <anatoly.burakov@intel.com>; Thomas Monjalon
> >> <thomas@monjalon.net>;
> >>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
> >>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> >>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang,
> >>> Helin <helin.zhang@intel.com>; He, Shaopeng
> <shaopeng.he@intel.com>
> >>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
> >>> hot-unplug
> >>>
> >>> matan
> >>>
> >>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> >>>> Hi Jeff
> >>>>
> >>>>    From: Jeff Guo <jia.guo@intel.com>
> >>>>> Before detach device when device be hot-unplugged, the failure
> >>>>> process in user space and kernel space both need to be finished,
> >>>>> such as eal interrupt callback need to be inactive before the
> >>>>> callback be unregistered when device is being cleaned. This patch
> >>>>> add rte alarm for device detaching, with that it could finish
> >>>>> interrupt callback soon and give time to let the failure process
> >>>>> done
> >> before detaching.
> >>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
> >>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >>>>> ---
> >>>>>    app/test-pmd/testpmd.c | 13 ++++++++++++-
> >>>>>    1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> >>>>> 9c0edca..9c673cf 100644
> >>>>> --- a/app/test-pmd/testpmd.c
> >>>>> +++ b/app/test-pmd/testpmd.c
> >>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> >>>>> *device_name, enum rte_dev_event_type type,
> >>>>>    				device_name);
> >>>>>    			return;
> >>>>>    		}
> >>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> >>>>> +		/*
> >>>>> +		 * Before detach device, the hot-unplug failure
> process in
> >>>>> +		 * user space and kernel space both need to be
> finished,
> >>>>> +		 * such as eal interrupt callback need to be inactive
> before
> >>>>> +		 * the callback be unregistered when device is being
> cleaned.
> >>>>> +		 * So finished interrupt callback soon here and give
> time to
> >>>>> +		 * let the work done before detaching.
> >>>>> +		 */
> >>>>> +		if (rte_eal_alarm_set(100000,
> >>>>> +				rmv_event_callback, (void
> >>>>> *)(intptr_t)port_id))
> >>>>> +			RTE_LOG(ERR, EAL,
> >>>>> +				"Could not set up deferred device
> >>>> It looks me strange to use callback and alarm to remove a device:
> >>>> Why not to use callback and that is it?
> >>>>
> >>>> I think that it's better to let the EAL to detach the device after
> >>>> all the
> >> callbacks were done and not to do it by the user callback.
> >>>> So the application\callback owners just need to clean its resources
> >>>> with understanding that after the callback the device(and the
> >>>> callback
> >>> itself) will be detached by the EAL.
> >>>
> >>>
> >>> Firstly, at the currently framework and solution, such as callback
> >>> for RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
> >> removal,
> >>> we tend to give the control of detaching device to the application,
> >>> and the whole process is located on the user's callback. Notify app
> >>> to detach device by callback but make it deferred, i think it is fine.
> > But the device must be detached in remove event, why not to do it in EAL?
> 
> 
> I think it because of before detached the device, application should be stop
> the forwarding at first, then stop the device, then close
> 
> the device, finally call eal unplug API to detach device. If eal directly detach
> device at the first step, there will be mountain user space error need to
> handle, so that is one reason why need to provider the remove notification
> to app, and let app to process it.


This is why the EAL need to detach the device only after all the user callbacks were done.

> 
> 
> >> It is also unclear to me my we need an alarm here.
> >> First (probably wrong) impression we just try to hide some
> >> synchronization Problem by introducing delay.
> > Looks like, the issue is that the callback function memory will be removed
> from the function itself (by the detach call), no?
> 
> 
> Answer here for both Konstantin and Matan.
> 
> Yes, i think matan is right, the interrupt callback will be destroy in the app
> callback itself, the sequence is that as below
> 
> hot-unplug interrupt -> interrupt callback -> app callback(return to finish
> interrupt callback, delay detaching) -> detach device(unregister interrupt
> callback)
> 
> 
> >> Konstantin
> >>
> >>> Secondly, the vfio is different with igb_uio for hot-unplug, it
> >>> register/unregister hotplug interrupt callback for each device, so
> >>> need to make  the callback done before unregister the callback.
> >>>
> >>> So I think it should be considerate as an workaround here, before we
> >>> find a better way.
> >>>
> >>>
> >>>>> removal\n");
> >>>>>    		break;
> >>>>>    	case RTE_DEV_EVENT_ADD:
> >>>>>    		RTE_LOG(ERR, EAL, "The device: %s has been
> added!\n",
> >>>>> --
> >>>>> 2.7.4
  
Guo, Jia Nov. 9, 2018, 3:55 a.m. UTC | #7
On 11/8/2018 5:35 PM, Matan Azrad wrote:
>
> From: Jeff Guo
>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
>>> From: Ananyev, Konstantin
>>>>> -----Original Message-----
>>>>> From: Guo, Jia
>>>>> Sent: Wednesday, November 7, 2018 7:30 AM
>>>>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
>>>>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
>>>>> <anatoly.burakov@intel.com>; Thomas Monjalon
>>>> <thomas@monjalon.net>;
>>>>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
>>>>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
>>>>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang,
>>>>> Helin <helin.zhang@intel.com>; He, Shaopeng
>> <shaopeng.he@intel.com>
>>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
>>>>> hot-unplug
>>>>>
>>>>> matan
>>>>>
>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
>>>>>> Hi Jeff
>>>>>>
>>>>>>     From: Jeff Guo <jia.guo@intel.com>
>>>>>>> Before detach device when device be hot-unplugged, the failure
>>>>>>> process in user space and kernel space both need to be finished,
>>>>>>> such as eal interrupt callback need to be inactive before the
>>>>>>> callback be unregistered when device is being cleaned. This patch
>>>>>>> add rte alarm for device detaching, with that it could finish
>>>>>>> interrupt callback soon and give time to let the failure process
>>>>>>> done
>>>> before detaching.
>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>>>>> ---
>>>>>>>     app/test-pmd/testpmd.c | 13 ++++++++++++-
>>>>>>>     1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>>>>>> 9c0edca..9c673cf 100644
>>>>>>> --- a/app/test-pmd/testpmd.c
>>>>>>> +++ b/app/test-pmd/testpmd.c
>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
>>>>>>> *device_name, enum rte_dev_event_type type,
>>>>>>>     				device_name);
>>>>>>>     			return;
>>>>>>>     		}
>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
>>>>>>> +		/*
>>>>>>> +		 * Before detach device, the hot-unplug failure
>> process in
>>>>>>> +		 * user space and kernel space both need to be
>> finished,
>>>>>>> +		 * such as eal interrupt callback need to be inactive
>> before
>>>>>>> +		 * the callback be unregistered when device is being
>> cleaned.
>>>>>>> +		 * So finished interrupt callback soon here and give
>> time to
>>>>>>> +		 * let the work done before detaching.
>>>>>>> +		 */
>>>>>>> +		if (rte_eal_alarm_set(100000,
>>>>>>> +				rmv_event_callback, (void
>>>>>>> *)(intptr_t)port_id))
>>>>>>> +			RTE_LOG(ERR, EAL,
>>>>>>> +				"Could not set up deferred device
>>>>>> It looks me strange to use callback and alarm to remove a device:
>>>>>> Why not to use callback and that is it?
>>>>>>
>>>>>> I think that it's better to let the EAL to detach the device after
>>>>>> all the
>>>> callbacks were done and not to do it by the user callback.
>>>>>> So the application\callback owners just need to clean its resources
>>>>>> with understanding that after the callback the device(and the
>>>>>> callback
>>>>> itself) will be detached by the EAL.
>>>>>
>>>>>
>>>>> Firstly, at the currently framework and solution, such as callback
>>>>> for RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
>>>> removal,
>>>>> we tend to give the control of detaching device to the application,
>>>>> and the whole process is located on the user's callback. Notify app
>>>>> to detach device by callback but make it deferred, i think it is fine.
>>> But the device must be detached in remove event, why not to do it in EAL?
>>
>> I think it because of before detached the device, application should be stop
>> the forwarding at first, then stop the device, then close
>>
>> the device, finally call eal unplug API to detach device. If eal directly detach
>> device at the first step, there will be mountain user space error need to
>> handle, so that is one reason why need to provider the remove notification
>> to app, and let app to process it.
>
> This is why the EAL need to detach the device only after all the user callbacks were done.


If i correctly got your meaning, you suppose to let eal to mandatory 
detach device but not app, app just need to stop/close port, right?

If so, it will need to modify rmv_event_callback by not invoke the 
detaching func and add some detaching logic to hotplug framework in eal.

It is hardly say better or worse but this new propose need to discuss 
more in public. And it might be better to use another specific patch to 
handler it. What do you or other guys think so?


>>
>>>> It is also unclear to me my we need an alarm here.
>>>> First (probably wrong) impression we just try to hide some
>>>> synchronization Problem by introducing delay.
>>> Looks like, the issue is that the callback function memory will be removed
>> from the function itself (by the detach call), no?
>>
>>
>> Answer here for both Konstantin and Matan.
>>
>> Yes, i think matan is right, the interrupt callback will be destroy in the app
>> callback itself, the sequence is that as below
>>
>> hot-unplug interrupt -> interrupt callback -> app callback(return to finish
>> interrupt callback, delay detaching) -> detach device(unregister interrupt
>> callback)
>>
>>
>>>> Konstantin
>>>>
>>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
>>>>> register/unregister hotplug interrupt callback for each device, so
>>>>> need to make  the callback done before unregister the callback.
>>>>>
>>>>> So I think it should be considerate as an workaround here, before we
>>>>> find a better way.
>>>>>
>>>>>
>>>>>>> removal\n");
>>>>>>>     		break;
>>>>>>>     	case RTE_DEV_EVENT_ADD:
>>>>>>>     		RTE_LOG(ERR, EAL, "The device: %s has been
>> added!\n",
>>>>>>> --
>>>>>>> 2.7.4
  
Matan Azrad Nov. 9, 2018, 5:24 a.m. UTC | #8
From: Jeff Guo 
> On 11/8/2018 5:35 PM, Matan Azrad wrote:
> >
> > From: Jeff Guo
> >> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> >>> From: Ananyev, Konstantin
> >>>>> -----Original Message-----
> >>>>> From: Guo, Jia
> >>>>> Sent: Wednesday, November 7, 2018 7:30 AM
> >>>>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
> >>>>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
> >>>>> <anatoly.burakov@intel.com>; Thomas Monjalon
> >>>> <thomas@monjalon.net>;
> >>>>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
> >>>>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> >>>>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang,
> >>>>> Helin <helin.zhang@intel.com>; He, Shaopeng
> >> <shaopeng.he@intel.com>
> >>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
> >>>>> hot-unplug
> >>>>>
> >>>>> matan
> >>>>>
> >>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> >>>>>> Hi Jeff
> >>>>>>
> >>>>>>     From: Jeff Guo <jia.guo@intel.com>
> >>>>>>> Before detach device when device be hot-unplugged, the failure
> >>>>>>> process in user space and kernel space both need to be finished,
> >>>>>>> such as eal interrupt callback need to be inactive before the
> >>>>>>> callback be unregistered when device is being cleaned. This
> >>>>>>> patch add rte alarm for device detaching, with that it could
> >>>>>>> finish interrupt callback soon and give time to let the failure
> >>>>>>> process done
> >>>> before detaching.
> >>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
> >>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >>>>>>> ---
> >>>>>>>     app/test-pmd/testpmd.c | 13 ++++++++++++-
> >>>>>>>     1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> >>>>>>> index 9c0edca..9c673cf 100644
> >>>>>>> --- a/app/test-pmd/testpmd.c
> >>>>>>> +++ b/app/test-pmd/testpmd.c
> >>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> >>>>>>> *device_name, enum rte_dev_event_type type,
> >>>>>>>     				device_name);
> >>>>>>>     			return;
> >>>>>>>     		}
> >>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> >>>>>>> +		/*
> >>>>>>> +		 * Before detach device, the hot-unplug failure
> >> process in
> >>>>>>> +		 * user space and kernel space both need to be
> >> finished,
> >>>>>>> +		 * such as eal interrupt callback need to be inactive
> >> before
> >>>>>>> +		 * the callback be unregistered when device is being
> >> cleaned.
> >>>>>>> +		 * So finished interrupt callback soon here and give
> >> time to
> >>>>>>> +		 * let the work done before detaching.
> >>>>>>> +		 */
> >>>>>>> +		if (rte_eal_alarm_set(100000,
> >>>>>>> +				rmv_event_callback, (void
> >>>>>>> *)(intptr_t)port_id))
> >>>>>>> +			RTE_LOG(ERR, EAL,
> >>>>>>> +				"Could not set up deferred device
> >>>>>> It looks me strange to use callback and alarm to remove a device:
> >>>>>> Why not to use callback and that is it?
> >>>>>>
> >>>>>> I think that it's better to let the EAL to detach the device
> >>>>>> after all the
> >>>> callbacks were done and not to do it by the user callback.
> >>>>>> So the application\callback owners just need to clean its
> >>>>>> resources with understanding that after the callback the
> >>>>>> device(and the callback
> >>>>> itself) will be detached by the EAL.
> >>>>>
> >>>>>
> >>>>> Firstly, at the currently framework and solution, such as callback
> >>>>> for RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
> >>>> removal,
> >>>>> we tend to give the control of detaching device to the
> >>>>> application, and the whole process is located on the user's
> >>>>> callback. Notify app to detach device by callback but make it deferred,
> i think it is fine.
> >>> But the device must be detached in remove event, why not to do it in
> EAL?
> >>
> >> I think it because of before detached the device, application should
> >> be stop the forwarding at first, then stop the device, then close
> >>
> >> the device, finally call eal unplug API to detach device. If eal
> >> directly detach device at the first step, there will be mountain user
> >> space error need to handle, so that is one reason why need to
> >> provider the remove notification to app, and let app to process it.
> >
> > This is why the EAL need to detach the device only after all the user
> callbacks were done.
> 
> 
> If i correctly got your meaning, you suppose to let eal to mandatory detach
> device but not app, app just need to stop/close port, right?

Yes, the app should stop,close,clean its own resources of the removed device,
Then, EAL to detach the device.

> 
> If so, it will need to modify rmv_event_callback by not invoke the detaching
> func and add some detaching logic to hotplug framework in eal.
> 
rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV event), so you need to use another one\ add parameter to it.
And yes, you need to detach the device from EAL, should be simple.

> It is hardly say better or worse but this new propose need to discuss more in
> public. And it might be better to use another specific patch to handler it.
> What do you or other guys think so?

Since you are fixing issue here, it can be done by a fix series.

Other feedbacks are welcome all the time 😊

> 
> 
> >>
> >>>> It is also unclear to me my we need an alarm here.
> >>>> First (probably wrong) impression we just try to hide some
> >>>> synchronization Problem by introducing delay.
> >>> Looks like, the issue is that the callback function memory will be
> >>> removed
> >> from the function itself (by the detach call), no?
> >>
> >>
> >> Answer here for both Konstantin and Matan.
> >>
> >> Yes, i think matan is right, the interrupt callback will be destroy
> >> in the app callback itself, the sequence is that as below
> >>
> >> hot-unplug interrupt -> interrupt callback -> app callback(return to
> >> finish interrupt callback, delay detaching) -> detach
> >> device(unregister interrupt
> >> callback)
> >>
> >>
> >>>> Konstantin
> >>>>
> >>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
> >>>>> register/unregister hotplug interrupt callback for each device, so
> >>>>> need to make  the callback done before unregister the callback.
> >>>>>
> >>>>> So I think it should be considerate as an workaround here, before
> >>>>> we find a better way.
> >>>>>
> >>>>>
> >>>>>>> removal\n");
> >>>>>>>     		break;
> >>>>>>>     	case RTE_DEV_EVENT_ADD:
> >>>>>>>     		RTE_LOG(ERR, EAL, "The device: %s has been
> >> added!\n",
> >>>>>>> --
> >>>>>>> 2.7.4
  
Guo, Jia Nov. 9, 2018, 6:17 a.m. UTC | #9
On 11/9/2018 1:24 PM, Matan Azrad wrote:
>
>   From: Jeff Guo
>> On 11/8/2018 5:35 PM, Matan Azrad wrote:
>>> From: Jeff Guo
>>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
>>>>> From: Ananyev, Konstantin
>>>>>>> -----Original Message-----
>>>>>>> From: Guo, Jia
>>>>>>> Sent: Wednesday, November 7, 2018 7:30 AM
>>>>>>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
>>>>>>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
>>>>>>> <anatoly.burakov@intel.com>; Thomas Monjalon
>>>>>> <thomas@monjalon.net>;
>>>>>>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
>>>>>>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
>>>>>>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang,
>>>>>>> Helin <helin.zhang@intel.com>; He, Shaopeng
>>>> <shaopeng.he@intel.com>
>>>>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
>>>>>>> hot-unplug
>>>>>>>
>>>>>>> matan
>>>>>>>
>>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
>>>>>>>> Hi Jeff
>>>>>>>>
>>>>>>>>      From: Jeff Guo <jia.guo@intel.com>
>>>>>>>>> Before detach device when device be hot-unplugged, the failure
>>>>>>>>> process in user space and kernel space both need to be finished,
>>>>>>>>> such as eal interrupt callback need to be inactive before the
>>>>>>>>> callback be unregistered when device is being cleaned. This
>>>>>>>>> patch add rte alarm for device detaching, with that it could
>>>>>>>>> finish interrupt callback soon and give time to let the failure
>>>>>>>>> process done
>>>>>> before detaching.
>>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure handler")
>>>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>>>>>>> ---
>>>>>>>>>      app/test-pmd/testpmd.c | 13 ++++++++++++-
>>>>>>>>>      1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>>>>>>>>> index 9c0edca..9c673cf 100644
>>>>>>>>> --- a/app/test-pmd/testpmd.c
>>>>>>>>> +++ b/app/test-pmd/testpmd.c
>>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
>>>>>>>>> *device_name, enum rte_dev_event_type type,
>>>>>>>>>      				device_name);
>>>>>>>>>      			return;
>>>>>>>>>      		}
>>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
>>>>>>>>> +		/*
>>>>>>>>> +		 * Before detach device, the hot-unplug failure
>>>> process in
>>>>>>>>> +		 * user space and kernel space both need to be
>>>> finished,
>>>>>>>>> +		 * such as eal interrupt callback need to be inactive
>>>> before
>>>>>>>>> +		 * the callback be unregistered when device is being
>>>> cleaned.
>>>>>>>>> +		 * So finished interrupt callback soon here and give
>>>> time to
>>>>>>>>> +		 * let the work done before detaching.
>>>>>>>>> +		 */
>>>>>>>>> +		if (rte_eal_alarm_set(100000,
>>>>>>>>> +				rmv_event_callback, (void
>>>>>>>>> *)(intptr_t)port_id))
>>>>>>>>> +			RTE_LOG(ERR, EAL,
>>>>>>>>> +				"Could not set up deferred device
>>>>>>>> It looks me strange to use callback and alarm to remove a device:
>>>>>>>> Why not to use callback and that is it?
>>>>>>>>
>>>>>>>> I think that it's better to let the EAL to detach the device
>>>>>>>> after all the
>>>>>> callbacks were done and not to do it by the user callback.
>>>>>>>> So the application\callback owners just need to clean its
>>>>>>>> resources with understanding that after the callback the
>>>>>>>> device(and the callback
>>>>>>> itself) will be detached by the EAL.
>>>>>>>
>>>>>>>
>>>>>>> Firstly, at the currently framework and solution, such as callback
>>>>>>> for RTE_ETH_EVENT_INTR_RMV, still need to use the deferred device
>>>>>> removal,
>>>>>>> we tend to give the control of detaching device to the
>>>>>>> application, and the whole process is located on the user's
>>>>>>> callback. Notify app to detach device by callback but make it deferred,
>> i think it is fine.
>>>>> But the device must be detached in remove event, why not to do it in
>> EAL?
>>>> I think it because of before detached the device, application should
>>>> be stop the forwarding at first, then stop the device, then close
>>>>
>>>> the device, finally call eal unplug API to detach device. If eal
>>>> directly detach device at the first step, there will be mountain user
>>>> space error need to handle, so that is one reason why need to
>>>> provider the remove notification to app, and let app to process it.
>>> This is why the EAL need to detach the device only after all the user
>> callbacks were done.
>>
>>
>> If i correctly got your meaning, you suppose to let eal to mandatory detach
>> device but not app, app just need to stop/close port, right?
> Yes, the app should stop,close,clean its own resources of the removed device,
> Then, EAL to detach the device.
>
>> If so, it will need to modify rmv_event_callback by not invoke the detaching
>> func and add some detaching logic to hotplug framework in eal.
>>
> rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV event), so you need to use another one\ add parameter to it.
> And yes, you need to detach the device from EAL, should be simple.


I think rmv_event_callback is original use for other hotplug event 
(ETHDEV RMV event), but it still use the common hotplug mechanism(app 
callback and app detach),

so i think it will still need to face this callback issue and you could 
check that eth_event_callback also use the rte alarm to detach device.

so my suggestion is that, you maybe propose a good idea but let we keep 
on current mechanism until we come to a final good solution agreement, 
before that, just let

it functional.


>> It is hardly say better or worse but this new propose need to discuss more in
>> public. And it might be better to use another specific patch to handler it.
>> What do you or other guys think so?
> Since you are fixing issue here, it can be done by a fix series.
>
> Other feedbacks are welcome all the time 😊
>
>>
>>>>>> It is also unclear to me my we need an alarm here.
>>>>>> First (probably wrong) impression we just try to hide some
>>>>>> synchronization Problem by introducing delay.
>>>>> Looks like, the issue is that the callback function memory will be
>>>>> removed
>>>> from the function itself (by the detach call), no?
>>>>
>>>>
>>>> Answer here for both Konstantin and Matan.
>>>>
>>>> Yes, i think matan is right, the interrupt callback will be destroy
>>>> in the app callback itself, the sequence is that as below
>>>>
>>>> hot-unplug interrupt -> interrupt callback -> app callback(return to
>>>> finish interrupt callback, delay detaching) -> detach
>>>> device(unregister interrupt
>>>> callback)
>>>>
>>>>
>>>>>> Konstantin
>>>>>>
>>>>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
>>>>>>> register/unregister hotplug interrupt callback for each device, so
>>>>>>> need to make  the callback done before unregister the callback.
>>>>>>>
>>>>>>> So I think it should be considerate as an workaround here, before
>>>>>>> we find a better way.
>>>>>>>
>>>>>>>
>>>>>>>>> removal\n");
>>>>>>>>>      		break;
>>>>>>>>>      	case RTE_DEV_EVENT_ADD:
>>>>>>>>>      		RTE_LOG(ERR, EAL, "The device: %s has been
>>>> added!\n",
>>>>>>>>> --
>>>>>>>>> 2.7.4
  
Matan Azrad Nov. 11, 2018, 7:31 a.m. UTC | #10
From: Jeff Guo 
> On 11/9/2018 1:24 PM, Matan Azrad wrote:
> >
> >   From: Jeff Guo
> >> On 11/8/2018 5:35 PM, Matan Azrad wrote:
> >>> From: Jeff Guo
> >>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> >>>>> From: Ananyev, Konstantin
> >>>>>>> -----Original Message-----
> >>>>>>> From: Guo, Jia
> >>>>>>> Sent: Wednesday, November 7, 2018 7:30 AM
> >>>>>>> To: Matan Azrad <matan@mellanox.com>; Ananyev, Konstantin
> >>>>>>> <konstantin.ananyev@intel.com>; Burakov, Anatoly
> >>>>>>> <anatoly.burakov@intel.com>; Thomas Monjalon
> >>>>>> <thomas@monjalon.net>;
> >>>>>>> Iremonger, Bernard <bernard.iremonger@intel.com>; Wu, Jingjing
> >>>>>>> <jingjing.wu@intel.com>; Lu, Wenzhuo <wenzhuo.lu@intel.com>
> >>>>>>> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Zhang,
> >>>>>>> Helin <helin.zhang@intel.com>; He, Shaopeng
> >>>> <shaopeng.he@intel.com>
> >>>>>>> Subject: Re: [PATCH 3/3] app/testpmd: fix callback issue for
> >>>>>>> hot-unplug
> >>>>>>>
> >>>>>>> matan
> >>>>>>>
> >>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> >>>>>>>> Hi Jeff
> >>>>>>>>
> >>>>>>>>      From: Jeff Guo <jia.guo@intel.com>
> >>>>>>>>> Before detach device when device be hot-unplugged, the failure
> >>>>>>>>> process in user space and kernel space both need to be
> >>>>>>>>> finished, such as eal interrupt callback need to be inactive
> >>>>>>>>> before the callback be unregistered when device is being
> >>>>>>>>> cleaned. This patch add rte alarm for device detaching, with
> >>>>>>>>> that it could finish interrupt callback soon and give time to
> >>>>>>>>> let the failure process done
> >>>>>> before detaching.
> >>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure
> >>>>>>>>> handler")
> >>>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> >>>>>>>>> ---
> >>>>>>>>>      app/test-pmd/testpmd.c | 13 ++++++++++++-
> >>>>>>>>>      1 file changed, 12 insertions(+), 1 deletion(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> >>>>>>>>> index 9c0edca..9c673cf 100644
> >>>>>>>>> --- a/app/test-pmd/testpmd.c
> >>>>>>>>> +++ b/app/test-pmd/testpmd.c
> >>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> >>>>>>>>> *device_name, enum rte_dev_event_type type,
> >>>>>>>>>      				device_name);
> >>>>>>>>>      			return;
> >>>>>>>>>      		}
> >>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> >>>>>>>>> +		/*
> >>>>>>>>> +		 * Before detach device, the hot-unplug failure
> >>>> process in
> >>>>>>>>> +		 * user space and kernel space both need to be
> >>>> finished,
> >>>>>>>>> +		 * such as eal interrupt callback need to be inactive
> >>>> before
> >>>>>>>>> +		 * the callback be unregistered when device is being
> >>>> cleaned.
> >>>>>>>>> +		 * So finished interrupt callback soon here and give
> >>>> time to
> >>>>>>>>> +		 * let the work done before detaching.
> >>>>>>>>> +		 */
> >>>>>>>>> +		if (rte_eal_alarm_set(100000,
> >>>>>>>>> +				rmv_event_callback, (void
> >>>>>>>>> *)(intptr_t)port_id))
> >>>>>>>>> +			RTE_LOG(ERR, EAL,
> >>>>>>>>> +				"Could not set up deferred device
> >>>>>>>> It looks me strange to use callback and alarm to remove a device:
> >>>>>>>> Why not to use callback and that is it?
> >>>>>>>>
> >>>>>>>> I think that it's better to let the EAL to detach the device
> >>>>>>>> after all the
> >>>>>> callbacks were done and not to do it by the user callback.
> >>>>>>>> So the application\callback owners just need to clean its
> >>>>>>>> resources with understanding that after the callback the
> >>>>>>>> device(and the callback
> >>>>>>> itself) will be detached by the EAL.
> >>>>>>>
> >>>>>>>
> >>>>>>> Firstly, at the currently framework and solution, such as
> >>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the
> >>>>>>> deferred device
> >>>>>> removal,
> >>>>>>> we tend to give the control of detaching device to the
> >>>>>>> application, and the whole process is located on the user's
> >>>>>>> callback. Notify app to detach device by callback but make it
> >>>>>>> deferred,
> >> i think it is fine.
> >>>>> But the device must be detached in remove event, why not to do it
> >>>>> in
> >> EAL?
> >>>> I think it because of before detached the device, application
> >>>> should be stop the forwarding at first, then stop the device, then
> >>>> close
> >>>>
> >>>> the device, finally call eal unplug API to detach device. If eal
> >>>> directly detach device at the first step, there will be mountain
> >>>> user space error need to handle, so that is one reason why need to
> >>>> provider the remove notification to app, and let app to process it.
> >>> This is why the EAL need to detach the device only after all the
> >>> user
> >> callbacks were done.
> >>
> >>
> >> If i correctly got your meaning, you suppose to let eal to mandatory
> >> detach device but not app, app just need to stop/close port, right?
> > Yes, the app should stop,close,clean its own resources of the removed
> > device, Then, EAL to detach the device.
> >
> >> If so, it will need to modify rmv_event_callback by not invoke the
> >> detaching func and add some detaching logic to hotplug framework in eal.
> >>
> > rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV
> event), so you need to use another one\ add parameter to it.
> > And yes, you need to detach the device from EAL, should be simple.
> 
> 
> I think rmv_event_callback is original use for other hotplug event (ETHDEV
> RMV event), but it still use the common hotplug mechanism(app callback and
> app detach),
> 
> so i think it will still need to face this callback issue and you could check that
> eth_event_callback also use the rte alarm to detach device.

The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device,
So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case.

But, in the EAL event case this is different as I explained before.

Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable. 

> 
> so my suggestion is that, you maybe propose a good idea but let we keep on
> current mechanism until we come to a final good solution agreement, before
> that, just let
> 
> it functional.
> 

I still suggest to fix the issue by an EAL detaching.

> 
> >> It is hardly say better or worse but this new propose need to discuss
> >> more in public. And it might be better to use another specific patch to
> handler it.
> >> What do you or other guys think so?
> > Since you are fixing issue here, it can be done by a fix series.
> >
> > Other feedbacks are welcome all the time 😊
> >
> >>
> >>>>>> It is also unclear to me my we need an alarm here.
> >>>>>> First (probably wrong) impression we just try to hide some
> >>>>>> synchronization Problem by introducing delay.
> >>>>> Looks like, the issue is that the callback function memory will be
> >>>>> removed
> >>>> from the function itself (by the detach call), no?
> >>>>
> >>>>
> >>>> Answer here for both Konstantin and Matan.
> >>>>
> >>>> Yes, i think matan is right, the interrupt callback will be destroy
> >>>> in the app callback itself, the sequence is that as below
> >>>>
> >>>> hot-unplug interrupt -> interrupt callback -> app callback(return
> >>>> to finish interrupt callback, delay detaching) -> detach
> >>>> device(unregister interrupt
> >>>> callback)
> >>>>
> >>>>
> >>>>>> Konstantin
> >>>>>>
> >>>>>>> Secondly, the vfio is different with igb_uio for hot-unplug, it
> >>>>>>> register/unregister hotplug interrupt callback for each device,
> >>>>>>> so need to make  the callback done before unregister the callback.
> >>>>>>>
> >>>>>>> So I think it should be considerate as an workaround here,
> >>>>>>> before we find a better way.
> >>>>>>>
> >>>>>>>
> >>>>>>>>> removal\n");
> >>>>>>>>>      		break;
> >>>>>>>>>      	case RTE_DEV_EVENT_ADD:
> >>>>>>>>>      		RTE_LOG(ERR, EAL, "The device: %s has been
> >>>> added!\n",
> >>>>>>>>> --
> >>>>>>>>> 2.7.4
  
Thomas Monjalon Nov. 12, 2018, 1:35 a.m. UTC | #11
11/11/2018 08:31, Matan Azrad:
> From: Jeff Guo 
> > On 11/9/2018 1:24 PM, Matan Azrad wrote:
> > >   From: Jeff Guo
> > >> On 11/8/2018 5:35 PM, Matan Azrad wrote:
> > >>> From: Jeff Guo
> > >>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
> > >>>>> From: Ananyev, Konstantin
> > >>>>>> From: Guo, Jia
> > >>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
> > >>>>>>>> Hi Jeff
> > >>>>>>>>
> > >>>>>>>>      From: Jeff Guo <jia.guo@intel.com>
> > >>>>>>>>> Before detach device when device be hot-unplugged, the failure
> > >>>>>>>>> process in user space and kernel space both need to be
> > >>>>>>>>> finished, such as eal interrupt callback need to be inactive
> > >>>>>>>>> before the callback be unregistered when device is being
> > >>>>>>>>> cleaned. This patch add rte alarm for device detaching, with
> > >>>>>>>>> that it could finish interrupt callback soon and give time to
> > >>>>>>>>> let the failure process done
> > >>>>>> before detaching.
> > >>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure
> > >>>>>>>>> handler")
> > >>>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> > >>>>>>>>> ---
> > >>>>>>>>>      app/test-pmd/testpmd.c | 13 ++++++++++++-
> > >>>>>>>>>      1 file changed, 12 insertions(+), 1 deletion(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> > >>>>>>>>> index 9c0edca..9c673cf 100644
> > >>>>>>>>> --- a/app/test-pmd/testpmd.c
> > >>>>>>>>> +++ b/app/test-pmd/testpmd.c
> > >>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
> > >>>>>>>>> *device_name, enum rte_dev_event_type type,
> > >>>>>>>>>      				device_name);
> > >>>>>>>>>      			return;
> > >>>>>>>>>      		}
> > >>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
> > >>>>>>>>> +		/*
> > >>>>>>>>> +		 * Before detach device, the hot-unplug failure
> > >>>> process in
> > >>>>>>>>> +		 * user space and kernel space both need to be
> > >>>> finished,
> > >>>>>>>>> +		 * such as eal interrupt callback need to be inactive
> > >>>> before
> > >>>>>>>>> +		 * the callback be unregistered when device is being
> > >>>> cleaned.
> > >>>>>>>>> +		 * So finished interrupt callback soon here and give
> > >>>> time to
> > >>>>>>>>> +		 * let the work done before detaching.
> > >>>>>>>>> +		 */
> > >>>>>>>>> +		if (rte_eal_alarm_set(100000,
> > >>>>>>>>> +				rmv_event_callback, (void
> > >>>>>>>>> *)(intptr_t)port_id))
> > >>>>>>>>> +			RTE_LOG(ERR, EAL,
> > >>>>>>>>> +				"Could not set up deferred device
> > >>>>>>>> It looks me strange to use callback and alarm to remove a device:
> > >>>>>>>> Why not to use callback and that is it?
> > >>>>>>>>
> > >>>>>>>> I think that it's better to let the EAL to detach the device
> > >>>>>>>> after all the
> > >>>>>> callbacks were done and not to do it by the user callback.
> > >>>>>>>> So the application\callback owners just need to clean its
> > >>>>>>>> resources with understanding that after the callback the
> > >>>>>>>> device(and the callback
> > >>>>>>> itself) will be detached by the EAL.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Firstly, at the currently framework and solution, such as
> > >>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the
> > >>>>>>> deferred device
> > >>>>>> removal,
> > >>>>>>> we tend to give the control of detaching device to the
> > >>>>>>> application, and the whole process is located on the user's
> > >>>>>>> callback. Notify app to detach device by callback but make it
> > >>>>>>> deferred,
> > >> i think it is fine.
> > >>>>> But the device must be detached in remove event, why not to do it
> > >>>>> in
> > >> EAL?
> > >>>> I think it because of before detached the device, application
> > >>>> should be stop the forwarding at first, then stop the device, then
> > >>>> close
> > >>>>
> > >>>> the device, finally call eal unplug API to detach device. If eal
> > >>>> directly detach device at the first step, there will be mountain
> > >>>> user space error need to handle, so that is one reason why need to
> > >>>> provider the remove notification to app, and let app to process it.
> > >>> This is why the EAL need to detach the device only after all the
> > >>> user
> > >> callbacks were done.
> > >>
> > >>
> > >> If i correctly got your meaning, you suppose to let eal to mandatory
> > >> detach device but not app, app just need to stop/close port, right?
> > > Yes, the app should stop,close,clean its own resources of the removed
> > > device, Then, EAL to detach the device.
> > >
> > >> If so, it will need to modify rmv_event_callback by not invoke the
> > >> detaching func and add some detaching logic to hotplug framework in eal.
> > >>
> > > rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV
> > event), so you need to use another one\ add parameter to it.
> > > And yes, you need to detach the device from EAL, should be simple.
> > 
> > 
> > I think rmv_event_callback is original use for other hotplug event (ETHDEV
> > RMV event), but it still use the common hotplug mechanism(app callback and
> > app detach),
> > 
> > so i think it will still need to face this callback issue and you could check that
> > eth_event_callback also use the rte alarm to detach device.
> 
> The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device,
> So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case.

Yes, we must not detach a device (rte_dev_remove) if we are not sure all ports are closed.

> But, in the EAL event case this is different as I explained before.
> 
> Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable.

Yes, we may replace the ethdev RMV event by the EAL event RTE_DEV_EVENT_REMOVE.

> > so my suggestion is that, you maybe propose a good idea but let we keep on
> > current mechanism until we come to a final good solution agreement, before
> > that, just let it functional.
> 
> I still suggest to fix the issue by an EAL detaching.

I don't understand the issue, but yes we can call rte_dev_remove in EAL,
after the callback return.
Ideally, the callback should be able to return the decision of cleaning
the device or not.

I suggest several steps (a roadmap for HW unplug):
	- in 18.11, remove call to detach_port_device() from rmv_event_callback()
	- in 18.11, call rte_dev_remove() in EAL after RTE_DEV_EVENT_REMOVE callback
	- in 18.11, remove call to rmv_event_callback() in eth_dev_event_callback()
	- in 18.11, rename eth_dev_event_callback to dev_event_callback
	- in 19.02, convert all PMDs to free port resources on rte_eth_dev_close()
	- in 19.05, automatically call rte_dev_remove() from PMD when closing last port
	- in 19.08, we may try to unbind the kernel driver in rte_dev_remove()
  
Guo, Jia Nov. 14, 2018, 9:32 a.m. UTC | #12
On 11/12/2018 9:35 AM, Thomas Monjalon wrote:
> 11/11/2018 08:31, Matan Azrad:
>> From: Jeff Guo
>>> On 11/9/2018 1:24 PM, Matan Azrad wrote:
>>>>    From: Jeff Guo
>>>>> On 11/8/2018 5:35 PM, Matan Azrad wrote:
>>>>>> From: Jeff Guo
>>>>>>> On 11/8/2018 3:28 PM, Matan Azrad wrote:
>>>>>>>> From: Ananyev, Konstantin
>>>>>>>>> From: Guo, Jia
>>>>>>>>>> On 11/6/2018 2:36 PM, Matan Azrad wrote:
>>>>>>>>>>> Hi Jeff
>>>>>>>>>>>
>>>>>>>>>>>       From: Jeff Guo <jia.guo@intel.com>
>>>>>>>>>>>> Before detach device when device be hot-unplugged, the failure
>>>>>>>>>>>> process in user space and kernel space both need to be
>>>>>>>>>>>> finished, such as eal interrupt callback need to be inactive
>>>>>>>>>>>> before the callback be unregistered when device is being
>>>>>>>>>>>> cleaned. This patch add rte alarm for device detaching, with
>>>>>>>>>>>> that it could finish interrupt callback soon and give time to
>>>>>>>>>>>> let the failure process done
>>>>>>>>> before detaching.
>>>>>>>>>>>> Fixes: 2049c5113fe8 ("app/testpmd: use hotplug failure
>>>>>>>>>>>> handler")
>>>>>>>>>>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>       app/test-pmd/testpmd.c | 13 ++++++++++++-
>>>>>>>>>>>>       1 file changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>>>>>>>>>>>> index 9c0edca..9c673cf 100644
>>>>>>>>>>>> --- a/app/test-pmd/testpmd.c
>>>>>>>>>>>> +++ b/app/test-pmd/testpmd.c
>>>>>>>>>>>> @@ -2620,7 +2620,18 @@ eth_dev_event_callback(const char
>>>>>>>>>>>> *device_name, enum rte_dev_event_type type,
>>>>>>>>>>>>       				device_name);
>>>>>>>>>>>>       			return;
>>>>>>>>>>>>       		}
>>>>>>>>>>>> -		rmv_event_callback((void *)(intptr_t)port_id);
>>>>>>>>>>>> +		/*
>>>>>>>>>>>> +		 * Before detach device, the hot-unplug failure
>>>>>>> process in
>>>>>>>>>>>> +		 * user space and kernel space both need to be
>>>>>>> finished,
>>>>>>>>>>>> +		 * such as eal interrupt callback need to be inactive
>>>>>>> before
>>>>>>>>>>>> +		 * the callback be unregistered when device is being
>>>>>>> cleaned.
>>>>>>>>>>>> +		 * So finished interrupt callback soon here and give
>>>>>>> time to
>>>>>>>>>>>> +		 * let the work done before detaching.
>>>>>>>>>>>> +		 */
>>>>>>>>>>>> +		if (rte_eal_alarm_set(100000,
>>>>>>>>>>>> +				rmv_event_callback, (void
>>>>>>>>>>>> *)(intptr_t)port_id))
>>>>>>>>>>>> +			RTE_LOG(ERR, EAL,
>>>>>>>>>>>> +				"Could not set up deferred device
>>>>>>>>>>> It looks me strange to use callback and alarm to remove a device:
>>>>>>>>>>> Why not to use callback and that is it?
>>>>>>>>>>>
>>>>>>>>>>> I think that it's better to let the EAL to detach the device
>>>>>>>>>>> after all the
>>>>>>>>> callbacks were done and not to do it by the user callback.
>>>>>>>>>>> So the application\callback owners just need to clean its
>>>>>>>>>>> resources with understanding that after the callback the
>>>>>>>>>>> device(and the callback
>>>>>>>>>> itself) will be detached by the EAL.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Firstly, at the currently framework and solution, such as
>>>>>>>>>> callback for RTE_ETH_EVENT_INTR_RMV, still need to use the
>>>>>>>>>> deferred device
>>>>>>>>> removal,
>>>>>>>>>> we tend to give the control of detaching device to the
>>>>>>>>>> application, and the whole process is located on the user's
>>>>>>>>>> callback. Notify app to detach device by callback but make it
>>>>>>>>>> deferred,
>>>>> i think it is fine.
>>>>>>>> But the device must be detached in remove event, why not to do it
>>>>>>>> in
>>>>> EAL?
>>>>>>> I think it because of before detached the device, application
>>>>>>> should be stop the forwarding at first, then stop the device, then
>>>>>>> close
>>>>>>>
>>>>>>> the device, finally call eal unplug API to detach device. If eal
>>>>>>> directly detach device at the first step, there will be mountain
>>>>>>> user space error need to handle, so that is one reason why need to
>>>>>>> provider the remove notification to app, and let app to process it.
>>>>>> This is why the EAL need to detach the device only after all the
>>>>>> user
>>>>> callbacks were done.
>>>>>
>>>>>
>>>>> If i correctly got your meaning, you suppose to let eal to mandatory
>>>>> detach device but not app, app just need to stop/close port, right?
>>>> Yes, the app should stop,close,clean its own resources of the removed
>>>> device, Then, EAL to detach the device.
>>>>
>>>>> If so, it will need to modify rmv_event_callback by not invoke the
>>>>> detaching func and add some detaching logic to hotplug framework in eal.
>>>>>
>>>> rmv_event_callback is using by other hotplug mechanism (ETHDEV RMV
>>> event), so you need to use another one\ add parameter to it.
>>>> And yes, you need to detach the device from EAL, should be simple.
>>>
>>> I think rmv_event_callback is original use for other hotplug event (ETHDEV
>>> RMV event), but it still use the common hotplug mechanism(app callback and
>>> app detach),
>>>
>>> so i think it will still need to face this callback issue and you could check that
>>> eth_event_callback also use the rte alarm to detach device.
>> The ETHDEV layer cannot know if more than one ethdev port is associated to the rte_device,
>> So, it is not correct to detach a rte_device by the ETHDEV layer. Hence, the application should decide in the ETHDEV event case.
> Yes, we must not detach a device (rte_dev_remove) if we are not sure all ports are closed.
>
>> But, in the EAL event case this is different as I explained before.
>>
>> Moreover, I think that the ethdev RMV event should be deprecated one day, after the EAL hot-plug mechanism will be stable.
> Yes, we may replace the ethdev RMV event by the EAL event RTE_DEV_EVENT_REMOVE.
>
>>> so my suggestion is that, you maybe propose a good idea but let we keep on
>>> current mechanism until we come to a final good solution agreement, before
>>> that, just let it functional.
>> I still suggest to fix the issue by an EAL detaching.
> I don't understand the issue, but yes we can call rte_dev_remove in EAL,
> after the callback return.
> Ideally, the callback should be able to return the decision of cleaning
> the device or not.
>
> I suggest several steps (a roadmap for HW unplug):
>
>

I mainly agree on that we need to modify currently hotplug framework for 
HW hot-unplug. But i think we might be think about that.

1) simple delete detach_port_device from rmv_event_callback will affect 
both currently ethdev event and eal event hot-unplug mechanism.

2) If we remove call to rmv_event_callback in eth_dev_event_callback() , 
we also need to remove call to rmv_event_callback in 
eth_event_callback(),  since it also use deferred device removal to 
waiting the interrupt callback be unregistered in 
mlx5_dev_interrupt_handler_uninstall.

so i think that basically the multiple port free and the hotplug 
framework integrate should be an RFC at the begin of 19.02, but should 
not affect currently mechanism. And i suggest the several steps might be 
better as.

	- in 18.11, rename eth_dev_event_callback to dev_event_callback
	- in 18.11, rename rmv_event_callback() to rmv_port_callback(), keep current function calling when detach one port both for ethdev event and eal event. (we could document it for the limitation.)

	- in 19.02, remove call to detach_port_device() from rmv_event_callback()
	- in 19.02, call rte_dev_remove() in EAL after RTE_DEV_EVENT_REMOVE callback
	- in 19.02, remove call to rmv_event_callback() in eth_dev_event_callback()
	- in 19.02, convert all PMDs to free port resources on rte_eth_dev_close()
	- in 19.05, automatically call rte_dev_remove() from PMD when closing last port
	- in 19.08, we may try to unbind the kernel driver in rte_dev_remove()
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 9c0edca..9c673cf 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2620,7 +2620,18 @@  eth_dev_event_callback(const char *device_name, enum rte_dev_event_type type,
 				device_name);
 			return;
 		}
-		rmv_event_callback((void *)(intptr_t)port_id);
+		/*
+		 * Before detach device, the hot-unplug failure process in
+		 * user space and kernel space both need to be finished,
+		 * such as eal interrupt callback need to be inactive before
+		 * the callback be unregistered when device is being cleaned.
+		 * So finished interrupt callback soon here and give time to
+		 * let the work done before detaching.
+		 */
+		if (rte_eal_alarm_set(100000,
+				rmv_event_callback, (void *)(intptr_t)port_id))
+			RTE_LOG(ERR, EAL,
+				"Could not set up deferred device removal\n");
 		break;
 	case RTE_DEV_EVENT_ADD:
 		RTE_LOG(ERR, EAL, "The device: %s has been added!\n",