[v1] igb_uio: fix unexpected removal for hot-unplug

Message ID 1539844035-11524-1-git-send-email-jia.guo@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers
Series [v1] igb_uio: fix unexpected removal for hot-unplug |

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Guo, Jia Oct. 18, 2018, 6:27 a.m. UTC
  When a device is hot-unplugged, pci_remove will be invoked unexpectedly
before pci_release, it will caused kernel hung issue which will throw the
error info of "Trying to free already-free IRQ XXX". And on the other hand,
if pci_remove before pci_release, the interrupt will not got chance to be
disabled. So this patch aim to fix this issue by adding pci_release call
in pci_remove, it will gurranty that all pci clean up will be done before
pci removal.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 kernel/linux/igb_uio/igb_uio.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Ferruh Yigit Oct. 18, 2018, 4:06 p.m. UTC | #1
On 10/18/2018 7:27 AM, Jeff Guo wrote:
> When a device is hot-unplugged, pci_remove will be invoked unexpectedly
> before pci_release, it will caused kernel hung issue which will throw the
> error info of "Trying to free already-free IRQ XXX". And on the other hand,
> if pci_remove before pci_release, the interrupt will not got chance to be
> disabled. So this patch aim to fix this issue by adding pci_release call
> in pci_remove, it will gurranty that all pci clean up will be done before
> pci removal.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  kernel/linux/igb_uio/igb_uio.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
> index fede66c..3cf394b 100644
> --- a/kernel/linux/igb_uio/igb_uio.c
> +++ b/kernel/linux/igb_uio/igb_uio.c
> @@ -570,6 +570,8 @@ igbuio_pci_remove(struct pci_dev *dev)
>  {
>  	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
>  
> +	igbuio_pci_release(&udev->info, NULL);
> +

Hi Jeff,

This is simpler approach comparing to previous version.

And do you know if igbuio_pci_release() won't be called after
igbuio_pci_remove() because that will also cause crash, and indeed it will cause
a crash in the uio too.

The flow as far as I can see:
when uioN device opened by application, igbuio_pci_open() is called.

If device removed, I expect driver remove() function called, which has a call
stack like below:

igbuio_pci_remove()
  uio_unregister_device()
    uio_device_release()
      kfree(struct uio_device)

After this point udev is freed and igbuio_pci_release() shouldn't be called, so
I assume uioN device closed before this point but I couldn't find where, if not
closed, closing it later will crash.

I can't test the hotplug case, can you please confirm above patch fixing crashes
you observed for your use cases?

And for regular usecase this change shouldn't cause any problem, so at worst it
may not be fixing all hotplug issues, which looks safe to get.
  
Guo, Jia Oct. 19, 2018, 8:35 a.m. UTC | #2
On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
> On 10/18/2018 7:27 AM, Jeff Guo wrote:
>> When a device is hot-unplugged, pci_remove will be invoked unexpectedly
>> before pci_release, it will caused kernel hung issue which will throw the
>> error info of "Trying to free already-free IRQ XXX". And on the other hand,
>> if pci_remove before pci_release, the interrupt will not got chance to be
>> disabled. So this patch aim to fix this issue by adding pci_release call
>> in pci_remove, it will gurranty that all pci clean up will be done before
>> pci removal.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> ---
>>   kernel/linux/igb_uio/igb_uio.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
>> index fede66c..3cf394b 100644
>> --- a/kernel/linux/igb_uio/igb_uio.c
>> +++ b/kernel/linux/igb_uio/igb_uio.c
>> @@ -570,6 +570,8 @@ igbuio_pci_remove(struct pci_dev *dev)
>>   {
>>   	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
>>   
>> +	igbuio_pci_release(&udev->info, NULL);
>> +
> Hi Jeff,
>
> This is simpler approach comparing to previous version.
>
> And do you know if igbuio_pci_release() won't be called after
> igbuio_pci_remove() because that will also cause crash, and indeed it will cause
> a crash in the uio too.
>
> The flow as far as I can see:
> when uioN device opened by application, igbuio_pci_open() is called.
>
> If device removed, I expect driver remove() function called, which has a call
> stack like below:
>
> igbuio_pci_remove()
>    uio_unregister_device()
>      uio_device_release()
>        kfree(struct uio_device)
>
> After this point udev is freed and igbuio_pci_release() shouldn't be called, so
> I assume uioN device closed before this point but I couldn't find where, if not
> closed, closing it later will crash.


What i saw is that after igb_uio remove , if detach the device the pci 
release will be called, so the

igbuo_pci_release should be called again.


> I can't test the hotplug case, can you please confirm above patch fixing crashes
> you observed for your use cases?


yes, it could be fix the crashed i observed right now.


>
> And for regular usecase this change shouldn't cause any problem, so at worst it
> may not be fixing all hotplug issues, which looks safe to get.


I think it would fix this hung issue that caused of double free irq and 
would not have side effect anyway.
  
Ferruh Yigit Oct. 22, 2018, 11:13 a.m. UTC | #3
On 10/19/2018 9:35 AM, Jeff Guo wrote:
> 
> On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
>> On 10/18/2018 7:27 AM, Jeff Guo wrote:
>>> When a device is hot-unplugged, pci_remove will be invoked unexpectedly
>>> before pci_release, it will caused kernel hung issue which will throw the
>>> error info of "Trying to free already-free IRQ XXX". And on the other hand,
>>> if pci_remove before pci_release, the interrupt will not got chance to be
>>> disabled. So this patch aim to fix this issue by adding pci_release call
>>> in pci_remove, it will gurranty that all pci clean up will be done before
>>> pci removal.
>>>
>>> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
  
Thomas Monjalon Oct. 24, 2018, 11:14 p.m. UTC | #4
22/10/2018 13:13, Ferruh Yigit:
> On 10/19/2018 9:35 AM, Jeff Guo wrote:
> > 
> > On 10/19/2018 12:06 AM, Ferruh Yigit wrote:
> >> On 10/18/2018 7:27 AM, Jeff Guo wrote:
> >>> When a device is hot-unplugged, pci_remove will be invoked unexpectedly
> >>> before pci_release, it will caused kernel hung issue which will throw the
> >>> error info of "Trying to free already-free IRQ XXX". And on the other hand,
> >>> if pci_remove before pci_release, the interrupt will not got chance to be
> >>> disabled. So this patch aim to fix this issue by adding pci_release call
> >>> in pci_remove, it will gurranty that all pci clean up will be done before
> >>> pci removal.
> >>>
> >>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied, thanks
  

Patch

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index fede66c..3cf394b 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -570,6 +570,8 @@  igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
 
+	igbuio_pci_release(&udev->info, NULL);
+
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);
 	uio_unregister_device(&udev->info);
 	igbuio_pci_release_iomem(&udev->info);