[v10,7/8] igb_uio: fix unexpected remove issue for hotplug

Message ID 1534502916-31636-9-git-send-email-jia.guo@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series hotplug failure handle mechanism |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Guo, Jia Aug. 17, 2018, 10:48 a.m. UTC
  When a device is hotplugged out, the PCI resource is released in the
kernel, the UIO file descriptor will disappear and the irq will be
released. After this, a kernel crash will be caused if the igb uio driver
tries to access or release these resources.

And more, uio_remove will be called unexpectedly before uio_release
when device be hotpluggged out, the uio_remove procedure will
free resources that are required by uio_release. This will later affect the
usage of interrupt as there is no way to disable the interrupt which is
defined in uio_release.

To prevent this, the hotplug removal needs to be identified and processed
accordingly in igb uio driver.

This patch proposes the addition of enum rte_udev_state in the
rte_uio_pci_dev struct. This will store the state of the uio device as one
of the following: probed/opened/released/removed.

This patch also checks the kobject's remove_uevent_sent state to detect if
the removal status is hotplug-out. Once a hotplug-out is detected, it will
call uio_release and set the uio status to "removed". After that, uio will
check the status in the uio_release function. If uio has already been
removed, it will only free the dirty uio resource.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Shaopeng He <shaopeng.he@intel.com>
---
v10->v9:
refine commmit log.
---
 kernel/linux/igb_uio/igb_uio.c | 69 +++++++++++++++++++++++++++++++++---------
 1 file changed, 55 insertions(+), 14 deletions(-)
  

Comments

Ferruh Yigit Sept. 27, 2018, 3:07 p.m. UTC | #1
On 8/17/2018 11:48 AM, Jeff Guo wrote:
> When a device is hotplugged out, the PCI resource is released in the
> kernel, the UIO file descriptor will disappear and the irq will be
> released. After this, a kernel crash will be caused if the igb uio driver
> tries to access or release these resources.
> 
> And more, uio_remove will be called unexpectedly before uio_release
> when device be hotpluggged out, the uio_remove procedure will
> free resources that are required by uio_release. This will later affect the
> usage of interrupt as there is no way to disable the interrupt which is
> defined in uio_release.
> 
> To prevent this, the hotplug removal needs to be identified and processed
> accordingly in igb uio driver.
> 
> This patch proposes the addition of enum rte_udev_state in the
> rte_uio_pci_dev struct. This will store the state of the uio device as one
> of the following: probed/opened/released/removed.
> 
> This patch also checks the kobject's remove_uevent_sent state to detect if
> the removal status is hotplug-out. Once a hotplug-out is detected, it will
> call uio_release and set the uio status to "removed". After that, uio will
> check the status in the uio_release function. If uio has already been
> removed, it will only free the dirty uio resource.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> Acked-by: Shaopeng He <shaopeng.he@intel.com>

<...>

> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>  
>  	/* enable interrupts */
>  	err = igbuio_pci_enable_interrupts(udev);
> -	mutex_unlock(&udev->lock);
>  	if (err) {
>  		dev_err(&dev->dev, "Enable interrupt fails\n");
> +		pci_clear_master(dev);

Why pci_clear_master required here.

btw, some part of this patch conflicts with [1], which removes mutes and use
atomic refcnt operations, but introducing state seems needs mutex.

[1]
igb_uio: fix refcount if open returns error
https://patches.dpdk.org/patch/44732/

> +		mutex_unlock(&udev->lock);
>  		return err;
>  	}
> +	udev->state = RTE_UDEV_OPENNED;
> +	mutex_unlock(&udev->lock);
>  	return 0;
>  }
>  
> +/**
> + * This gets called while closing uio device file.
> + */
>  static int
>  igbuio_pci_release(struct uio_info *info, struct inode *inode)
>  {
>  	struct rte_uio_pci_dev *udev = info->priv;
>  	struct pci_dev *dev = udev->pdev;
>  
> +	if (udev->state == RTE_UDEV_REMOVED) {
> +		mutex_destroy(&udev->lock);
> +		igbuio_pci_release_iomem(&udev->info);
> +		pci_disable_device(dev);
> +		pci_set_drvdata(dev, NULL);
> +		kfree(udev);
> +		return 0;

This branch taken when pci_remove called before pci_release.
- At this stage is "dev" valid, since pci_remove() called?
- In this path uio_unregister_device() is missing, who unregisters uio?
- sysfs_remove_group() also missing, it is not clear if it is forgotten or left
out, what do you think move common part of pci_remove into new function and call
both in pci_remove and here?

And as a logic, can we make pci_remove clear everything, instead of doing some
cleanup here. Like:
pci_remove:
- calls pci_release
- instead of return keeps doing pci_remove work
- set state to REMOVED

pci_release:
- if state is REMOVED, return without doing nothing

btw, even after uio_unregister_device() how pci_release called?


It can help to share crash backtrace in commit log, to describe problem in more
detail.
  
Guo, Jia Oct. 18, 2018, 5:51 a.m. UTC | #2
hi, ferruh

On 9/27/2018 11:07 PM, Ferruh Yigit wrote:
> On 8/17/2018 11:48 AM, Jeff Guo wrote:
>> When a device is hotplugged out, the PCI resource is released in the
>> kernel, the UIO file descriptor will disappear and the irq will be
>> released. After this, a kernel crash will be caused if the igb uio driver
>> tries to access or release these resources.
>>
>> And more, uio_remove will be called unexpectedly before uio_release
>> when device be hotpluggged out, the uio_remove procedure will
>> free resources that are required by uio_release. This will later affect the
>> usage of interrupt as there is no way to disable the interrupt which is
>> defined in uio_release.
>>
>> To prevent this, the hotplug removal needs to be identified and processed
>> accordingly in igb uio driver.
>>
>> This patch proposes the addition of enum rte_udev_state in the
>> rte_uio_pci_dev struct. This will store the state of the uio device as one
>> of the following: probed/opened/released/removed.
>>
>> This patch also checks the kobject's remove_uevent_sent state to detect if
>> the removal status is hotplug-out. Once a hotplug-out is detected, it will
>> call uio_release and set the uio status to "removed". After that, uio will
>> check the status in the uio_release function. If uio has already been
>> removed, it will only free the dirty uio resource.
>>
>> Signed-off-by: Jeff Guo <jia.guo@intel.com>
>> Acked-by: Shaopeng He <shaopeng.he@intel.com>
> <...>
>
>> @@ -331,20 +351,35 @@ igbuio_pci_open(struct uio_info *info, struct inode *inode)
>>   
>>   	/* enable interrupts */
>>   	err = igbuio_pci_enable_interrupts(udev);
>> -	mutex_unlock(&udev->lock);
>>   	if (err) {
>>   		dev_err(&dev->dev, "Enable interrupt fails\n");
>> +		pci_clear_master(dev);
> Why pci_clear_master required here.


It is because set master is before interrupt enabling, if enable 
interrupt fails should clear master before return i think.

Anyway it is not belong to this patch perspective, it could be separated 
to another one.


> btw, some part of this patch conflicts with [1], which removes mutes and use
> atomic refcnt operations, but introducing state seems needs mutex.
>
> [1]
> igb_uio: fix refcount if open returns error
> https://patches.dpdk.org/patch/44732/


yes, i see and will rework for that if need.


>> +		mutex_unlock(&udev->lock);
>>   		return err;
>>   	}
>> +	udev->state = RTE_UDEV_OPENNED;
>> +	mutex_unlock(&udev->lock);
>>   	return 0;
>>   }
>>   
>> +/**
>> + * This gets called while closing uio device file.
>> + */
>>   static int
>>   igbuio_pci_release(struct uio_info *info, struct inode *inode)
>>   {
>>   	struct rte_uio_pci_dev *udev = info->priv;
>>   	struct pci_dev *dev = udev->pdev;
>>   
>> +	if (udev->state == RTE_UDEV_REMOVED) {
>> +		mutex_destroy(&udev->lock);
>> +		igbuio_pci_release_iomem(&udev->info);
>> +		pci_disable_device(dev);
>> +		pci_set_drvdata(dev, NULL);
>> +		kfree(udev);
>> +		return 0;
> This branch taken when pci_remove called before pci_release.
> - At this stage is "dev" valid, since pci_remove() called?
> - In this path uio_unregister_device() is missing, who unregisters uio?
> - sysfs_remove_group() also missing, it is not clear if it is forgotten or left
> out, what do you think move common part of pci_remove into new function and call
> both in pci_remove and here?


It is not forgotten but specific left out, since the if uio remove 
before uio release it will cause kernel error, which is double free the

already-free irq issue when uio unregister device.


> And as a logic, can we make pci_remove clear everything, instead of doing some
> cleanup here. Like:
> pci_remove:
> - calls pci_release
> - instead of return keeps doing pci_remove work
> - set state to REMOVED
>
> pci_release:
> - if state is REMOVED, return without doing nothing


I think the logic you said here is make sense, just make release and 
remove more focus their own work.


>
> btw, even after uio_unregister_device() how pci_release called?


  The consequence of igb uio removal is that, igb uio remove be called, 
then igb uio release be called when detaching device, then if quit app 
it will call pci remove.


>
> It can help to share crash backtrace in commit log, to describe problem in more
> detail.


I will do that. And i think the 2 thing need to fix is that, one is the 
double free irq issue, the other one is give the chance to disable 
interrupt when uio remove be called before uio release. I check again 
and find that just add release before remove could both fix these

issues, so please review my coming update patch. Thanks.
  

Patch

diff --git a/kernel/linux/igb_uio/igb_uio.c b/kernel/linux/igb_uio/igb_uio.c
index 3398eac..d126371 100644
--- a/kernel/linux/igb_uio/igb_uio.c
+++ b/kernel/linux/igb_uio/igb_uio.c
@@ -19,6 +19,14 @@ 
 
 #include "compat.h"
 
+/* uio pci device state */
+enum rte_udev_state {
+	RTE_UDEV_PROBED,
+	RTE_UDEV_OPENNED,
+	RTE_UDEV_RELEASED,
+	RTE_UDEV_REMOVED,
+};
+
 /**
  * A structure describing the private information for a uio device.
  */
@@ -28,6 +36,7 @@  struct rte_uio_pci_dev {
 	enum rte_intr_mode mode;
 	struct mutex lock;
 	int refcnt;
+	enum rte_udev_state state;
 };
 
 static int wc_activate;
@@ -309,6 +318,17 @@  igbuio_pci_disable_interrupts(struct rte_uio_pci_dev *udev)
 #endif
 }
 
+/* Unmap previously ioremap'd resources */
+static void
+igbuio_pci_release_iomem(struct uio_info *info)
+{
+	int i;
+
+	for (i = 0; i < MAX_UIO_MAPS; i++) {
+		if (info->mem[i].internal_addr)
+			iounmap(info->mem[i].internal_addr);
+	}
+}
 
 /**
  * This gets called while opening uio device file.
@@ -331,20 +351,35 @@  igbuio_pci_open(struct uio_info *info, struct inode *inode)
 
 	/* enable interrupts */
 	err = igbuio_pci_enable_interrupts(udev);
-	mutex_unlock(&udev->lock);
 	if (err) {
 		dev_err(&dev->dev, "Enable interrupt fails\n");
+		pci_clear_master(dev);
+		mutex_unlock(&udev->lock);
 		return err;
 	}
+	udev->state = RTE_UDEV_OPENNED;
+	mutex_unlock(&udev->lock);
 	return 0;
 }
 
+/**
+ * This gets called while closing uio device file.
+ */
 static int
 igbuio_pci_release(struct uio_info *info, struct inode *inode)
 {
 	struct rte_uio_pci_dev *udev = info->priv;
 	struct pci_dev *dev = udev->pdev;
 
+	if (udev->state == RTE_UDEV_REMOVED) {
+		mutex_destroy(&udev->lock);
+		igbuio_pci_release_iomem(&udev->info);
+		pci_disable_device(dev);
+		pci_set_drvdata(dev, NULL);
+		kfree(udev);
+		return 0;
+	}
+
 	mutex_lock(&udev->lock);
 	if (--udev->refcnt > 0) {
 		mutex_unlock(&udev->lock);
@@ -356,7 +391,7 @@  igbuio_pci_release(struct uio_info *info, struct inode *inode)
 
 	/* stop the device from further DMA */
 	pci_clear_master(dev);
-
+	udev->state = RTE_UDEV_RELEASED;
 	mutex_unlock(&udev->lock);
 	return 0;
 }
@@ -414,18 +449,6 @@  igbuio_pci_setup_ioport(struct pci_dev *dev, struct uio_info *info,
 	return 0;
 }
 
-/* Unmap previously ioremap'd resources */
-static void
-igbuio_pci_release_iomem(struct uio_info *info)
-{
-	int i;
-
-	for (i = 0; i < MAX_UIO_MAPS; i++) {
-		if (info->mem[i].internal_addr)
-			iounmap(info->mem[i].internal_addr);
-	}
-}
-
 static int
 igbuio_setup_bars(struct pci_dev *dev, struct uio_info *info)
 {
@@ -562,6 +585,9 @@  igbuio_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
 			 (unsigned long long)map_dma_addr, map_addr);
 	}
 
+	mutex_lock(&udev->lock);
+	udev->state = RTE_UDEV_PROBED;
+	mutex_unlock(&udev->lock);
 	return 0;
 
 fail_remove_group:
@@ -579,6 +605,21 @@  static void
 igbuio_pci_remove(struct pci_dev *dev)
 {
 	struct rte_uio_pci_dev *udev = pci_get_drvdata(dev);
+	struct pci_dev *pdev = udev->pdev;
+	int ret;
+
+	/* handle unexpected removal */
+	if (udev->state == RTE_UDEV_OPENNED ||
+	    (&pdev->dev.kobj)->state_remove_uevent_sent == 1) {
+		dev_notice(&dev->dev, "Unexpected removal!\n");
+		ret = igbuio_pci_release(&udev->info, NULL);
+		if (ret)
+			return;
+		mutex_lock(&udev->lock);
+		udev->state = RTE_UDEV_REMOVED;
+		mutex_unlock(&udev->lock);
+		return;
+	}
 
 	mutex_destroy(&udev->lock);
 	sysfs_remove_group(&dev->dev.kobj, &dev_attr_grp);