doc: add known igb_uio device hot-unplug issue

Message ID: 1542726571-121934-1-git-send-email-jia.guo@intel.com (mailing list archive)
State: Accepted, archived
Delegated to: Thomas Monjalon
Series: doc: add known igb_uio device hot-unplug issue

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/Intel-compilation             success  Compilation OK

Commit Message

Guo, Jia Nov. 20, 2018, 3:09 p.m. UTC
When a device is bound to the igb_uio driver and an application is running,
hot-unplugging the device may cause a kernel crash.

Signed-off-by: Jeff Guo <jia.guo@intel.com>
---
 doc/guides/rel_notes/known_issues.rst | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
  

Comments

Stephen Hemminger Nov. 20, 2018, 6:02 p.m. UTC | #1
On Tue, 20 Nov 2018 23:09:31 +0800
Jeff Guo <jia.guo@intel.com> wrote:

> When device has been bound to igb_uio driver and application is running,
> hot-unplugging the device may cause kernel crash.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>
> ---
>  doc/guides/rel_notes/known_issues.rst | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 95e4ce6..dfe0565 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -759,3 +759,24 @@ Netvsc driver and application restart
>  
>  **Driver/Module**:
>     ``uio_hv_generic`` module.
> +
> +
> +kernel crash when hot-unplug igb_uio device while DPDK application is running
> +-----------------------------------------------------------------------------
> +
> +**Description**:
> +   When device has been bound to igb_uio driver and application is running, hot-unplugging
> +   the device may cause kernel crash.
> +
> +**Reason**:
> +   When device is hot-unplugged, igb_uio driver will be removed which will destroy uio resources.
> +   Later trying to access any uio resource will cause kernel crash.
> +
> +**Resolution/Workaround**:
> +   If using DPDK for PCI HW hot-unplug, prefer to bind device with VFIO instead of IGB_UIO.
> +
> +**Affected Environment/Platform**:
> +    ALL.
> +
> +**Driver/Module**:
> +   ``igb_uio`` module.

Surely this is fixable. What is the back trace in the kernel? How can it be reproduced with some
common hardware (or hypervisor).  Will it happen with KVM?
  
Guo, Jia Nov. 21, 2018, 7:42 a.m. UTC | #2
On 11/21/2018 2:02 AM, Stephen Hemminger wrote:
> Surely this is fixable. What is the back trace in the kernel? How can it be reproduced with some
> common hardware (or hypervisor).  Will it happen with KVM?

I think the final fix should be in the uio module in the Linux kernel. A
workaround could go in user space or in the igb_uio kernel driver, if there
is a better one. That is why we need to document the issue here.

You can refer to the back trace below.

[ 1078.006709] RIP: 0010:uio_write+0x2e/0xc0 [uio]
[ 1078.006727] Call Trace:
[ 1078.006765]  __vfs_write+0x18/0x40
[ 1078.006768]  vfs_write+0xb8/0x1b0
[ 1078.006770]  SyS_write+0x55/0xc0
[ 1078.006791]  entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1078.006793] RIP: 0033:0x7f75a10224bd


You can find the full information at the link below, where I have attached it.

http://patches.dpdk.org/patch/47923/
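
When triaging a log like the one above, it can help to pull the faulting symbol and module out of the kernel "RIP:" line automatically. The following is a minimal sketch, not part of the original report: the names `RIP_RE` and `parse_rip` are hypothetical, and the pattern assumes the symbolic `segment:symbol+0xoffset/0xsize [module]` layout seen in the back trace.

```python
import re

# Assumption: the line follows the symbolic layout seen in the back trace,
#   [ 1078.006709] RIP: 0010:uio_write+0x2e/0xc0 [uio]
# The trailing "[module]" part is optional (built-in code has none).
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-fA-F]{4}:"
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_rip(line):
    """Return symbol, offset, size and module from a symbolic RIP line.

    Returns None when the line has no symbolic RIP entry, for example the
    user-space form "RIP: 0033:0x7f75a10224bd".
    """
    match = RIP_RE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),
    }


if __name__ == "__main__":
    print(parse_rip("[ 1078.006709] RIP: 0010:uio_write+0x2e/0xc0 [uio]"))
```

For the first line of the back trace this reports the `uio_write` symbol in the `uio` module, which is consistent with the view that the final fix belongs in the kernel uio module.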


System environment:

Host kernel: 4.17.0-041700rc1-generic
VM kernel: Linux ubuntu 4.10.0-28-generic #32~16.04.2-Ubuntu
QEMU emulator version: 2.5.0
DPDK version: 18.11-rc4
NIC: ixgbe, i40e, or any other PCI NIC bound to igb_uio

Steps to reproduce:

Host environment

1. Host: bind port 0 to vfio-pci

    modprobe vfio_pci
    ./usertools/dpdk-devbind.py -b vfio-pci 81:10.0

2. Start QEMU

    taskset -c 12-21 qemu-system-x86_64 \
    -enable-kvm -m 8192 -smp cores=10,sockets=1 -cpu host -name dpdk1-vm1 \
    -monitor stdio \
    -drive file=/home/vm/ubuntu-14.04.img \
    -device vfio-pci,host=0000:81:10.0,id=dev1 \
    -netdev tap,id=ipvm1,ifname=tap5,script=/etc/qemu-ifup \
    -device rtl8139,netdev=ipvm1,id=net0,mac=00:00:00:00:00:01 \
    -localtime -vnc :2

VM environment

1. Bind port 0 to igb_uio

    ./usertools/dpdk-devbind.py --st
    ./usertools/dpdk-devbind.py -b igb_uio 00:03.0

2. Start testpmd with the hotplug feature enabled

    ./x86_64-native-linuxapp-gcc/app/testpmd -c f -n 4 -- -i --hot-plug

3. testpmd> set fwd txonly

4. testpmd> start

5. QEMU: remove the device (unplug):

    (qemu) device_del dev1

6. QEMU: add the device back (plug):

    (qemu) device_add vfio-pci,host=0000:81:10.0,id=dev1

7. Bind port 0 to igb_uio:

    ./usertools/dpdk-devbind.py -b igb_uio 00:03.0

8. testpmd> stop

9. testpmd> port attach 0000:00:03.0

10. testpmd> port start all

11. testpmd> start

12. Repeat steps 5 -- 11 until the kernel crash occurs.
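
As a rough aid when repeating the unplug/replug loop above, or when deciding whether the documented workaround applies, the sketch below reports which kernel driver a PCI device is currently bound to by reading the `driver` symlink under `/sys/bus/pci/devices/<BDF>/`. It is a minimal illustration and not part of the original report: `bound_driver` and `hot_unplug_risk` are hypothetical helper names, and the `sysfs_root` parameter is an assumption added so that the code can be exercised against a fake directory tree.

```python
import os


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    bdf is the full PCI address, e.g. "0000:00:03.0". sysfs_root is an
    assumption of this sketch so the lookup can be pointed at a test tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    # An unbound (or absent) device has no "driver" symlink.
    if not os.path.islink(link):
        return None
    # The symlink target ends with the driver name, e.g. ".../drivers/igb_uio".
    return os.path.basename(os.readlink(link))


def hot_unplug_risk(bdf, sysfs_root="/sys"):
    """True if the device is bound to igb_uio, the driver named in this issue."""
    return bound_driver(bdf, sysfs_root) == "igb_uio"


if __name__ == "__main__":
    address = "0000:00:03.0"  # the VM-side address used in the steps above
    print(address, "->", bound_driver(address))
    if hot_unplug_risk(address):
        print("bound to igb_uio: hot-unplug may crash the kernel; prefer VFIO")
```

The check only reads sysfs, so it does not change the binding; rebinding is still done with dpdk-devbind.py as in the steps above.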
  
Ferruh Yigit Nov. 22, 2018, 5:46 p.m. UTC | #3
On 11/20/2018 3:09 PM, Jeff Guo wrote:
> When device has been bound to igb_uio driver and application is running,
> hot-unplugging the device may cause kernel crash.
> 
> Signed-off-by: Jeff Guo <jia.guo@intel.com>

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
  
John McNamara Nov. 22, 2018, 5:57 p.m. UTC | #4
> -----Original Message-----
> From: Guo, Jia
> Sent: Tuesday, November 20, 2018 3:10 PM
> To: Kovacevic, Marko <marko.kovacevic@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Yigit,
> Ferruh <ferruh.yigit@intel.com>; thomas@monjalon.net
> Cc: dev@dpdk.org; Guo, Jia <jia.guo@intel.com>; Zhang, Helin
> <helin.zhang@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; He, Shaopeng <shaopeng.he@intel.com>;
> stephen@networkplumber.org; Richardson, Bruce
> <bruce.richardson@intel.com>; gaetan.rivet@6wind.com
> Subject: [PATCH] doc: add known igb_uio device hot-unplug issue
> 
> When device has been bound to igb_uio driver and application is running,
> hot-unplugging the device may cause kernel crash.


I agree that it should be fixed, if possible, but in the meantime having
it documented is a good idea.

Acked-by: John McNamara <john.mcnamara@intel.com>
  
Thomas Monjalon Nov. 23, 2018, 2:12 a.m. UTC | #5
22/11/2018 18:57, Mcnamara, John:
> From: Guo, Jia
> > 
> > When device has been bound to igb_uio driver and application is running,
> > hot-unplugging the device may cause kernel crash.
> 
> 
> I agree that it should be fixed, if possible, but in the meantime having
> it documented is a good idea.
> 
> Acked-by: John McNamara <john.mcnamara@intel.com>

Applied, thanks
  

Patch

diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
index 95e4ce6..dfe0565 100644
--- a/doc/guides/rel_notes/known_issues.rst
+++ b/doc/guides/rel_notes/known_issues.rst
@@ -759,3 +759,24 @@  Netvsc driver and application restart
 
 **Driver/Module**:
    ``uio_hv_generic`` module.
+
+
+kernel crash when hot-unplug igb_uio device while DPDK application is running
+-----------------------------------------------------------------------------
+
+**Description**:
+   When device has been bound to igb_uio driver and application is running, hot-unplugging
+   the device may cause kernel crash.
+
+**Reason**:
+   When device is hot-unplugged, igb_uio driver will be removed which will destroy uio resources.
+   Later trying to access any uio resource will cause kernel crash.
+
+**Resolution/Workaround**:
+   If using DPDK for PCI HW hot-unplug, prefer to bind device with VFIO instead of IGB_UIO.
+
+**Affected Environment/Platform**:
+    ALL.
+
+**Driver/Module**:
+   ``igb_uio`` module.