app/testpmd: fix device hotplug remove

Message ID 20191024010310.35882-1-chenxux.di@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series app/testpmd: fix device hotplug remove

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-compilation           success  Compile Testing PASS
ci/iol-mellanox-Performance  success  Performance Testing PASS
ci/travis-robot              warning  Travis build: failed
ci/Intel-compilation         success  Compilation OK

Commit Message

Chenxu Di Oct. 24, 2019, 1:03 a.m. UTC
Hotplug removal causes an infinite loop. Fix it by dropping the
close_port() call that precedes detach_port_device() in
rmv_port_callback().

Fixes: ac89d46096d5 ("net/i40e: release port upon close")

Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
---
 app/test-pmd/testpmd.c | 1 -
 1 file changed, 1 deletion(-)
  

Comments

Iremonger, Bernard Oct. 24, 2019, 11:28 a.m. UTC | #1
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Di ChenxuX
> Sent: Thursday, October 24, 2019 2:03 AM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; Di, ChenxuX <chenxux.di@intel.com>
> Subject: [dpdk-dev] [PATCH] app/testpmd: fix device hotplug remove
> 
> Hotplug remove cause infinite loops. Fix by canceling port_close  before
> port_detach function when rmv_port_callback.
> 
> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
> 
> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>

Acked-by: Bernard Iremonger <bernard.iremonger@intel.com>
  
Ferruh Yigit Oct. 24, 2019, 5:24 p.m. UTC | #2
On 10/24/2019 2:03 AM, Di ChenxuX wrote:
> Hotplug remove cause infinite loops. Fix by canceling port_close
>  before port_detach function when rmv_port_callback.

Can you please give more details, or a backtrace, of how the loop happens?
How can it be triggered?

> 
> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
> 
> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
> ---
>  app/test-pmd/testpmd.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index 5701f3141..a264644a1 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
>  	no_link_check = 1;
>  	stop_port(port_id);
>  	no_link_check = org_no_link_check;
> -	close_port(port_id);
>  	detach_port_device(port_id);
>  	if (need_to_start)
>  		start_packet_forwarding(0);
>
  
Chenxu Di Oct. 25, 2019, 1:48 a.m. UTC | #3
Hi, Ferruh

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Friday, October 25, 2019 1:24 AM
> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: fix device hotplug remove
> 
> On 10/24/2019 2:03 AM, Di ChenxuX wrote:
> > Hotplug remove cause infinite loops. Fix by canceling port_close
> > before port_detach function when rmv_port_callback.
> 
> Can you please give more details/backtrace of how loop happens?
> How can trigger it?

Here is the test case.

Environment
OS: Ubuntu 18.04
Device: X710 NIC
Software: QEMU

1. Bind pf0 to vfio-pci

	[root@xxxxxxxxx dpdk]# modprobe vfio-pci
	[root@ xxxxxxxxx dpdk]# usertools/dpdk-devbind.py --force --bind=vfio-pci 0000:81:00.0
2. Pass through the PF and start QEMU

	[root@ xxxxxxxxx dpdk]# taskset -c 0-7 qemu-system-x86_64 -enable-kvm -pidfile /tmp/.vm0.pid -m 10240 -cpu host -smp 8 -name vm0 -monitor unix:/tmp/vm0_monitor.sock,server,nowait -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 -device e1000,netdev=nttsip1 -netdev user,id=nttsip1,hostfwd=tcp: xxxxxxxxx:6000-:22 -monitor stdio -drive file=/home/image/test_vfio.img -vnc :5 -device vfio-pci,host=0000:81:00.0,id=dev1
3. Log in to the VM and bind passthrough port 0 to vfio-pci

	virtdut. xxxxxxxxx:6000: modprobe -r vfio_iommu_type1
	virtdut. xxxxxxxxx:6000: modprobe -r vfio
	virtdut. xxxxxxxxx:6000: modprobe vfio enable_unsafe_noiommu_mode=1
	virtdut. xxxxxxxxx:6000: modprobe vfio-pci

	virtdut. xxxxxxxxx:6000: ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:05.0
4. Start testpmd with "--hot-plug" enabled

	virtdut. xxxxxxxxx:6000: ./x86_64-native-linuxapp-gcc/app/testpmd -l 0,1,2,3,4,5,6,7 -n 1 -w 0000:00:05.0  --file-prefix=dpdk_24610_20191014100036   -- -i --hot-plug
5. Remove the device from the QEMU monitor

	(qemu) device_del dev1

6. Before the change:
	Removing a device...	
	EAL: Driver cannot detach the device (0000:00:05.0)
	EAL: Failed to detach device on primary process
	testpmd: Failed to detach device 0000:00:05.0
	EAL: can not get port by device 0000:00:05.0!
	EAL: can not get port by device 0000:00:05.0!
	...
	EAL: can not get port by device 0000:00:05.0!
	...
	...
   After the change:
	Removing a device...
	EAL: Error disabling MSI-X interrupts for fd 47
	EAL: Releasing pci mapped resource for 0000:00:05.0
	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1100800000
	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1101000000
	Device of port 0 is detached
	Now total ports is 0
	Done
	Invalid port_id=0
	EAL: Cannot find device (0000:00:05.0) on bus (pci)


> 
> >
> > Fixes: ac89d46096d5 ("net/i40e: release port upon close")
> >
> > Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
> > ---
> >  app/test-pmd/testpmd.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > 5701f3141..a264644a1 100644
> > --- a/app/test-pmd/testpmd.c
> > +++ b/app/test-pmd/testpmd.c
> > @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
> >  	no_link_check = 1;
> >  	stop_port(port_id);
> >  	no_link_check = org_no_link_check;
> > -	close_port(port_id);
> >  	detach_port_device(port_id);
> >  	if (need_to_start)
> >  		start_packet_forwarding(0);
> >
  
Thomas Monjalon Oct. 27, 2019, 10:35 p.m. UTC | #4
25/10/2019 03:48, Di, ChenxuX:
> From: Yigit, Ferruh
> > On 10/24/2019 2:03 AM, Di ChenxuX wrote:
> > > Hotplug remove cause infinite loops. Fix by canceling port_close
> > > before port_detach function when rmv_port_callback.
> > 
> > Can you please give more details/backtrace of how loop happens?
> > How can trigger it?
> 
> Here is the test case
> 
> Environment
> Os: Ubuntu 18.04
> Device: X710 nic
> Software: qemu
> 
> 1. Bind pf0 to vfio-pci
> 
> 	[root@xxxxxxxxx dpdk]# modprobe vfio-pci
> 	[root@ xxxxxxxxx dpdk]# usertools/dpdk-devbind.py --force --bind=vfio-pci 0000:81:00.0
> 2. Passthrough PF and start qemu
> 
> 	[root@ xxxxxxxxx dpdk]# taskset -c 0-7 qemu-system-x86_64 -enable-kvm -pidfile /tmp/.vm0.pid -m 10240 -cpu host -smp 8 -name vm0 -monitor unix:/tmp/vm0_monitor.sock,server,nowait -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 -device e1000,netdev=nttsip1 -netdev user,id=nttsip1,hostfwd=tcp: xxxxxxxxx:6000-:22 -monitor stdio -drive file=/home/image/test_vfio.img -vnc :5 -device vfio-pci,host=0000:81:00.0,id=dev1
> 3. Log in VM, bind passthrough port 0 to vfio-pci
> 
> 	virtdut. xxxxxxxxx:6000: modprobe -r vfio_iommu_type1
> 	virtdut. xxxxxxxxx:6000: modprobe -r vfio
> 	virtdut. xxxxxxxxx:6000: modprobe vfio enable_unsafe_noiommu_mode=1
> 	virtdut. xxxxxxxxx:6000: modprobe vfio-pci
> 
> 	virtdut. xxxxxxxxx:6000: ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:05.0
> 4. Start testpmd with "--hot-plug" enable
> 
> 	virtdut. xxxxxxxxx:6000: ./x86_64-native-linuxapp-gcc/app/testpmd -l 0,1,2,3,4,5,6,7 -n 1 -w 0000:00:05.0  --file-prefix=dpdk_24610_20191014100036   -- -i --hot-plug
> 5. Remove device from qemu interface
> 
> 	(qemu) device_del dev1
> 
> 6.before change
> 	Removing a device...	
> 	EAL: Driver cannot detach the device (0000:00:05.0)
> 	EAL: Failed to detach device on primary process
> 	testpmd: Failed to detach device 0000:00:05.0
> 	EAL: can not get port by device 0000:00:05.0!
> 	EAL: can not get port by device 0000:00:05.0!
> 	...
> 	EAL: can not get port by device 0000:00:05.0!
> 	...
> 	...
>  after change:
> 	Removing a device...
> 	EAL: Error disabling MSI-X interrupts for fd 47
> 	EAL: Releasing pci mapped resource for 0000:00:05.0
> 	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1100800000
> 	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1101000000
> 	Device of port 0 is detached
> 	Now total ports is 0
> 	Done
> 	Invalid port_id=0
> 	EAL: Cannot find device (0000:00:05.0) on bus (pci)
> 
> 
> > 
> > >
> > > Fixes: ac89d46096d5 ("net/i40e: release port upon close")
> > >
> > > Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
> > > ---
> > >  app/test-pmd/testpmd.c | 1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> > > 5701f3141..a264644a1 100644
> > > --- a/app/test-pmd/testpmd.c
> > > +++ b/app/test-pmd/testpmd.c
> > > @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
> > >  	no_link_check = 1;
> > >  	stop_port(port_id);
> > >  	no_link_check = org_no_link_check;
> > > -	close_port(port_id);
> > >  	detach_port_device(port_id);
> > >  	if (need_to_start)
> > >  		start_packet_forwarding(0);


I disagree with this patch.
You are removing a call to the "close" function because it does not work
properly with your driver.
Please do not blame the tool which is showing the error.
  
Qiming Yang Oct. 28, 2019, 5:51 a.m. UTC | #5
Hi,

> -----Original Message-----
> From: Di, ChenxuX
> Sent: Thursday, October 24, 2019 9:03 AM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
> <qiming.yang@intel.com>; Di, ChenxuX <chenxux.di@intel.com>
> Subject: [PATCH] app/testpmd: fix device hotplug remove
> 
> Hotplug remove cause infinite loops. Fix by canceling port_close  before
> port_detach function when rmv_port_callback.
> 
> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
> 
> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
> ---
>  app/test-pmd/testpmd.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
> 5701f3141..a264644a1 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
>  	no_link_check = 1;
>  	stop_port(port_id);
>  	no_link_check = org_no_link_check;
> -	close_port(port_id);
>  	detach_port_device(port_id);
>  	if (need_to_start)
>  		start_packet_forwarding(0);
> --
> 2.17.1

NACK, this patch is not acceptable.
  
Ferruh Yigit Oct. 29, 2019, 12:13 p.m. UTC | #6
On 10/25/2019 2:48 AM, Di, ChenxuX wrote:
> Hi, Ferruh
> 
>> -----Original Message-----
>> From: Yigit, Ferruh
>> Sent: Friday, October 25, 2019 1:24 AM
>> To: Di, ChenxuX <chenxux.di@intel.com>; dev@dpdk.org
>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
>> <qiming.yang@intel.com>
>> Subject: Re: [dpdk-dev] [PATCH] app/testpmd: fix device hotplug remove
>>
>> On 10/24/2019 2:03 AM, Di ChenxuX wrote:
>>> Hotplug remove cause infinite loops. Fix by canceling port_close
>>> before port_detach function when rmv_port_callback.
>>
>> Can you please give more details/backtrace of how loop happens?
>> How can trigger it?
> 
> Here is the test case
> 
> Environment
> Os: Ubuntu 18.04
> Device: X710 nic
> Software: qemu
> 
> 1. Bind pf0 to vfio-pci
> 
> 	[root@xxxxxxxxx dpdk]# modprobe vfio-pci
> 	[root@ xxxxxxxxx dpdk]# usertools/dpdk-devbind.py --force --bind=vfio-pci 0000:81:00.0
> 2. Passthrough PF and start qemu
> 
> 	[root@ xxxxxxxxx dpdk]# taskset -c 0-7 qemu-system-x86_64 -enable-kvm -pidfile /tmp/.vm0.pid -m 10240 -cpu host -smp 8 -name vm0 -monitor unix:/tmp/vm0_monitor.sock,server,nowait -chardev socket,path=/tmp/vm0_qga0.sock,server,nowait,id=vm0_qga0 -device virtio-serial -device virtserialport,chardev=vm0_qga0,name=org.qemu.guest_agent.0 -device e1000,netdev=nttsip1 -netdev user,id=nttsip1,hostfwd=tcp: xxxxxxxxx:6000-:22 -monitor stdio -drive file=/home/image/test_vfio.img -vnc :5 -device vfio-pci,host=0000:81:00.0,id=dev1
> 3. Log in VM, bind passthrough port 0 to vfio-pci
> 
> 	virtdut. xxxxxxxxx:6000: modprobe -r vfio_iommu_type1
> 	virtdut. xxxxxxxxx:6000: modprobe -r vfio
> 	virtdut. xxxxxxxxx:6000: modprobe vfio enable_unsafe_noiommu_mode=1
> 	virtdut. xxxxxxxxx:6000: modprobe vfio-pci
> 
> 	virtdut. xxxxxxxxx:6000: ./usertools/dpdk-devbind.py -b vfio-pci 0000:00:05.0
> 4. Start testpmd with "--hot-plug" enable
> 
> 	virtdut. xxxxxxxxx:6000: ./x86_64-native-linuxapp-gcc/app/testpmd -l 0,1,2,3,4,5,6,7 -n 1 -w 0000:00:05.0  --file-prefix=dpdk_24610_20191014100036   -- -i --hot-plug
> 5. Remove device from qemu interface
> 
> 	(qemu) device_del dev1
> 
> 6.before change
> 	Removing a device...	
> 	EAL: Driver cannot detach the device (0000:00:05.0)
> 	EAL: Failed to detach device on primary process
> 	testpmd: Failed to detach device 0000:00:05.0
> 	EAL: can not get port by device 0000:00:05.0!
> 	EAL: can not get port by device 0000:00:05.0!

'close()' failing is a problem and should be fixed, but why do we keep getting
the "RTE_DEV_EVENT_REMOVE" event? Were you able to get a backtrace of the issue?

> 	...
> 	EAL: can not get port by device 0000:00:05.0!
> 	...
> 	...
>  after change:
> 	Removing a device...
> 	EAL: Error disabling MSI-X interrupts for fd 47
> 	EAL: Releasing pci mapped resource for 0000:00:05.0
> 	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1100800000
> 	EAL: Calling pci_unmap_resource for 0000:00:05.0 at 0x1101000000
> 	Device of port 0 is detached
> 	Now total ports is 0
> 	Done
> 	Invalid port_id=0
> 	EAL: Cannot find device (0000:00:05.0) on bus (pci)
> 
> 
>>
>>>
>>> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
>>>
>>> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
>>> ---
>>>  app/test-pmd/testpmd.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 5701f3141..a264644a1 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
>>>  	no_link_check = 1;
>>>  	stop_port(port_id);
>>>  	no_link_check = org_no_link_check;
>>> -	close_port(port_id);
>>>  	detach_port_device(port_id);
>>>  	if (need_to_start)
>>>  		start_packet_forwarding(0);
>>>
>
  
Ferruh Yigit Nov. 5, 2019, 1:54 p.m. UTC | #7
On 10/28/2019 5:51 AM, Yang, Qiming wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Di, ChenxuX
>> Sent: Thursday, October 24, 2019 9:03 AM
>> To: dev@dpdk.org
>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
>> <qiming.yang@intel.com>; Di, ChenxuX <chenxux.di@intel.com>
>> Subject: [PATCH] app/testpmd: fix device hotplug remove
>>
>> Hotplug remove cause infinite loops. Fix by canceling port_close  before
>> port_detach function when rmv_port_callback.
>>
>> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
>>
>> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
>> ---
>>  app/test-pmd/testpmd.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>> 5701f3141..a264644a1 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
>>  	no_link_check = 1;
>>  	stop_port(port_id);
>>  	no_link_check = org_no_link_check;
>> -	close_port(port_id);
>>  	detach_port_device(port_id);
>>  	if (need_to_start)
>>  		start_packet_forwarding(0);
>> --
>> 2.17.1
> 
> NACK, this patch is not acceptable.
> 

I can't really follow what the root cause of the failure is here. @Qiming, if it
is a driver issue, can you please briefly describe what is wrong in the driver?

Thanks,
ferruh
  
Xiaolong Ye Nov. 5, 2019, 11:30 p.m. UTC | #8
Hi, Ferruh

On 11/05, Ferruh Yigit wrote:
>On 10/28/2019 5:51 AM, Yang, Qiming wrote:
>> Hi,
>> 
>>> -----Original Message-----
>>> From: Di, ChenxuX
>>> Sent: Thursday, October 24, 2019 9:03 AM
>>> To: dev@dpdk.org
>>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Yang, Qiming
>>> <qiming.yang@intel.com>; Di, ChenxuX <chenxux.di@intel.com>
>>> Subject: [PATCH] app/testpmd: fix device hotplug remove
>>>
>>> Hotplug remove cause infinite loops. Fix by canceling port_close  before
>>> port_detach function when rmv_port_callback.
>>>
>>> Fixes: ac89d46096d5 ("net/i40e: release port upon close")
>>>
>>> Signed-off-by: Di ChenxuX <chenxux.di@intel.com>
>>> ---
>>>  app/test-pmd/testpmd.c | 1 -
>>>  1 file changed, 1 deletion(-)
>>>
>>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index
>>> 5701f3141..a264644a1 100644
>>> --- a/app/test-pmd/testpmd.c
>>> +++ b/app/test-pmd/testpmd.c
>>> @@ -2708,7 +2708,6 @@ rmv_port_callback(void *arg)
>>>  	no_link_check = 1;
>>>  	stop_port(port_id);
>>>  	no_link_check = org_no_link_check;
>>> -	close_port(port_id);
>>>  	detach_port_device(port_id);
>>>  	if (need_to_start)
>>>  		start_packet_forwarding(0);
>>> --
>>> 2.17.1
>> 
>> NACK, this patch is not acceptable.
>> 
>
>I can't really follow what is the root cause of the failure here, @Qiming, if it
>is a driver issue, can you please briefly describe what is wrong in the driver?

The real issue lies in the i40e driver's remove ops after it adopted the
RTE_ETH_DEV_CLOSE_REMOVE flag. I've talked with Chenxu and he'll send a new patch
to fix the issue this patch tried to solve.

Thanks,
Xiaolong

>
>Thanks,
>ferruh
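
To make the failure mode discussed in this thread easier to picture, here is a
minimal, self-contained C model of the close-then-detach order used by
rmv_port_callback(). It is only a sketch and not DPDK code: struct toy_port and
every toy_* function are hypothetical names invented for illustration. Two
assumptions are baked in and are not confirmed by the thread: that the driver's
remove op returns an error once close has already released the port (suggested
by the "Driver cannot detach the device" log), and that the remove event is
delivered again while the device stays attached (suggested by the repeated
"can not get port by device" lines).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a port; NOT a DPDK structure. */
struct toy_port {
	bool allocated; /* port slot still owned by the driver */
	bool attached;  /* underlying device still attached */
};

/* Models a close that also releases the port (CLOSE_REMOVE-like). */
void toy_close(struct toy_port *p)
{
	assert(p != NULL);
	p->allocated = false;
}

/* Assumed faulty remove op: fails if the port was already released. */
int toy_remove_strict(struct toy_port *p)
{
	assert(p != NULL);
	if (!p->allocated)
		return -ENODEV;
	p->allocated = false;
	p->attached = false;
	return 0;
}

/* Assumed corrected remove op: an already released port is not an error. */
int toy_remove_tolerant(struct toy_port *p)
{
	assert(p != NULL);
	p->allocated = false;
	p->attached = false;
	return 0;
}

/* Same order as the callback in the diff: close first, then detach. */
int toy_rmv_callback(struct toy_port *p, int (*remove_op)(struct toy_port *))
{
	toy_close(p);
	return remove_op(p);
}

/*
 * Assumption: the remove event is re-delivered while the device is still
 * attached. The loop is bounded by max_events so the model terminates.
 * Returns the number of events that were delivered.
 */
int toy_deliver_remove_events(struct toy_port *p,
			      int (*remove_op)(struct toy_port *),
			      int max_events)
{
	int delivered = 0;

	while (p->attached && delivered < max_events) {
		(void)toy_rmv_callback(p, remove_op);
		delivered++;
	}
	return delivered;
}
```

In this model the strict remove op never detaches the device after a close, so
the event count always reaches the bound, while the tolerant one finishes after
a single event without removing the close call from the callback. That matches
the direction the reviewers ask for (fix the driver, keep the close in
testpmd), but the real driver fix may look quite different.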
  

Patch

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 5701f3141..a264644a1 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2708,7 +2708,6 @@  rmv_port_callback(void *arg)
 	no_link_check = 1;
 	stop_port(port_id);
 	no_link_check = org_no_link_check;
-	close_port(port_id);
 	detach_port_device(port_id);
 	if (need_to_start)
 		start_packet_forwarding(0);