net/ixgbe: fix crash when on remove

Message ID: 1550026132-9244-1-git-send-email-wangyunjian@huawei.com (mailing list archive)
State: Accepted, archived
Delegated to: Qi Zhang
Series: net/ixgbe: fix crash when on remove

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS
ci/Intel-compilation             success  Compilation OK

Commit Message

Yunjian Wang Feb. 13, 2019, 2:48 a.m. UTC
  From: Yunjian Wang <wangyunjian@huawei.com>

The NIC's interrupt source may still have an active delayed handler
when the port is removed. Cancel the delayed handler before removing
the device so that it cannot run afterwards.

Call Trace:
  #0  ixgbe_disable_intr (hw=0x0, hw=0x0)
      at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:852
  #1  ixgbe_dev_interrupt_delayed_handler (param=0xadb9c0
      <rte_eth_devices@@DPDK_2.2+33024>)
      at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:4386
  #2  0x00007f05782147af in eal_alarm_callback (arg=<optimized out>)
      at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
      eal_alarm.c:90
  #3  0x00007f057821320a in eal_intr_process_interrupts (nfds=1,
      events=0x7f056cbf3e88) at /usr/src/debug/dpdk-18.11/lib/
      librte_eal/linuxapp/eal/eal_interrupts.c:838
  #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=18)
      at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
      eal_interrupts.c:885
  #5  eal_intr_thread_main (arg=<optimized out>)
      at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
      eal_interrupts.c:965
  #6  0x00007f05708a0e45 in start_thread () from /usr/lib64/libpthread.so.0
  #7  0x00007f056eb4ab5d in clone () from /usr/lib64/libc.so.6

Fixes: 2866c5f1b87e ("ixgbe: support port hotplug")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)
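
The failure mode described in the commit message can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `alarm_set`, `alarm_cancel`, `alarm_fire_all`, `struct toy_dev` and `toy_dev_remove` are hypothetical names invented for illustration, not DPDK symbols, and the handler merely counts the cases where it would have touched a NULL `hw` pointer instead of dereferencing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are DPDK types or functions. */
typedef void (*alarm_cb)(void *arg);

struct alarm {
	alarm_cb cb;
	void *arg;
	int armed;
};

#define MAX_ALARMS 8
static struct alarm alarms[MAX_ALARMS];

struct toy_dev {
	int *hw;           /* models the private hw pointer; NULL after removal */
	int null_hw_calls; /* delayed handler runs that saw hw == NULL */
};

/* Queue a callback to run later. */
static void alarm_set(alarm_cb cb, void *arg)
{
	for (size_t i = 0; i < MAX_ALARMS; i++) {
		if (!alarms[i].armed) {
			alarms[i] = (struct alarm){ cb, arg, 1 };
			return;
		}
	}
	assert(0 && "alarm table full");
}

/* Drop every pending alarm matching (cb, arg); return how many were dropped. */
static int alarm_cancel(alarm_cb cb, void *arg)
{
	int n = 0;

	for (size_t i = 0; i < MAX_ALARMS; i++) {
		if (alarms[i].armed && alarms[i].cb == cb && alarms[i].arg == arg) {
			alarms[i].armed = 0;
			n++;
		}
	}
	return n;
}

/* Run and clear every pending alarm, as an alarm thread eventually would. */
static void alarm_fire_all(void)
{
	for (size_t i = 0; i < MAX_ALARMS; i++) {
		if (alarms[i].armed) {
			alarms[i].armed = 0;
			alarms[i].cb(alarms[i].arg);
		}
	}
}

/* Delayed handler: real code would dereference hw; here we only record it. */
static void delayed_handler(void *arg)
{
	struct toy_dev *dev = arg;

	if (dev->hw == NULL)
		dev->null_hw_calls++;
}

/* Remove the device; cancel_first selects the behaviour the patch adds. */
static void toy_dev_remove(struct toy_dev *dev, int cancel_first)
{
	if (cancel_first)
		alarm_cancel(delayed_handler, dev);
	dev->hw = NULL;
}
```

In this model, arming the handler, removing the device with `cancel_first` set to 0 and then firing the alarms lets the handler observe a NULL `hw`, which corresponds to frame #0 of the call trace. With `cancel_first` set to 1 the pending alarm is dropped before the pointer is cleared, so the handler never runs.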
  

Comments

Qi Zhang Feb. 26, 2019, 5:34 a.m. UTC | #1
Hi:

> -----Original Message-----
> From: wangyunjian [mailto:wangyunjian@huawei.com]
> Sent: Wednesday, February 13, 2019 10:49 AM
> To: dev@dpdk.org
> Cc: Zhang, Qi Z <qi.z.zhang@intel.com>; xudingke@huawei.com; Yunjian Wang
> <wangyunjian@huawei.com>; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix crash when on remove
> 
> From: Yunjian Wang <wangyunjian@huawei.com>
> 
> The NIC's interrupt source may still have an active delayed handler when the
> port is removed. Cancel the delayed handler before removing the device so
> that it cannot run afterwards.

Agreed, thanks for catching this.

> 
> Call Trace:
>   #0  ixgbe_disable_intr (hw=0x0, hw=0x0)
>       at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:852
>   #1  ixgbe_dev_interrupt_delayed_handler (param=0xadb9c0
>       <rte_eth_devices@@DPDK_2.2+33024>)
>       at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:4386
>   #2  0x00007f05782147af in eal_alarm_callback (arg=<optimized out>)
>       at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
>       eal_alarm.c:90
>   #3  0x00007f057821320a in eal_intr_process_interrupts (nfds=1,
>       events=0x7f056cbf3e88) at /usr/src/debug/dpdk-18.11/lib/
>       librte_eal/linuxapp/eal/eal_interrupts.c:838
>   #4  eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=18)
>       at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
>       eal_interrupts.c:885
>   #5  eal_intr_thread_main (arg=<optimized out>)
>       at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/
>       eal_interrupts.c:965
>   #6  0x00007f05708a0e45 in start_thread () from /usr/lib64/libpthread.so.0
>   #7  0x00007f056eb4ab5d in clone () from /usr/lib64/libc.so.6
> 
> Fixes: 2866c5f1b87e ("ixgbe: support port hotplug")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> ---
>  drivers/net/ixgbe/ixgbe_ethdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> b/drivers/net/ixgbe/ixgbe_ethdev.c
> index 7493110..e9533e5 100644
> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> @@ -1336,6 +1336,9 @@ struct rte_ixgbe_xstats_name_off {
>  		rte_delay_ms(100);
>  	} while (retries++ < (10 + IXGBE_LINK_UP_TIME));
> 
> +	/* cancel the delay handler before remove dev */
> +	rte_eal_alarm_cancel(ixgbe_dev_interrupt_delayed_handler, eth_dev);
> +

I think it would be safer to move this call earlier.
The delayed handler may invoke an application callback, which may in turn call the ethdev API, but by this point we have already reset the ethdev, so there is still a chance of hitting a problem, right?
Would it be better to add this call at the beginning of dev_close?

Regards
Qi


>  	/* uninitialize PF if max_vfs not zero */
>  	ixgbe_pf_host_uninit(eth_dev);
> 
> --
> 1.8.3.1
>
  
Yunjian Wang Feb. 26, 2019, 7:21 a.m. UTC | #2
> -----Original Message-----
> From: Zhang, Qi Z [mailto:qi.z.zhang@intel.com]
> Sent: Tuesday, February 26, 2019 1:35 PM
> To: wangyunjian <wangyunjian@huawei.com>; dev@dpdk.org
> Cc: xudingke <xudingke@huawei.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] net/ixgbe: fix crash when on remove
> 
> I think it would be safer to move this call earlier. The delayed handler may
> invoke an application callback, which may in turn call the ethdev API, but by
> this point we have already reset the ethdev, so there is still a chance of
> hitting a problem, right?
> Would it be better to add this call at the beginning of dev_close?
> 
> Regards
> Qi

The delayed handler is canceled after the interrupt handler is unregistered,
and the interrupt handler is unregistered after uio is disabled. That is why I
added the call after rte_intr_callback_unregister.

I am not sure whether we can move disabling uio and unregistering the interrupt
handler to the beginning of dev_close.

Thanks
Yunjian
  
Qi Zhang March 7, 2019, 1:38 p.m. UTC | #3
> -----Original Message-----
> From: wangyunjian [mailto:wangyunjian@huawei.com]
> Sent: Tuesday, February 26, 2019 3:22 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> Cc: xudingke <xudingke@huawei.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] net/ixgbe: fix crash when on remove
> 
> The delayed handler is canceled after the interrupt handler is unregistered,
> and the interrupt handler is unregistered after uio is disabled. That is why I
> added the call after rte_intr_callback_unregister.
> 
> I am not sure whether we can move disabling uio and unregistering the
> interrupt handler to the beginning of dev_close.

OK, I think the ideal sequence is:

Disable interrupt -> unregister interrupt handler -> cancel delayed handler -> reset ethdev ...
Currently we reset the ethdev too early, which leaves a potential risk in the interrupt handler, but that can be fixed in a separate patch.

So 
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

Applied to dpdk-next-net-intel.

Thanks
Qi
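
The teardown order proposed above can be sketched as a tiny state model. Everything here is hypothetical: `struct toy_port` and the `toy_*` functions are illustrative names, not driver code, and the model assumes that the interrupt handler is what queues the delayed handler, which the thread does not state explicitly.

```c
#include <assert.h>

/* Hypothetical model; names do not correspond to DPDK symbols. */
struct toy_port {
	int intr_enabled;    /* device may raise interrupts */
	int intr_registered; /* interrupt handler is registered */
	int delayed_pending; /* a delayed handler is queued */
	int hw_valid;        /* device state has not been reset */
	int stale_runs;      /* delayed handler runs that saw reset state */
};

/* Interrupt handler: assumed here to be what queues the delayed handler. */
static void toy_interrupt(struct toy_port *p)
{
	if (p->intr_enabled && p->intr_registered)
		p->delayed_pending = 1;
}

/* Alarm expiry: runs the delayed handler if one is still queued. */
static void toy_alarm_expire(struct toy_port *p)
{
	if (!p->delayed_pending)
		return;
	p->delayed_pending = 0;
	if (!p->hw_valid)
		p->stale_runs++;
}

/* Teardown in the order proposed in the thread. */
static void toy_teardown_ordered(struct toy_port *p)
{
	p->intr_enabled = 0;    /* 1. disable interrupt */
	p->intr_registered = 0; /* 2. unregister interrupt handler */
	p->delayed_pending = 0; /* 3. cancel delayed handler */

	/* Nothing can queue or run a handler from here on. */
	assert(!p->intr_registered && !p->delayed_pending);

	p->hw_valid = 0;        /* 4. reset ethdev */
}

/* Teardown that resets state and never cancels the delayed handler. */
static void toy_teardown_reset_only(struct toy_port *p)
{
	p->hw_valid = 0;
}
```

The point of putting step 3 after step 2 in this model is that once the interrupt handler is gone, nothing can queue a new delayed handler after the cancel. Resetting the device state only as the last step means no handler can observe it half torn down.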



  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 7493110..e9533e5 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1336,6 +1336,9 @@  struct rte_ixgbe_xstats_name_off {
 		rte_delay_ms(100);
 	} while (retries++ < (10 + IXGBE_LINK_UP_TIME));
 
+	/* cancel the delay handler before remove dev */
+	rte_eal_alarm_cancel(ixgbe_dev_interrupt_delayed_handler, eth_dev);
+
 	/* uninitialize PF if max_vfs not zero */
 	ixgbe_pf_host_uninit(eth_dev);