[dpdk-dev,1/3] rte_interrupts: add rte_eal_intr_exit to shut down IRQ thread

Message ID 1455399524-3252-1-git-send-email-mhall@mhcomputing.net (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon

Commit Message

Matthew Hall Feb. 13, 2016, 9:38 p.m. UTC
  There is no good way to shut down this thread from an application signal
handler. Here we add an rte_eal_intr_exit() function to allow this.

Signed-off-by: Matthew Hall <mhall@mhcomputing.net>
---
 lib/librte_eal/common/include/rte_eal.h      |  9 +++++++++
 lib/librte_eal/linuxapp/eal/eal_interrupts.c | 11 +++++++++++
 2 files changed, 20 insertions(+)
  

Comments

Thomas Monjalon Feb. 28, 2016, 9:17 p.m. UTC | #1
2016-02-13 13:38, Matthew Hall:
> There is no good way to shut down this thread from an application signal
> handler. Here we add an rte_eal_intr_exit() function to allow this.

Please Cunming,
Would you have time to review this series about interrupt thread?
Thank you
  
Thomas Monjalon March 8, 2016, 3:09 p.m. UTC | #2
2016-02-28 22:17, Thomas Monjalon:
> 2016-02-13 13:38, Matthew Hall:
> > There is no good way to shut down this thread from an application signal
> > handler. Here we add an rte_eal_intr_exit() function to allow this.
> 
> Please Cunming,
> Would you have time to review this series about interrupt thread?
> Thank you

PING - reviewers wanted
  
Cunming Liang March 9, 2016, 9:05 a.m. UTC | #3
Hi Matthew,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matthew Hall
> Sent: Sunday, February 14, 2016 5:39 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH 1/3] rte_interrupts: add rte_eal_intr_exit to shut
> down IRQ thread
> 
> There is no good way to shut down this thread from an application signal
> handler. Here we add an rte_eal_intr_exit() function to allow this.
> 
> Signed-off-by: Matthew Hall <mhall@mhcomputing.net>
> ---
>  lib/librte_eal/common/include/rte_eal.h      |  9 +++++++++
>  lib/librte_eal/linuxapp/eal/eal_interrupts.c | 11 +++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/rte_eal.h
> b/lib/librte_eal/common/include/rte_eal.h
> index d2816a8..1533eeb 100644
> --- a/lib/librte_eal/common/include/rte_eal.h
> +++ b/lib/librte_eal/common/include/rte_eal.h
> @@ -165,6 +165,15 @@ int rte_eal_init(int argc, char **argv);
>  typedef void	(*rte_usage_hook_t)(const char * prgname);
> 
>  /**
> + * Shut down the EAL interrupt thread.
> + *
> + * This function can be called from a signal handler during application
> + * shutdown.
> + *
> + */
> +int rte_eal_intr_exit(void);
I'm trying to understand the motivation.
I don't think you intend to gracefully exit the interrupt thread while leaving all the other EAL threads alive. We have no API to launch the interrupt thread again.
So I guess your app uses its own pthreads (non-EAL threads), and you are trying to shut down the whole application safely from your signal handler.
For that purpose, the devices should be closed safely (interrupts turned off) at that time; the interrupt thread keeps waiting, but no event will be raised.
From this point of view, the new function does not seem necessary. Can you explain the purpose in more detail? Thanks.

> +
> +/**
>   * Add application usage routine callout from the eal_usage() routine.
>   *
>   * This function allows the application to include its usage message
> diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> index b33ccdb..aa332a1 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
> @@ -892,6 +892,17 @@ rte_eal_intr_init(void)
>  		if (ret_1 != 0)
>  			RTE_LOG(ERR, EAL,
>  			"Failed to set thread name for interrupt handling\n");
> +
> +int
> +rte_eal_intr_exit(void)
> +{
> +	int ret = 0;
> +
> +	ret = pthread_cancel(intr_thread);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, EAL,
> +			"Failed to cancel thread for interrupt handling\n");
> +		return -ret;
>  	}
> 
>  	return -ret;
> --
> 2.5.0
  
Matthew Hall March 17, 2016, 10:55 p.m. UTC | #4
From Cunming:
> I'm trying to understand the motivation.
> 
> I don't think you're going to gracefully exit intr thread but leave all 
> other eal threads live. We don't have API to new launch intr thread again.

The doc comment added for rte_eal_intr_exit already explains this. According 
to the doc I wrote, use of the function is limited to shutting everything 
down.

> So I guess your app is using own pthread(none EAL thread), you're trying to 
> safely shutdown the whole application by your signal handler.

No, the app is using DPDK pthreads, and trying to shut everything down safely 
and cleanly with its signal handler, across DPDK and many other services in the 
app.

Unfortunately, in my experience it is currently impossible to get everything to 
shut down cleanly once an interrupt thread is activated, because the interrupt 
thread violates POSIX semantics:

1) It ignores EINTR and immediately forcibly restarts a poll() syscall. If the 
signal is delivered to the interrupt thread of the process by the kernel, this 
makes the thread uninterruptible to process the signal. Stuck running forever.

2) It does not properly set PTHREAD_CREATE_DETACHED for a background thread. 
So it holds the process open for its infinite loop of poll(). Stuck running 
forever.

3) There is no way to access the thread_id from intr_thread. So then you can't 
call pthread_cancel on it to shut it down. Stuck running forever.
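
As a rough illustration of points 1 and 3, here is a minimal sketch (not the EAL code) of a thread that sits in poll() and restarts it on EINTR, together with the cancel-and-join sequence that ends it. The names intr_loop and cancel_intr_loop are hypothetical. poll() is a POSIX cancellation point, so a deferred pthread_cancel() takes effect there even though EINTR is swallowed:

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the EAL interrupt thread: it blocks in poll()
 * and restarts the call whenever a signal interrupts it (EINTR). */
static void *
intr_loop(void *arg)
{
	(void)arg;
	for (;;) {
		/* No descriptors, infinite timeout: only a signal or a
		 * cancellation request can end this call. */
		int n = poll(NULL, 0, -1);

		if (n < 0 && errno == EINTR)
			continue;	/* signal swallowed, keep waiting */
		if (n < 0)
			return NULL;	/* any other error ends the loop */
	}
}

/* Start the loop, cancel it, and join it.
 * Returns 0 if the thread really ended through cancellation. */
static int
cancel_intr_loop(void)
{
	pthread_t tid;
	void *res = NULL;

	if (pthread_create(&tid, NULL, intr_loop, NULL) != 0)
		return -1;
	/* The request stays pending until the thread reaches poll(). */
	if (pthread_cancel(tid) != 0)
		return -1;
	if (pthread_join(tid, &res) != 0)
		return -1;
	return res == PTHREAD_CANCELED ? 0 : -1;
}
```

One caveat on calling this from a signal handler: pthread_cancel() is not on the POSIX list of async-signal-safe functions, so a cautious application would only set a flag in the handler and issue the cancel from a normal thread.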

> For this purpose, the device shall close safely(turn off intr) during the 
> time, intr thread still wait but no event will be raised.

In theory yes. In practice no. Because the intr thread violated POSIX rules 
for background processing threads per above.

> In this view, it seems not necessary to have this new. Can you explain more 
> detail for the purpose?

Based on my testing, I disagree. I could not get reliable shutdowns without 
this, or I wouldn't have coded it. (:

Matthew.
  
Cunming Liang March 21, 2016, 7:58 a.m. UTC | #5
Hi Matthew,

On 3/18/2016 6:55 AM, Matthew Hall wrote:
>  From Cunming:
>> I'm trying to understand the motivation.
>>
>> I don't think you're going to gracefully exit intr thread but leave all
>> other eal threads live. We don't have API to new launch intr thread again.
> The doc comment added for rte_eal_intr_exit already explains this. According
> to the doc I wrote, use of the function is limited to shutting everything
> down.
>
>> So I guess your app is using own pthread(none EAL thread), you're trying to
>> safely shutdown the whole application by your signal handler.
> No, the app is using DPDK pthreads, and trying to shutdown everything safely
> and cleanly w/ its signal handler, across DPDK and many other services in the
> app.
Got it. You are not satisfied with the default termination signal 
handler (SIG_DFL). The purpose is to clean everything up safely with a 
self-defined signal handler. Can you share more of your observations 
on why the default termination handler is not enough or not safe? As some of 
the samples use it to terminate the app, your concern may need to be 
applied to them as well.

> Unfortunately, in my experience it is currently impossible to get everything to
> shut down cleanly once an interrupt thread is activated, because the interrupt
> thread violates POSIX semantics:
>
> 1) It ignores EINTR and immediately forcibly restarts a poll() syscall. If the
> signal is delivered to the interrupt thread of the process by the kernel, this
> makes the thread uninterruptible to process the signal. Stuck running forever.
If EINTR is caused by a signal that is not meant to terminate the app, are 
you going to exit the interrupt thread anyway?

> 2) It does not properly set PTHREAD_CREATE_DETACHED for a background thread.
> So it holds the process open for its infinite loop of poll(). Stuck running
> forever.
Not setting 'PTHREAD_CREATE_DETACHED' won't cause the infinite loop. 
However, when using pthread_cancel to terminate the thread, it is indeed 
necessary to set 'PTHREAD_CREATE_DETACHED'.
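
For reference, a minimal sketch of creating a thread in the detached state with the standard attribute calls. The names worker and spawn_detached are hypothetical, not EAL functions. As far as POSIX goes, the detach state decides whether a thread's resources are released automatically when it ends or are held until pthread_join(); a cancelled joinable thread can still be reaped with pthread_join():

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Trivial thread body, only here so the sketch is complete. */
static void *
worker(void *arg)
{
	(void)arg;
	return NULL;
}

/* Create a thread in the detached state.
 * Returns 0 on success or a positive errno value, like the pthread calls. */
static int
spawn_detached(pthread_t *tid)
{
	pthread_attr_t attr;
	int state = -1;
	int rc;

	rc = pthread_attr_init(&attr);
	if (rc != 0)
		return rc;
	rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	if (rc == 0) {
		/* Read the attribute back as a sanity check. */
		rc = pthread_attr_getdetachstate(&attr, &state);
		assert(rc != 0 || state == PTHREAD_CREATE_DETACHED);
	}
	if (rc == 0)
		rc = pthread_create(tid, &attr, worker, NULL);
	pthread_attr_destroy(&attr);
	return rc;
}
```

A thread created this way must not be passed to pthread_join() afterwards.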

> 3) There is no way to access the thread_id from intr_thread. So then you can't
> call pthread_cancel on it to shut it down. Stuck running forever.
It looks like 'pthread_cancel' is the right way, and I see that it 
keeps the current EINTR handling in the EAL interrupt thread.

>> For this purpose, the device shall close safely(turn off intr) during the
>> time, intr thread still wait but no event will be raised.
> In theory yes. In practice no. Because the intr thread violated POSIX rules
> for background processing threads per above.
>
>> In this view, it seems not necessary to have this new. Can you explain more
>> detail for the purpose?
> Based on my testing, I disagree. I could not get reliable shutdowns without
> this, or I wouldn't have coded it. (:

Now it's clear to me; overall it's fine. Three additional comments.
1. Can you explain, and add to the patch comments, why the default signal 
handler is not good enough to terminate the app?
2. I propose adding a comment to the rte_epoll_wait() API 
description: any signal causes an error return, which the user needs to 
handle.
3. Would you do us a favor and add 'PTHREAD_CREATE_DETACHED' to all EAL 
pthreads too?
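
To make comment 2 concrete, here is a hedged sketch of what "the user needs to handle" could look like on the caller side. It uses the Linux epoll_wait() call directly as a stand-in; whether rte_epoll_wait() reports an interrupted wait in exactly the same way is an assumption. The names stop_requested and wait_events are hypothetical:

```c
#include <errno.h>
#include <signal.h>
#include <sys/epoll.h>

/* Set from the application's termination signal handler (hypothetical). */
static volatile sig_atomic_t stop_requested;

/* Wait for events, retrying when a signal interrupts the call unless a
 * shutdown was requested. Returns the number of ready events, 0 on a
 * timeout or a requested stop, or -1 with errno set on a real error. */
static int
wait_events(int epfd, struct epoll_event *evs, int maxevents, int timeout_ms)
{
	for (;;) {
		int n = epoll_wait(epfd, evs, maxevents, timeout_ms);

		if (n >= 0)
			return n;
		if (errno != EINTR)
			return -1;
		if (stop_requested)
			return 0;
		/* Interrupted by an unrelated signal: wait again. */
	}
}
```

Each retry restarts the full timeout, so a caller with a strict deadline would have to recompute the remaining time itself.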

Cunming

> Matthew.
>
>
  
Matthew Hall March 22, 2016, 7:39 a.m. UTC | #6
On Mon, Mar 21, 2016 at 03:58:44PM +0800, Liang, Cunming wrote:
> the default termination handler

I am not so experienced with this "default termination handler". Can someone 
clarify what it is so I could comment better about it?

> If EINTR is caused by some non-term purpose signals, are you going
> to exit the interrupt thread any way?

We should discuss what makes sense here. I'm just trying to get some things 
working and finding EINTR was getting eaten and causing infinite looping.

> Without setting 'PTHREAD_CREATE_DETACHED' won't cause the infinite
> loop. However by using pthread_cancel to terminate the thread,
> indeed it's necessary to set 'PTHREAD_CREATE_DETACHED'.

My general understanding is that PTHREAD_CREATE_DETACHED should be used for 
any thread, which should not keep a process open by itself if it is executing, 
i.e. a "daemon thread". I believe the interrupt thread qualifies as such a 
thread if I have understood everything right (which is hard to promise when 
you only work in DPDK in spare time).

> It looks like 'pthread_cancel' is the right way, and I see that it
> keeps the current EINTR handling in the EAL interrupt thread.

It is one option, depending on what makes the most sense.

> 1. Can you explain, and add to the patch comments, why the default signal
> handler is not good enough to terminate the app?

Yes, if someone can tell me more about what it is so I can check it.

> 2. I propose adding a comment to the rte_epoll_wait() API
> description: any signal causes an error return, which the user needs to
> handle.

Agreed.

> 3. Would you do us a favor and add 'PTHREAD_CREATE_DETACHED' to all EAL
> pthreads too?

As a spare time developer I am a bit conservative about too large of a scope 
and messing with code for other threads or features I didn't personally use or 
test. This is because I don't have the same QA resources as Intel / 6WIND / 
etc. Some help from a full time developer would be great here.

> Cunming

Matthew.
  
Cunming Liang March 23, 2016, 3:24 a.m. UTC | #7
Hi Matthew,

Thank you for your time.

On 3/22/2016 3:39 PM, Matthew Hall wrote:
> On Mon, Mar 21, 2016 at 03:58:44PM +0800, Liang, Cunming wrote:
>> the default termination handler
> I am not so experienced with this "default termination handler". Can someone
> clarify what it is so I could comment better about it?
For example, say you're handling SIGINT. After finishing the necessary app 
cleanup, call 'signal(SIGINT, SIG_DFL); raise(SIGINT);'.
The default signal handler can then terminate the interrupt thread.
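
A minimal sketch of that idiom, assuming a hypothetical cleanup hook app_cleanup(). The helper run_child_and_get_term_signal() is also hypothetical and only exists to exercise the handler in a forked child. The default action for SIGINT terminates the whole process, which takes every thread with it:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application cleanup hook; it must stay async-signal-safe. */
static void
app_cleanup(void)
{
	static const char msg[] = "cleaning up\n";
	ssize_t n = write(STDERR_FILENO, msg, sizeof(msg) - 1);

	(void)n;
}

static void
on_sigint(int sig)
{
	app_cleanup();
	/* Restore the default action and re-raise the signal, so the
	 * process ends with the normal "killed by SIGINT" status. */
	signal(sig, SIG_DFL);
	raise(sig);
}

/* Fork a child that installs the handler and sends itself SIGINT.
 * Returns the signal that killed the child, or -1 on any other outcome. */
static int
run_child_and_get_term_signal(void)
{
	int status = 0;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		signal(SIGINT, on_sigint);
		raise(SIGINT);
		_exit(EXIT_FAILURE);	/* not reached if the re-raise worked */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

This approach needs no access to the interrupt thread's ID, but it also gives that thread no chance to run cleanup of its own.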

>
>> If EINTR is caused by some non-term purpose signals, are you going
>> to exit the interrupt thread any way?
> We should discuss what makes sense here. I'm just trying to get some things
> working and finding EINTR was getting eaten and causing infinite looping.
SIGINT/SIGTERM cause an EINTR return, but SIGUSR1 can cause an 
EINTR return as well. The dedicated EAL interrupt thread is not 
expected to exit for every one of these causes.
From this point of view, I'm in favor of your patch, which cancels the 
interrupt thread but doesn't return directly on EINTR.

>
>> Without setting 'PTHREAD_CREATE_DETACHED' won't cause the infinite
>> loop. However by using pthread_cancel to terminate the thread,
>> indeed it's necessary to set 'PTHREAD_CREATE_DETACHED'.
> My general understanding is that PTHREAD_CREATE_DETACHED should be used for
> any thread, which should not keep a process open by itself if it is executing,
> i.e. a "daemon thread". I believe the interrupt thread qualifies as such a
> thread if I have understood everything right (which is hard to promise when
> you only work in DPDK in spare time).
>
>> It looks like 'pthread_cancel' is the right way, and I see that it
>> keeps the current EINTR handling in the EAL interrupt thread.
> It is one option, depending on what makes the most sense.
>
>> 1. Can you explain, and add to the patch comments, why the default signal
>> handler is not good enough to terminate the app?
> Yes, if someone can tell me more about what it is so I can check it.
>
>> 2. I propose adding a comment to the rte_epoll_wait() API
>> description: any signal causes an error return, which the user needs to
>> handle.
> Agreed.
>
>> 3. Would you do us a favor and add 'PTHREAD_CREATE_DETACHED' to all EAL
>> pthreads too?
> As a spare time developer I am a bit conservative about too large of a scope
> and messing with code for other threads or features I didn't personally use or
> test. This is because I don't have the same QA resources as Intel / 6WIND /
> etc.. Some help from a full time developer would be great here.
All right, reasonable to me.

>
>> Cunming
> Matthew.
  
Thomas Monjalon July 8, 2016, 5:36 p.m. UTC | #8
Cunming, what is the status of this patchset, please?

  
Cunming Liang July 11, 2016, 4:07 a.m. UTC | #9
Hi Thomas,

Based on the previous conversation, it requires at least a v2 to reword some comments.

> > >> 2. I propose adding a comment to the rte_epoll_wait() API
> > >> description: any signal causes an error return, which the user needs to
> > >> handle.
> > > Agreed.

In addition, one conversation is not closed.

> > >> the default termination handler
> > > I am not so experienced with this "default termination handler". Can
> someone
> > > clarify what it is so I could comment better about it?
> > For example, you're handling SIGINT. After finishing your necessary app
> > cleanup, then 'signal(SIGINT, SIG_DFL); raise(SIGINT);'.
> > The default signal handler can terminate the interrupt thread.

> > >> 1. Can you explain, and add to the patch comments, why the default signal
> > >> handler is not good enough to terminate the app?
> > > Yes, if someone can tell me more about what it is so I can check it.

Thanks,
Cunming

  

Patch

diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index d2816a8..1533eeb 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -165,6 +165,15 @@  int rte_eal_init(int argc, char **argv);
 typedef void	(*rte_usage_hook_t)(const char * prgname);
 
 /**
+ * Shut down the EAL interrupt thread.
+ *
+ * This function can be called from a signal handler during application
+ * shutdown.
+ *
+ */
+int rte_eal_intr_exit(void);
+
+/**
  * Add application usage routine callout from the eal_usage() routine.
  *
  * This function allows the application to include its usage message
diff --git a/lib/librte_eal/linuxapp/eal/eal_interrupts.c b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
index b33ccdb..aa332a1 100644
--- a/lib/librte_eal/linuxapp/eal/eal_interrupts.c
+++ b/lib/librte_eal/linuxapp/eal/eal_interrupts.c
@@ -892,6 +892,17 @@  rte_eal_intr_init(void)
 		if (ret_1 != 0)
 			RTE_LOG(ERR, EAL,
 			"Failed to set thread name for interrupt handling\n");
+
+int
+rte_eal_intr_exit(void)
+{
+	int ret = 0;
+
+	ret = pthread_cancel(intr_thread);
+	if (ret != 0) {
+		RTE_LOG(ERR, EAL,
+			"Failed to cancel thread for interrupt handling\n");
+		return -ret;
 	}
 
 	return -ret;