[3/3] eal/windows: cleanup interrupt resources

Message ID 20210502023333.30351-3-dmitry.kozliuk@gmail.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series [1/3] eal/windows: fix use of incorrect thread ID

Checks

Context                     Check    Description
ci/checkpatch               success  coding style OK
ci/iol-intel-Functional     success  Functional Testing PASS
ci/iol-intel-Performance    success  Performance Testing PASS
ci/iol-mellanox-Performance success  Performance Testing PASS
ci/iol-abi-testing          success  Testing PASS
ci/iol-testing              success  Testing PASS
ci/github-robot             success  github build: passed
ci/Intel-compilation        success  Compilation OK
ci/intel-Testing            success  Testing PASS

Commit Message

Dmitry Kozlyuk May 2, 2021, 2:33 a.m. UTC
Interrupt manager in Windows EAL allocates an IOCP and starts
a control thread that runs indefinitely. At DPDK cleanup
this thread was not stopped and IOCP handle was not closed.

Gracefully stop interrupt-handling in rte_eal_cleanup().
The thread already closes IOCP handle before exiting.

Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
Cc: stable@dpdk.org

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
---
 lib/eal/windows/eal.c            |  1 +
 lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
 lib/eal/windows/eal_windows.h    |  5 +++++
 3 files changed, 30 insertions(+), 2 deletions(-)
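
The shutdown mechanism described above can be sketched as a small,
platform-independent model. This is not the DPDK code: the IOCP dequeue is
replaced by an array of pre-built batches, and model_event, model_batch,
model_intr_loop and KEY_SHUTDOWN are hypothetical names used only for
illustration. It shows the control flow the patch adds: the loop runs until
it sees the sentinel key, then stops without processing the remaining
entries of that batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel key, mirroring IOCP_KEY_SHUTDOWN in the patch. */
#define KEY_SHUTDOWN UINT32_MAX

/* Hypothetical stand-in for OVERLAPPED_ENTRY: only the key matters here. */
struct model_event {
	uintptr_t key;
};

/* Hypothetical stand-in for one batch returned by the completion port. */
struct model_batch {
	const struct model_event *events;
	size_t count;
};

/*
 * Model of the patched loop: consume batches until a shutdown key is seen.
 * The real thread blocks waiting for events; this model simply stops when
 * the supplied batches run out. Returns the number of events "processed".
 */
static size_t
model_intr_loop(const struct model_batch *batches, size_t batch_count)
{
	bool finished = false;
	size_t processed = 0;
	size_t b = 0, i;

	assert(batches != NULL || batch_count == 0);

	while (!finished && b < batch_count) {
		const struct model_batch *batch = &batches[b++];

		for (i = 0; i < batch->count; i++) {
			if (batch->events[i].key == KEY_SHUTDOWN) {
				finished = true;
				break; /* the rest of this batch is skipped */
			}
			processed++; /* stands in for eal_intr_process() */
		}
	}
	return processed;
}
```

One consequence visible in the model: an event dequeued in the same batch
after the shutdown key is never processed, and later batches are never
fetched. In the patch this is acceptable because the key is posted only
from rte_eal_cleanup().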
  

Comments

Thomas Monjalon May 11, 2021, 7:41 a.m. UTC | #1
02/05/2021 04:33, Dmitry Kozlyuk:
> Interrupt manager in Windows EAL allocates on IOCP and starts
> a control thread that runs indefinitely. At DPDK cleanup
> this thread was not stopped and IOCP handle was not closed.
> 
> Gracefully stop interrupt-handling in rte_eal_cleanup().
> The thread already closes IOCP handle before exiting.
> 
> Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> ---
>  lib/eal/windows/eal.c            |  1 +
>  lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
>  lib/eal/windows/eal_windows.h    |  5 +++++
>  3 files changed, 30 insertions(+), 2 deletions(-)

It seems nobody reviewed.
To be on the safe side, I'll merge this series after DPDK 21.05 is released.
Or am I missing any critical issue?
  
Dmitry Kozlyuk May 11, 2021, 7:59 a.m. UTC | #2
2021-05-11 09:41 (UTC+0200), Thomas Monjalon:
> 02/05/2021 04:33, Dmitry Kozlyuk:
> > Interrupt manager in Windows EAL allocates on IOCP and starts
> > a control thread that runs indefinitely. At DPDK cleanup
> > this thread was not stopped and IOCP handle was not closed.
> > 
> > Gracefully stop interrupt-handling in rte_eal_cleanup().
> > The thread already closes IOCP handle before exiting.
> > 
> > Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> > ---
> >  lib/eal/windows/eal.c            |  1 +
> >  lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
> >  lib/eal/windows/eal_windows.h    |  5 +++++
> >  3 files changed, 30 insertions(+), 2 deletions(-)  
> 
> It seems nobody reviewed.
> To be on the safe side, I'll merge this series after DPDK 21.05 is released.
> Or am I missing any critical issue?

IIRC Windows DPDK is not shipped anywhere yet, so the fix can be postponed.

Without the fix in 2/3, rte_eal_alarm_set() will start failing after some
thousands of calls (i40e calls it every 50 ms, mlx5 calls it every 1 sec or
less). For mlx5 it seems to break flow counters (mlx5_flow_query_alarm
function).
  
Menon, Ranjit May 11, 2021, 5:21 p.m. UTC | #3
On 5/11/2021 12:59 AM, Dmitry Kozlyuk wrote:
> 2021-05-11 09:41 (UTC+0200), Thomas Monjalon:
>> 02/05/2021 04:33, Dmitry Kozlyuk:
>>> Interrupt manager in Windows EAL allocates on IOCP and starts
>>> a control thread that runs indefinitely. At DPDK cleanup
>>> this thread was not stopped and IOCP handle was not closed.
>>>
>>> Gracefully stop interrupt-handling in rte_eal_cleanup().
>>> The thread already closes IOCP handle before exiting.
>>>
>>> Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
>>> ---
>>>   lib/eal/windows/eal.c            |  1 +
>>>   lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
>>>   lib/eal/windows/eal_windows.h    |  5 +++++
>>>   3 files changed, 30 insertions(+), 2 deletions(-)
>> It seems nobody reviewed.
>> To be on the safe side, I'll merge this series after DPDK 21.05 is released.
>> Or am I missing any critical issue?
> IIRC Windows DPDK is not shipped anywhere yet, so the fix can be postponed.
>
> Without fix in 2/3 rte_eal_alarm_set() will start failing after some
> thousands of calls (i40e calls every 50 ms, mlx5 call every 1 sec or less).
> For mlx5 it seems to break flow counters (mlx5_flow_query_alarm function).

It appears that Tyler reviewed and ack-ed this. I'll add my ACK too. If 
we can get this into 21.05, it would be great.

ranjit m.
  
Menon, Ranjit May 11, 2021, 5:24 p.m. UTC | #4
On 5/1/2021 7:33 PM, Dmitry Kozlyuk wrote:
> Interrupt manager in Windows EAL allocates on IOCP and starts
> a control thread that runs indefinitely. At DPDK cleanup
> this thread was not stopped and IOCP handle was not closed.
>
> Gracefully stop interrupt-handling in rte_eal_cleanup().
> The thread already closes IOCP handle before exiting.
>
> Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> Cc: stable@dpdk.org
>
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> ---
>   lib/eal/windows/eal.c            |  1 +
>   lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
>   lib/eal/windows/eal_windows.h    |  5 +++++
>   3 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index 28c787c0b0..25afc42f8e 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -258,6 +258,7 @@ rte_eal_cleanup(void)
>   {
>   	struct internal_config *internal_conf =
>   		eal_get_internal_configuration();
> +	eal_intr_thread_cancel();
A blank line above this line would be nice.
>   	/* after this point, any DPDK pointers will become dangling */
>   	rte_eal_memory_detach();
>   	eal_cleanup_config(internal_conf);
> diff --git a/lib/eal/windows/eal_interrupts.c b/lib/eal/windows/eal_interrupts.c
> index f24ed6e54e..bb0585cb34 100644
> --- a/lib/eal/windows/eal_interrupts.c
> +++ b/lib/eal/windows/eal_interrupts.c
> @@ -7,6 +7,8 @@
>   #include "eal_private.h"
>   #include "eal_windows.h"
>   
> +#define IOCP_KEY_SHUTDOWN UINT32_MAX
> +
>   static pthread_t intr_thread;
>   
>   static HANDLE intr_iocp;
> @@ -34,12 +36,14 @@ eal_intr_thread_handle_init(void)
>   static void *
>   eal_intr_thread_main(LPVOID arg __rte_unused)
>   {
> +	bool finished = false;
> +
>   	if (eal_intr_thread_handle_init() < 0) {
>   		RTE_LOG(ERR, EAL, "Cannot open interrupt thread handle\n");
>   		goto cleanup;
>   	}
>   
> -	while (1) {
> +	while (!finished) {
>   		OVERLAPPED_ENTRY events[16];
>   		ULONG event_count, i;
>   		BOOL result;
> @@ -61,8 +65,13 @@ eal_intr_thread_main(LPVOID arg __rte_unused)
>   			continue;
>   		}
>   
> -		for (i = 0; i < event_count; i++)
> +		for (i = 0; i < event_count; i++) {
> +			if (events[i].lpCompletionKey == IOCP_KEY_SHUTDOWN) {
> +				finished = true;
> +				break;
> +			}
>   			eal_intr_process(&events[i]);
> +		}
>   	}
>   
>   	CloseHandle(intr_thread_handle);
> @@ -125,6 +134,19 @@ eal_intr_thread_schedule(void (*func)(void *arg), void *arg)
>   	return 0;
>   }
>   
> +void
> +eal_intr_thread_cancel(void)
> +{
> +	if (!PostQueuedCompletionStatus(
> +			intr_iocp, 0, IOCP_KEY_SHUTDOWN, NULL)) {
> +		RTE_LOG_WIN32_ERR("PostQueuedCompletionStatus()");
> +		RTE_LOG(ERR, EAL, "Cannot cancel interrupt thread\n");
> +		return;
> +	}
> +
> +	WaitForSingleObject(intr_thread_handle, INFINITE);
> +}
> +
>   int
>   rte_intr_callback_register(
>   	__rte_unused const struct rte_intr_handle *intr_handle,
> diff --git a/lib/eal/windows/eal_windows.h b/lib/eal/windows/eal_windows.h
> index 478accc1b9..7cc811485d 100644
> --- a/lib/eal/windows/eal_windows.h
> +++ b/lib/eal/windows/eal_windows.h
> @@ -67,6 +67,11 @@ unsigned int eal_socket_numa_node(unsigned int socket_id);
>    */
>   int eal_intr_thread_schedule(void (*func)(void *arg), void *arg);
>   
> +/**
> + * Request interrupt thread to stop and wait its termination.
> + */
> +void eal_intr_thread_cancel(void);
> +
>   /**
>    * Open virt2phys driver interface device.
>    *

Other than nit above,

Acked-by: Ranjit Menon <ranjit.menon@intel.com>
  
Thomas Monjalon May 12, 2021, 2:56 p.m. UTC | #5
11/05/2021 19:21, Ranjit Menon:
> On 5/11/2021 12:59 AM, Dmitry Kozlyuk wrote:
> > 2021-05-11 09:41 (UTC+0200), Thomas Monjalon:
> >> 02/05/2021 04:33, Dmitry Kozlyuk:
> >>> Interrupt manager in Windows EAL allocates on IOCP and starts
> >>> a control thread that runs indefinitely. At DPDK cleanup
> >>> this thread was not stopped and IOCP handle was not closed.
> >>>
> >>> Gracefully stop interrupt-handling in rte_eal_cleanup().
> >>> The thread already closes IOCP handle before exiting.
> >>>
> >>> Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> >>> ---
> >>>   lib/eal/windows/eal.c            |  1 +
> >>>   lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
> >>>   lib/eal/windows/eal_windows.h    |  5 +++++
> >>>   3 files changed, 30 insertions(+), 2 deletions(-)
> >> It seems nobody reviewed.
> >> To be on the safe side, I'll merge this series after DPDK 21.05 is released.
> >> Or am I missing any critical issue?
> > IIRC Windows DPDK is not shipped anywhere yet, so the fix can be postponed.
> >
> > Without fix in 2/3 rte_eal_alarm_set() will start failing after some
> > thousands of calls (i40e calls every 50 ms, mlx5 call every 1 sec or less).
> > For mlx5 it seems to break flow counters (mlx5_flow_query_alarm function).
> 
> It appears that Tyler reviewed and ack-ed this. I'll add my ACK too. If 
> we can get this in to 21.05, it would be great.

Tyler acked only the patch 1.

It would be good to have tests with mlx5 and i40e for the patch 2.
  
Jie Zhou May 28, 2021, 5:33 p.m. UTC | #6
On Sun, May 02, 2021 at 05:33:33AM +0300, Dmitry Kozlyuk wrote:
> Interrupt manager in Windows EAL allocates on IOCP and starts
> a control thread that runs indefinitely. At DPDK cleanup
> this thread was not stopped and IOCP handle was not closed.
> 
> Gracefully stop interrupt-handling in rte_eal_cleanup().
> The thread already closes IOCP handle before exiting.
> 
> Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>

Acked-by: Jie Zhou <jizh@microsoft.com>
Tested-by: Jie Zhou <jizh@microsoft.com>

> ---
>  lib/eal/windows/eal.c            |  1 +
>  lib/eal/windows/eal_interrupts.c | 26 ++++++++++++++++++++++++--
>  lib/eal/windows/eal_windows.h    |  5 +++++
>  3 files changed, 30 insertions(+), 2 deletions(-)

I enabled a subset of the unit tests on Windows; when running alarm_autotest,
the system hung at rte_eal_alarm_set(). After applying this patch set, the
hang no longer reproduces. The system also hung at pmd_perf_autotest, and
that no longer reproduces with the patch either. This was with Intel i40e.
  
Thomas Monjalon June 23, 2021, 7:08 a.m. UTC | #7
28/05/2021 19:33, Jie Zhou:
> On Sun, May 02, 2021 at 05:33:33AM +0300, Dmitry Kozlyuk wrote:
> > Interrupt manager in Windows EAL allocates on IOCP and starts
> > a control thread that runs indefinitely. At DPDK cleanup
> > this thread was not stopped and IOCP handle was not closed.
> > 
> > Gracefully stop interrupt-handling in rte_eal_cleanup().
> > The thread already closes IOCP handle before exiting.
> > 
> > Fixes: 5c016fc0205a ("eal/windows: add interrupt thread skeleton")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
> 
> Acked-by: Jie Zhou <jizh@microsoft.com>
> Tested-by: Jie Zhou <jizh@microsoft.com>

Series applied, thanks.
  

Patch

diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
index 28c787c0b0..25afc42f8e 100644
--- a/lib/eal/windows/eal.c
+++ b/lib/eal/windows/eal.c
@@ -258,6 +258,7 @@  rte_eal_cleanup(void)
 {
 	struct internal_config *internal_conf =
 		eal_get_internal_configuration();
+	eal_intr_thread_cancel();
 	/* after this point, any DPDK pointers will become dangling */
 	rte_eal_memory_detach();
 	eal_cleanup_config(internal_conf);
diff --git a/lib/eal/windows/eal_interrupts.c b/lib/eal/windows/eal_interrupts.c
index f24ed6e54e..bb0585cb34 100644
--- a/lib/eal/windows/eal_interrupts.c
+++ b/lib/eal/windows/eal_interrupts.c
@@ -7,6 +7,8 @@ 
 #include "eal_private.h"
 #include "eal_windows.h"
 
+#define IOCP_KEY_SHUTDOWN UINT32_MAX
+
 static pthread_t intr_thread;
 
 static HANDLE intr_iocp;
@@ -34,12 +36,14 @@  eal_intr_thread_handle_init(void)
 static void *
 eal_intr_thread_main(LPVOID arg __rte_unused)
 {
+	bool finished = false;
+
 	if (eal_intr_thread_handle_init() < 0) {
 		RTE_LOG(ERR, EAL, "Cannot open interrupt thread handle\n");
 		goto cleanup;
 	}
 
-	while (1) {
+	while (!finished) {
 		OVERLAPPED_ENTRY events[16];
 		ULONG event_count, i;
 		BOOL result;
@@ -61,8 +65,13 @@  eal_intr_thread_main(LPVOID arg __rte_unused)
 			continue;
 		}
 
-		for (i = 0; i < event_count; i++)
+		for (i = 0; i < event_count; i++) {
+			if (events[i].lpCompletionKey == IOCP_KEY_SHUTDOWN) {
+				finished = true;
+				break;
+			}
 			eal_intr_process(&events[i]);
+		}
 	}
 
 	CloseHandle(intr_thread_handle);
@@ -125,6 +134,19 @@  eal_intr_thread_schedule(void (*func)(void *arg), void *arg)
 	return 0;
 }
 
+void
+eal_intr_thread_cancel(void)
+{
+	if (!PostQueuedCompletionStatus(
+			intr_iocp, 0, IOCP_KEY_SHUTDOWN, NULL)) {
+		RTE_LOG_WIN32_ERR("PostQueuedCompletionStatus()");
+		RTE_LOG(ERR, EAL, "Cannot cancel interrupt thread\n");
+		return;
+	}
+
+	WaitForSingleObject(intr_thread_handle, INFINITE);
+}
+
 int
 rte_intr_callback_register(
 	__rte_unused const struct rte_intr_handle *intr_handle,
diff --git a/lib/eal/windows/eal_windows.h b/lib/eal/windows/eal_windows.h
index 478accc1b9..7cc811485d 100644
--- a/lib/eal/windows/eal_windows.h
+++ b/lib/eal/windows/eal_windows.h
@@ -67,6 +67,11 @@  unsigned int eal_socket_numa_node(unsigned int socket_id);
  */
 int eal_intr_thread_schedule(void (*func)(void *arg), void *arg);
 
+/**
+ * Request interrupt thread to stop and wait its termination.
+ */
+void eal_intr_thread_cancel(void);
+
 /**
  * Open virt2phys driver interface device.
  *