[v5,09/10] eal: add EAL argument for setting thread priority

Message ID 1617057640-24301-10-git-send-email-navasile@linux.microsoft.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series eal: Add new API for threading |

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Narcisa Ana Maria Vasile March 29, 2021, 10:40 p.m. UTC
  From: Narcisa Vasile <navasile@microsoft.com>

Allow the user to choose the thread priority through an EAL
command line argument.

The user can select the thread priority to be either 'normal'
or 'critical':
--thread-prio normal
--thread-prio realtime

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
---
 lib/librte_eal/common/eal_common_options.c | 28 +++++++++++++++++++++-
 lib/librte_eal/common/eal_internal_cfg.h   |  2 ++
 lib/librte_eal/common/eal_options.h        |  2 ++
 3 files changed, 31 insertions(+), 1 deletion(-)
  

Comments

Stephen Hemminger March 30, 2021, 9:11 p.m. UTC | #1
On Mon, 29 Mar 2021 15:40:39 -0700
Narcisa Ana Maria Vasile <navasile@linux.microsoft.com> wrote:

> From: Narcisa Vasile <navasile@microsoft.com>
> 
> Allow the user to choose the thread priority through an EAL
> command line argument.
> 
> The user can select the thread priority to be either 'normal'
> or 'critical':
> --thread-prio normal
> --thread-prio realtime
> 
> Signed-off-by: Narcisa Vasile <navasile@microsoft.com>

The discussion internally was that this was intended to resolve issues on Windows.
So it makes sense for Windows, but it is not something that we want to have on Linux.
Could you make this Windows only, and add update the documentation please.

I just don't want Linux users discovering it, trying it, then reporting more bugs.
  
Tal Shnaiderman March 31, 2021, 8:21 a.m. UTC | #2
> Subject: Re: [dpdk-dev] [PATCH v5 09/10] eal: add EAL argument for setting
> thread priority
> 
> External email: Use caution opening links or attachments
> 
> 
> On Mon, 29 Mar 2021 15:40:39 -0700
> Narcisa Ana Maria Vasile <navasile@linux.microsoft.com> wrote:
> 
> > From: Narcisa Vasile <navasile@microsoft.com>
> >
> > Allow the user to choose the thread priority through an EAL command
> > line argument.
> >
> > The user can select the thread priority to be either 'normal'
> > or 'critical':
> > --thread-prio normal
> > --thread-prio realtime
> >
> > Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
> 
> The discussion internally was that this was intended to resolve issues on
> Windows.
> So it makes sense for Windows, but it is not something that we want to have
> on Linux.
> Could you make this Windows only, and add update the documentation
> please.
> 
> I just don't want Linux users discovering it, trying it, then reporting more
> bugs.

Windows needs it from performance aspects.

However if we're keeping this option for Windows a warning like the one below from MSDN [1] should be added to docu:

" A thread with a base priority level above 11 interferes with the normal operation of the operating system. Using REALTIME_PRIORITY_CLASS may cause disk caches to not flush, cause the mouse to stop responding, and so on. "

[1] - https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-setthreadpriority
  
Dmitry Kozlyuk March 31, 2021, 9:12 p.m. UTC | #3
2021-03-30 14:11 (UTC-0700), Stephen Hemminger:
> On Mon, 29 Mar 2021 15:40:39 -0700
> Narcisa Ana Maria Vasile <navasile@linux.microsoft.com> wrote:
> 
> > From: Narcisa Vasile <navasile@microsoft.com>
> > 
> > Allow the user to choose the thread priority through an EAL
> > command line argument.
> > 
> > The user can select the thread priority to be either 'normal'
> > or 'critical':
> > --thread-prio normal
> > --thread-prio realtime
> > 
> > Signed-off-by: Narcisa Vasile <navasile@microsoft.com>  
> 
> The discussion internally was that this was intended to resolve issues on Windows.
> So it makes sense for Windows, but it is not something that we want to have on Linux.
> Could you make this Windows only, and add update the documentation please.
> 
> I just don't want Linux users discovering it, trying it, then reporting more bugs.

Can you share more details of that discussion?
Is realtime-critical needed not for busy-polling apps (which indeed cause
starvation), but for interrupt-driven ones to process packets ASAP?

If it's true, then maybe NetUIO can instead give priority boost to these
threads when notifying them about interrupts (Omar? DmitryM?). This can be
configurable via devargs. One downside is that every kernel driver has to
support it, currently Mellanox bifurcated driver and NetUIO. But they will
need some interrupt-related IOCTLs anyway.
  
Stephen Hemminger March 31, 2021, 10:09 p.m. UTC | #4
On Thu, 1 Apr 2021 00:12:54 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2021-03-30 14:11 (UTC-0700), Stephen Hemminger:
> > On Mon, 29 Mar 2021 15:40:39 -0700
> > Narcisa Ana Maria Vasile <navasile@linux.microsoft.com> wrote:
> >   
> > > From: Narcisa Vasile <navasile@microsoft.com>
> > > 
> > > Allow the user to choose the thread priority through an EAL
> > > command line argument.
> > > 
> > > The user can select the thread priority to be either 'normal'
> > > or 'critical':
> > > --thread-prio normal
> > > --thread-prio realtime
> > > 
> > > Signed-off-by: Narcisa Vasile <navasile@microsoft.com>    
> > 
> > The discussion internally was that this was intended to resolve issues on Windows.
> > So it makes sense for Windows, but it is not something that we want to have on Linux.
> > Could you make this Windows only, and add update the documentation please.
> > 
> > I just don't want Linux users discovering it, trying it, then reporting more bugs.  
> 
> Can you share more details of that discussion?
> Is realtime-critical needed not for busy-polling apps (which indeed cause
> starvation), but for interrupt-driven ones to process packets ASAP?
> 
> If it's true, then maybe NetUIO can instead give priority boost to these
> threads when notifying them about interrupts (Omar? DmitryM?). This can be
> configurable via devargs. One downside is that every kernel driver has to
> support it, currently Mellanox bifurcated driver and NetUIO. But they will
> need some interrupt-related IOCTLs anyway.


A DPDK application typically has cores detected to polling for packets.
The temptation is to set those cores to have a real time scheduling policy
(SCHED_FIFO, or SCH_RR).  The problem is that those priorities run in preference
to required kernel functions. So the polling-for-packets threads will starve
out the Linux kernel RCU and softirq completion of I/O.  This starvation
will lead to memory loss (no RCU cleanup) and potential deadlocks (disk
I/O never completing).

It is possible to use real time priority on Linux but it requires lots of tuning
to make sure that the kernel never runs work queues, interrupts or soft irqs
on those cores. Lots of changes to /proc, kernel command line, and sysfs
tunables. Which is possible on embedded systems but not for general purpose
applications.

This is already a problem that shows up, but it only happens if the DPDK
application writer explcitly calls the setscheduler on those threads.
At that point, it is the case where the user has started to manipulate
threads, and we have to assume they know the consequences and are ready
to deal with them.

On Windows, the situation is different so yes, this is necessary.
  
Dmitry Malloy March 31, 2021, 10:42 p.m. UTC | #5
The internal discussion was about the fact that this EAL parameter is:

a) optional
b) modifies default behavior (which is different on Windows or on Linux)

Unless admin decides to use this option - no one is regressed. This patch is not forcing a change in default behavior. It gives the admin a choice to change the priority. From this perspective - is see no problem with it.

Thanks,
Dmitry

-----Original Message-----
From: Stephen Hemminger <stephen@networkplumber.org> 
Sent: Wednesday, March 31, 2021 3:09 PM
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>; dev@dpdk.org; thomas <thomas@monjalon.net>; Khoa To <khot@microsoft.com>; Narcisa Ana Maria Vasile <Narcisa.Vasile@microsoft.com>; Dmitry Malloy <dmitrym@microsoft.com>; Tyler Retzlaff <roretzla@microsoft.com>; talshn@nvidia.com; Omar Cardona <ocardona@microsoft.com>; bruce.richardson@intel.com; david.marchand@redhat.com; Kadam, Pallavi <pallavi.kadam@intel.com>
Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH v5 09/10] eal: add EAL argument for setting thread priority

On Thu, 1 Apr 2021 00:12:54 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2021-03-30 14:11 (UTC-0700), Stephen Hemminger:
> > On Mon, 29 Mar 2021 15:40:39 -0700
> > Narcisa Ana Maria Vasile <navasile@linux.microsoft.com> wrote:
> >   
> > > From: Narcisa Vasile <navasile@microsoft.com>
> > > 
> > > Allow the user to choose the thread priority through an EAL 
> > > command line argument.
> > > 
> > > The user can select the thread priority to be either 'normal'
> > > or 'critical':
> > > --thread-prio normal
> > > --thread-prio realtime
> > > 
> > > Signed-off-by: Narcisa Vasile <navasile@microsoft.com>    
> > 
> > The discussion internally was that this was intended to resolve issues on Windows.
> > So it makes sense for Windows, but it is not something that we want to have on Linux.
> > Could you make this Windows only, and add update the documentation please.
> > 
> > I just don't want Linux users discovering it, trying it, then reporting more bugs.  
> 
> Can you share more details of that discussion?
> Is realtime-critical needed not for busy-polling apps (which indeed 
> cause starvation), but for interrupt-driven ones to process packets ASAP?
> 
> If it's true, then maybe NetUIO can instead give priority boost to 
> these threads when notifying them about interrupts (Omar? DmitryM?). 
> This can be configurable via devargs. One downside is that every 
> kernel driver has to support it, currently Mellanox bifurcated driver 
> and NetUIO. But they will need some interrupt-related IOCTLs anyway.


A DPDK application typically has cores detected to polling for packets.
The temptation is to set those cores to have a real time scheduling policy (SCHED_FIFO, or SCH_RR).  The problem is that those priorities run in preference to required kernel functions. So the polling-for-packets threads will starve out the Linux kernel RCU and softirq completion of I/O.  This starvation will lead to memory loss (no RCU cleanup) and potential deadlocks (disk I/O never completing).

It is possible to use real time priority on Linux but it requires lots of tuning to make sure that the kernel never runs work queues, interrupts or soft irqs on those cores. Lots of changes to /proc, kernel command line, and sysfs tunables. Which is possible on embedded systems but not for general purpose applications.

This is already a problem that shows up, but it only happens if the DPDK application writer explcitly calls the setscheduler on those threads.
At that point, it is the case where the user has started to manipulate threads, and we have to assume they know the consequences and are ready to deal with them.

On Windows, the situation is different so yes, this is necessary.
  

Patch

diff --git a/lib/librte_eal/common/eal_common_options.c b/lib/librte_eal/common/eal_common_options.c
index 622c7bc42..287a89a75 100644
--- a/lib/librte_eal/common/eal_common_options.c
+++ b/lib/librte_eal/common/eal_common_options.c
@@ -106,6 +106,7 @@  eal_long_options[] = {
 	{OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM        },
 	{OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM     },
 	{OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
+	{OPT_THREAD_PRIORITY,   1, NULL, OPT_THREAD_PRIORITY_NUM},
 
 	/* legacy options that will be removed in future */
 	{OPT_PCI_BLACKLIST,     1, NULL, OPT_PCI_BLACKLIST_NUM    },
@@ -1383,6 +1384,24 @@  eal_parse_simd_bitwidth(const char *arg)
 	return 0;
 }
 
+static int
+eal_parse_thread_priority(const char *arg)
+{
+	struct internal_config *internal_conf =
+		eal_get_internal_configuration();
+	enum rte_thread_priority priority;
+
+	if (!strncmp("normal", arg, sizeof("normal")))
+		priority = RTE_THREAD_PRIORITY_NORMAL;
+	else if (!strncmp("realtime", arg, sizeof("realtime")))
+		priority = RTE_THREAD_PRIORITY_REALTIME_CRITICAL;
+	else
+		return -1;
+
+	internal_conf->thread_priority = priority;
+	return 0;
+}
+
 static int
 eal_parse_base_virtaddr(const char *arg)
 {
@@ -1796,7 +1815,13 @@  eal_parse_common_option(int opt, const char *optarg,
 			return -1;
 		}
 		break;
-
+	case OPT_THREAD_PRIORITY_NUM:
+		if (eal_parse_thread_priority(optarg) < 0) {
+			RTE_LOG(ERR, EAL, "invalid parameter for --"
+					OPT_THREAD_PRIORITY "\n");
+			return -1;
+		}
+		break;
 	/* don't know what to do, leave this to caller */
 	default:
 		return 1;
@@ -2059,6 +2084,7 @@  eal_common_usage(void)
 	       "                      (can be used multiple times)\n"
 	       "  --"OPT_VMWARE_TSC_MAP"    Use VMware TSC map instead of native RDTSC\n"
 	       "  --"OPT_PROC_TYPE"         Type of this process (primary|secondary|auto)\n"
+	       "  --"OPT_THREAD_PRIORITY"   Set threads priority (normal|realtime)\n"
 #ifndef RTE_EXEC_ENV_WINDOWS
 	       "  --"OPT_SYSLOG"            Set syslog facility\n"
 #endif
diff --git a/lib/librte_eal/common/eal_internal_cfg.h b/lib/librte_eal/common/eal_internal_cfg.h
index 51dbe86e2..7ab1d0008 100644
--- a/lib/librte_eal/common/eal_internal_cfg.h
+++ b/lib/librte_eal/common/eal_internal_cfg.h
@@ -93,6 +93,8 @@  struct internal_config {
 	unsigned int no_telemetry; /**< true to disable Telemetry */
 	struct simd_bitwidth max_simd_bitwidth;
 	/**< max simd bitwidth path to use */
+	enum rte_thread_priority thread_priority;
+	/**< thread priority to configure */
 };
 
 void eal_reset_internal_config(struct internal_config *internal_cfg);
diff --git a/lib/librte_eal/common/eal_options.h b/lib/librte_eal/common/eal_options.h
index 7b348e707..9f5b209f6 100644
--- a/lib/librte_eal/common/eal_options.h
+++ b/lib/librte_eal/common/eal_options.h
@@ -93,6 +93,8 @@  enum {
 	OPT_NO_TELEMETRY_NUM,
 #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
 	OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
+#define OPT_THREAD_PRIORITY          "thread-prio"
+	OPT_THREAD_PRIORITY_NUM,
 
 	/* legacy option that will be removed in future */
 #define OPT_PCI_BLACKLIST     "pci-blacklist"