[v2] kni: allow configuring the kni thread granularity

Message ID 1635868273-69843-1-git-send-email-tudor.cornea@gmail.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2] kni: allow configuring the kni thread granularity |

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/github-robot: build success github build: passed
ci/iol-spell-check-testing warning Testing issues
ci/iol-mellanox-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-aarch64-unit-testing success Testing PASS
ci/Intel-compilation success Compilation OK
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-x86_64-compile-testing success Testing PASS
ci/iol-aarch64-compile-testing success Testing PASS
ci/iol-x86_64-unit-testing success Testing PASS

Commit Message

Tudor Cornea Nov. 2, 2021, 3:51 p.m. UTC
  The Kni kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing tests
involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularily,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are in non-atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays,
seems unfeasible with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and cpu usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and cpu usage, while allowing users to tweak the limits for
improved precision if they have such use cases.

Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G        W  O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>

---
v2:
* Fixed some spelling errors
---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 36 +++++++++++++++++++++++---
 3 files changed, 66 insertions(+), 5 deletions(-)
  

Comments

Stephen Hemminger Nov. 2, 2021, 3:53 p.m. UTC | #1
On Tue,  2 Nov 2021 17:51:13 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> +#ifdef RTE_KNI_PREEMPT_DEFAULT
> +module_param(min_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(min_scheduling_interval,
> +"Kni thread min scheduling interval (default=100 microseconds):\n"
> +"\t\t"
> +);

Why the non-standard newline's and tab's?
Please try to make KNI look like other kernel code.
  
Tudor Cornea Nov. 3, 2021, 8:40 p.m. UTC | #2
On Tue, 2 Nov 2021 at 17:53, Stephen Hemminger <stephen@networkplumber.org>
wrote:

> On Tue,  2 Nov 2021 17:51:13 +0200
> Tudor Cornea <tudor.cornea@gmail.com> wrote:
>
> > +#ifdef RTE_KNI_PREEMPT_DEFAULT
> > +module_param(min_scheduling_interval, long, 0644);
> > +MODULE_PARM_DESC(min_scheduling_interval,
> > +"Kni thread min scheduling interval (default=100 microseconds):\n"
> > +"\t\t"
> > +);
>
> Why the non-standard newline's and tab's?
> Please try to make KNI look like other kernel code.
>

Hi Stephen,

I tried to base the description of the new parameters on an existing
parameter implemented for the rte_kni module - carrier.

module_param(carrier, charp, 0644);
MODULE_PARM_DESC(carrier,
"Default carrier state for KNI interface (default=off):\n"
"\t\toff   Interfaces will be created with carrier state set to off.\n"
"\t\ton    Interfaces will be created with carrier state set to on.\n"
"\t\t"
);

I thought about keeping the compatibility in terms of coding style with the
existing Kni module parameters.
Upon browsing the Linux tree, I realise it might not be standard (
checkpatch.pl , interestingly didn't seem to complain about the patch)

I also realise now, that I missed two tabs at the beginning of the params
description.
Should I add the missing tabs, so that the new parameters that I intend to
add through this patch are similar in style to the existing ones, or should
I remove the newlines and tabs altogether, when specifying the description
for min_scheduling_interval and max_scheduling_interval ?

Thanks,
Tudor
  
Stephen Hemminger Nov. 3, 2021, 10:18 p.m. UTC | #3
On Wed, 3 Nov 2021 22:40:51 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> On Tue, 2 Nov 2021 at 17:53, Stephen Hemminger <stephen@networkplumber.org>
> wrote:
> 
> > On Tue,  2 Nov 2021 17:51:13 +0200
> > Tudor Cornea <tudor.cornea@gmail.com> wrote:
> >  
> > > +#ifdef RTE_KNI_PREEMPT_DEFAULT
> > > +module_param(min_scheduling_interval, long, 0644);
> > > +MODULE_PARM_DESC(min_scheduling_interval,
> > > +"Kni thread min scheduling interval (default=100 microseconds):\n"
> > > +"\t\t"
> > > +);  
> >
> > Why the non-standard newline's and tab's?
> > Please try to make KNI look like other kernel code.
> >  
> 
> Hi Stephen,
> 
> I tried to base the description of the new parameters on an existing
> parameter implemented for the rte_kni module - carrier.
> 
> module_param(carrier, charp, 0644);
> MODULE_PARM_DESC(carrier,
> "Default carrier state for KNI interface (default=off):\n"
> "\t\toff   Interfaces will be created with carrier state set to off.\n"
> "\t\ton    Interfaces will be created with carrier state set to on.\n"
> "\t\t"
> );
> 
> I thought about keeping the compatibility in terms of coding style with the
> existing Kni module parameters.
> Upon browsing the Linux tree, I realise it might not be standard (
> checkpatch.pl , interestingly didn't seem to complain about the patch)
> 
> I also realise now, that I missed two tabs at the beginning of the params
> description.
> Should I add the missing tabs, so that the new parameters that I intend to
> add through this patch are similar in style to the existing ones, or should
> I remove the newlines and tabs altogether, when specifying the description
> for min_scheduling_interval and max_scheduling_interval ?
> 
> Thanks,
> Tudor

Although KNI is unlikely to ever get upstream code review, there
is no reason to deviate from common practice in kernel drivers.
The original module parameter was doing something unconventional.

Not wrong, just different.
  

Patch

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec..2dd3481 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,10 @@  can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +178,35 @@  To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux hrtimers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as Linux hrtimers.
+
+To see more about the Linux hrtimers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311..bb4d891 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@ 
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b464c4..e23cfd9 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,12 @@  static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+#endif
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -130,8 +136,7 @@  kni_thread_single(void *data)
 		up_read(&knet->kni_list_lock);
 #ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -150,8 +155,7 @@  kni_thread_multiple(void *param)
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -593,6 +597,16 @@  kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+#endif
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -659,3 +673,17 @@  MODULE_PARM_DESC(carrier,
 "\t\ton    Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
+#endif