[v2,1/2] power: introduce PM QoS API on CPU wide

Message ID 20240613112038.14271-2-lihuisong@huawei.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Series power: introduce PM QoS interface

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/loongarch-compilation warning apply patch failure
ci/iol-testing warning apply patch failure

Commit Message

lihuisong (C) June 13, 2024, 11:20 a.m. UTC
  The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a very low
resume time, for example interrupt packet receiving mode.

The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
interface lets userspace set and get the resume latency limit on cpuX.
Each cpuidle governor in Linux selects which idle state to enter based on
this CPU resume latency in its idle task.

The per-CPU PM QoS API can be used to control this CPU's idle state
selection. Setting a strict resume latency (zero value) restricts the CPU
to the shallowest idle state, which lowers the delay after sleep.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 doc/guides/prog_guide/power_man.rst    |  22 +++++
 doc/guides/rel_notes/release_24_07.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              | 116 +++++++++++++++++++++++++
 lib/power/rte_power_qos.h              |  70 +++++++++++++++
 lib/power/version.map                  |   2 +
 6 files changed, 216 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h
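
The value translation performed by the two new functions can be sketched in isolation. The following is a minimal standalone sketch, not the patch's code: EXAMPLE_NO_CONSTRAINT, example_latency_to_sysfs() and example_sysfs_to_latency() are hypothetical names, and the mapping is taken from the comments in rte_power_qos.c in the patch below ("n/a" for a latency of 0, "0" for no constraint, the number itself otherwise).

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; same value as the patch's
 * PM_QOS_RESUME_LATENCY_NO_CONSTRAINT. */
#define EXAMPLE_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

/*
 * Encode an API latency value as the string written to sysfs.
 * Returns 0 on success, -1 on a negative latency or a too-small buffer.
 */
int
example_latency_to_sysfs(int latency, char *buf, size_t len)
{
	int n;

	if (latency < 0)
		return -1;

	if (latency == 0)
		n = snprintf(buf, len, "n/a");         /* strict resume latency */
	else if (latency == EXAMPLE_NO_CONSTRAINT)
		n = snprintf(buf, len, "0");           /* no constraint */
	else
		n = snprintf(buf, len, "%d", latency); /* limit in microseconds */

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Decode a string read from sysfs (trailing newline already stripped)
 * into an API latency value.
 */
int
example_sysfs_to_latency(const char *buf)
{
	long v;

	if (strcmp(buf, "n/a") == 0)
		return 0;

	v = strtol(buf, NULL, 10);
	return v == 0 ? EXAMPLE_NO_CONSTRAINT : (int)v;
}
```

With these helpers, encoding 20 yields "20" and decoding "0" yields EXAMPLE_NO_CONSTRAINT. Like the patch, the decoder treats a string that does not parse as a number the same way as "0".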
  

Comments

Morten Brørup June 14, 2024, 8:04 a.m. UTC | #1
> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"

Is it OK to access this path using the lcore_id as CPU parameter to open_core_sysfs_file(), or must it be mapped through rte_lcore_to_cpu_id(lcore_id) first?

@David, do you know?

> +
> +int
> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
> +{
> +	char buf[BUFSIZ] = {0};
> +	FILE *f;
> +	int ret;
> +
> +	if (lcore_id >= RTE_MAX_LCORE) {
> +		POWER_LOG(ERR, "Lcore id %u can not exceeds %u",
> +			  lcore_id, RTE_MAX_LCORE - 1U);
> +		return -EINVAL;
> +	}

The lcore_id could be a registered non-EAL thread.
You should probably fail in that case.

Same comment for rte_power_qos_get_cpu_resume_latency().


> +#define PM_QOS_STRICT_LATENCY_VALUE             0
> +#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))

These definitions are in the public header file, and thus should be RTE_POWER_ prefixed and have comments describing them.
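
As an illustration of this request, the definitions might be renamed and documented along the following lines. This is a sketch only: the macro names are hypothetical (the thread does not settle on exact names), and the comments paraphrase the descriptions already present in the patch's header.

```c
#include <assert.h>
#include <stdint.h>

/**
 * Strict resume latency. Passing this value to the set function asks that
 * the CPU only enter the shallowest idle state.
 */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0

/**
 * No resume latency constraint.
 * The patch's header comment describes this as the kernel default.
 */
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))

/* Compile-time check that the shift yields INT32_MAX (2147483647). */
static_assert(RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT == INT32_MAX,
	      "no-constraint value must equal INT32_MAX");
```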
  
lihuisong (C) June 18, 2024, 12:19 p.m. UTC | #2
Hi Morten,

Thanks for your review.


On 2024/6/14 16:04, Morten Brørup wrote:
>> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
>> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
> Is it OK to access this path using the lcore_id as CPU parameter to open_core_sysfs_file(), or must it be mapped through rte_lcore_to_cpu_id(lcore_id) first?
The cpu_id returned by rte_lcore_to_cpu_id() comes from
lcore_config[lcore_id].core_id, which is read from the
"/sys/devices/system/cpu/cpuX/topology/core_id" file; please see the
function eal_cpu_core_id().
So I think the number in "cpuX" above must be the lcore_id in DPDK.
The similar interfaces in the power lib also use the lcore_id directly.
>
> @David, do you know?
>
>> +
>> +int
>> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
>> +{
>> +	char buf[BUFSIZ] = {0};
>> +	FILE *f;
>> +	int ret;
>> +
>> +	if (lcore_id >= RTE_MAX_LCORE) {
>> +		POWER_LOG(ERR, "Lcore id %u can not exceeds %u",
>> +			  lcore_id, RTE_MAX_LCORE - 1U);
>> +		return -EINVAL;
>> +	}
> The lcore_id could be a registered non-EAL thread.
> You should probably fail in that case.
Right, how about using rte_lcore_is_enabled(lcore_id)?
>
> Same comment for rte_power_qos_get_cpu_resume_latency().
>
>
>> +#define PM_QOS_STRICT_LATENCY_VALUE             0
>> +#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
> These definitions are in the public header file, and thus should be RTE_POWER_ prefixed and have comments describing them.
Ack
>
>
> .
  
Morten Brørup June 18, 2024, 12:53 p.m. UTC | #3
> From: lihuisong (C) [mailto:lihuisong@huawei.com]
> 
> Hi Morten,
> 
> Thanks for your review.
> 
> 
> On 2024/6/14 16:04, Morten Brørup wrote:
> >> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
> >> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
> > Is it OK to access this path using the lcore_id as CPU parameter to
> open_core_sysfs_file(), or must it be mapped through
> rte_lcore_to_cpu_id(lcore_id) first?
> The cpu_id returned by rte_lcore_to_cpu_id() comes from
> lcore_config[lcore_id].core_id, which is read from the
> "/sys/devices/system/cpu/cpuX/topology/core_id" file; please see the
> function eal_cpu_core_id().
> So I think the number in "cpuX" above must be the lcore_id in DPDK.
> The similar interfaces in the power lib also use the lcore_id directly.

Then it should be OK.
Thanks for the detailed answer.
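
For reference, the substitution under discussion amounts to formatting a CPU number into the sysfs path. The following is a minimal standalone sketch, assuming the caller already knows which CPU number to use. EXAMPLE_RESUME_LATENCY_PATH and example_resume_latency_path() are hypothetical names; the patch itself passes the format string and lcore_id to open_core_sysfs_file() instead.

```c
#include <stdio.h>

/* Same format string as the patch's PM_QOS_SYSFILE_RESUME_LATENCY_US. */
#define EXAMPLE_RESUME_LATENCY_PATH \
	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"

/*
 * Format the resume latency sysfs path for the given CPU number.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
example_resume_latency_path(unsigned int cpu, char *buf, size_t len)
{
	int n = snprintf(buf, len, EXAMPLE_RESUME_LATENCY_PATH, cpu);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Whether the number substituted for "%u" may be the lcore_id itself is exactly the question raised in this thread; the sketch only shows the formatting step.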

> >
> > @David, do you know?
> >
> >> +
> >> +int
> >> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
> >> +{
> >> +	char buf[BUFSIZ] = {0};
> >> +	FILE *f;
> >> +	int ret;
> >> +
> >> +	if (lcore_id >= RTE_MAX_LCORE) {
> >> +		POWER_LOG(ERR, "Lcore id %u can not exceeds %u",
> >> +			  lcore_id, RTE_MAX_LCORE - 1U);
> >> +		return -EINVAL;
> >> +	}
> > The lcore_id could be a registered non-EAL thread.
> > You should probably fail in that case.
> Right, how about using rte_lcore_is_enabled(lcore_id)?

I suppose setting latency for service cores should be forbidden too,
so using rte_lcore_is_enabled() to check for ROLE_RTE is correct.
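
The effect of such a check can be modelled without DPDK. The sketch below is a standalone stand-in, not the EAL implementation: the role enum, the role table and example_check_lcore() are hypothetical, and the only assumption carried over from this thread is that an lcore counts as enabled when its role is ROLE_RTE.

```c
#include <errno.h>

#define EXAMPLE_MAX_LCORE 8

/* Stand-in roles for illustration; not DPDK's own types. */
enum example_lcore_role {
	EXAMPLE_ROLE_OFF = 0,  /* unused slot (default for unset entries) */
	EXAMPLE_ROLE_RTE,      /* regular EAL lcore */
	EXAMPLE_ROLE_SERVICE,  /* service core */
	EXAMPLE_ROLE_NON_EAL,  /* registered non-EAL thread */
};

/* Hypothetical role table: lcores 0-1 are EAL lcores, 2 is a service
 * core, 3 is a registered non-EAL thread, the rest are off. */
static const enum example_lcore_role example_roles[EXAMPLE_MAX_LCORE] = {
	[0] = EXAMPLE_ROLE_RTE,
	[1] = EXAMPLE_ROLE_RTE,
	[2] = EXAMPLE_ROLE_SERVICE,
	[3] = EXAMPLE_ROLE_NON_EAL,
};

/*
 * Accept only in-range ids whose role is RTE, so that service cores and
 * non-EAL threads are rejected as suggested in the review.
 * Returns 0 if the lcore is acceptable, -EINVAL otherwise.
 */
int
example_check_lcore(unsigned int lcore_id)
{
	if (lcore_id >= EXAMPLE_MAX_LCORE)
		return -EINVAL;
	if (example_roles[lcore_id] != EXAMPLE_ROLE_RTE)
		return -EINVAL;
	return 0;
}
```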

> >
> > Same comment for rte_power_qos_get_cpu_resume_latency().
> >
> >
> >> +#define PM_QOS_STRICT_LATENCY_VALUE             0
> >> +#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
> > These definitions are in the public header file, and thus should be
> RTE_POWER_ prefixed and have comments describing them.
> Ack
> >
> >
> > .
  

Patch

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..3ff46f06c1 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,28 @@  Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+
+PM QoS
+------
+
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some services are delay sensitive and expect a very low
+resume time, for example interrupt packet receiving mode.
+
+The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
+interface lets userspace set and get the resume latency limit on cpuX.
+Each cpuidle governor in Linux selects which idle state to enter based on
+this CPU resume latency in its idle task.
+
+The per-CPU PM QoS API can be used to set and get the CPU resume latency.
+
+The ``rte_power_qos_set_cpu_resume_latency()`` function affects the idle
+state selection of the target CPU. If the latency is set to zero (strict
+resume latency), the CPU is only allowed to enter the shallowest idle state.
+
+The ``rte_power_qos_get_cpu_resume_latency()`` function returns the resume
+latency limit of the specified CPU.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index e68a53d757..7c0d36e389 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -89,6 +89,10 @@  New Features
 
   * Added SSE/NEON vector datapath.
 
+* **Introduce PM QoS interface.**
+
+  * Introduce PM QoS interface to lower the delay after sleep.
+
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@  sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..706f8432ee
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,116 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT	((int)(UINT32_MAX >> 1))
+#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
+	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
+
+int
+rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
+{
+	char buf[BUFSIZ] = {0};
+	FILE *f;
+	int ret;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		POWER_LOG(ERR, "Lcore id %u cannot exceed %u",
+			  lcore_id, RTE_MAX_LCORE - 1U);
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "latency should be greater than or equal to 0");
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	/*
+	 * The kernel interprets strings written to the sysfs file
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US as follows:
+	 *
+	 * 1> "n/a" sets the resume latency to 0 (strict).
+	 * 2> "0" removes the resume latency constraint.
+	 * 3> any other value is used as the resume latency limit.
+	 */
+	if (latency == 0)
+		snprintf(buf, sizeof(buf), "%s", "n/a");
+	else if (latency == PM_QOS_RESUME_LATENCY_NO_CONSTRAINT)
+		snprintf(buf, sizeof(buf), "%d", 0);
+	else
+		snprintf(buf, sizeof(buf), "%d", latency);
+
+	ret = write_core_sysfs_s(f, buf);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return ret;
+}
+
+int
+rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
+{
+	char buf[BUFSIZ];
+	int latency = -1;
+	FILE *f;
+	int ret;
+
+	if (lcore_id >= RTE_MAX_LCORE) {
+		POWER_LOG(ERR, "Lcore id %u cannot exceed %u",
+				lcore_id, RTE_MAX_LCORE - 1U);
+		return -EINVAL;
+	}
+
+	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		return ret;
+	}
+
+	ret = read_core_sysfs_s(f, buf, sizeof(buf));
+	if (ret != 0) {
+		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
+		goto out;
+	}
+
+	/*
+	 * The kernel reports the resume latency through the sysfs file
+	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US as follows:
+	 *
+	 * 1> "n/a" means the resume latency is 0 (strict).
+	 * 2> "0" means there is no resume latency constraint.
+	 * 3> any other string is the resume latency limit in use.
+	 */
+	if (strcmp(buf, "n/a") == 0)
+		latency = 0;
+	else {
+		latency = strtoul(buf, NULL, 10);
+		latency = latency == 0 ? PM_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
+	}
+
+out:
+	if (f != NULL)
+		fclose(f);
+
+	return latency != -1 ? latency : ret;
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..1ba9568d1b
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,70 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The CPU-wide resume latency limit influences this CPU's idle state
+ * selection in each cpuidle governor.
+ * Please see the description of the CPU-wide PM QoS at the following link:
+ * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
+ *
+ * The deeper the idle state, the lower the power consumption, but the
+ * longer the resume time. Some services are delay sensitive and expect a
+ * very low resume time, for example interrupt packet receiving mode.
+ *
+ * In these cases, the per-CPU PM QoS API can be used to control this CPU's
+ * idle state selection. Setting a strict resume latency (zero value) limits
+ * the CPU to the shallowest idle state, which lowers the delay after sleep.
+ */
+
+#define PM_QOS_STRICT_LATENCY_VALUE             0
+#define PM_QOS_RESUME_LATENCY_NO_CONSTRAINT    ((int)(UINT32_MAX >> 1))
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param lcore_id
+ *   target logical core id
+ *
+ * @param latency
+ *   Resume latency limit in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current resume latency of this logical core.
+ * The kernel default, if the limit has not been set, is @see PM_QOS_RESUME_LATENCY_NO_CONSTRAINT.
+ *
+ * @return
+ *   Negative value on failure.
+ *   >= 0 means the actual resume latency limit on this core.
+ */
+__rte_experimental
+int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..81b8ff11b7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,6 @@  EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_qos_set_cpu_resume_latency;
+	rte_power_qos_get_cpu_resume_latency;
 };