[1/2] power: introduce PM QoS interface

Message ID 20240320105529.5626-2-lihuisong@huawei.com (mailing list archive)
State Superseded
Delegated to: Thomas Monjalon
Series introduce PM QoS interface

Checks

Context        Check    Description
ci/checkpatch  warning  coding style issues

Commit Message

lihuisong (C) March 20, 2024, 10:55 a.m. UTC
The system-wide CPU latency QoS limit has a direct effect on the idle
state selection of the cpuidle governor.

Linux exposes a cpu_dma_latency device under '/dev' through which userspace
can read the system-wide CPU latency QoS limit and send its own QoS request.
Please see the PM QoS framework in the following link:
https://docs.kernel.org/power/pm_qos_interface.html?highlight=qos
This feature has been supported since kernel v2.6.25.

The deeper the idle state, the lower the power consumption, but the longer
the resume time. Some services are delay sensitive and expect a low
resume time, for example interrupt packet receiving mode.

So this PM QoS API makes it easy to obtain the system CPU latency limit
and to send a CPU latency QoS request for applications that need one.

The recommended usage is as follows:
1) an application process first creates a QoS request.
2) update the CPU latency request to zero when needed.
3) restore the default value when no longer needed (this step is optional).
4) release the QoS request when the process exits.

Signed-off-by: Huisong Li <lihuisong@huawei.com>
---
 doc/guides/prog_guide/power_man.rst    |  16 ++++
 doc/guides/rel_notes/release_24_03.rst |   4 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_qos.c              |  98 ++++++++++++++++++++++++
 lib/power/rte_power_qos.h              | 101 +++++++++++++++++++++++++
 lib/power/version.map                  |   4 +
 6 files changed, 225 insertions(+)
 create mode 100644 lib/power/rte_power_qos.c
 create mode 100644 lib/power/rte_power_qos.h
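
The four usage steps in the commit message can be sketched directly against
the kernel device, without the DPDK wrappers. This is a minimal sketch, not
the code of this patch: the helper names (qos_request_create,
qos_request_update, qos_request_release) are hypothetical, and the device
path is passed as a parameter so the same code can be exercised against a
regular file. It assumes the kernel behaviour described in the PM QoS
documentation linked above: the request is a 32-bit value in microseconds
and stays registered for as long as the descriptor is open.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Step 1: create the request. It stays registered while fd is open. */
static int
qos_request_create(const char *dev_path)
{
	int fd = open(dev_path, O_WRONLY);

	return fd < 0 ? -errno : fd;
}

/* Steps 2 and 3: set or restore the latency limit, in microseconds. */
static int
qos_request_update(int fd, int32_t latency_us)
{
	ssize_t n;

	if (fd < 0 || latency_us < 0)
		return -EINVAL;

	n = write(fd, &latency_us, sizeof(latency_us));
	if (n < 0)
		return -errno;

	/* A short write means the value was not accepted as a whole. */
	return n == (ssize_t)sizeof(latency_us) ? 0 : -EIO;
}

/* Step 4: release the request by closing the descriptor. */
static void
qos_request_release(int fd)
{
	if (fd >= 0)
		close(fd);
}
```

An application would call qos_request_create("/dev/cpu_dma_latency"), pass 0
to qos_request_update() while it is latency sensitive, and call
qos_request_release() on exit. Opening the device typically requires
sufficient privileges.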
  

Patch

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index f6674efe2d..493c75bf9d 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -249,6 +249,22 @@  Get Num Pkgs
 Get Num Dies
   Get the number of die's on a given package.
 
+PM QoS API
+----------
+The deeper the idle state, the lower the power consumption, but the longer
+the resume time. Some service threads are delay sensitive and expect a low
+resume time, for example in interrupt packet receiving mode.
+
+The PM QoS API obtains the system-wide CPU latency limit and sends a CPU
+latency QoS request on behalf of applications that need one.
+
+* ``rte_power_qos_get_curr_cpu_latency()`` gets the current system-wide CPU
+  latency limit.
+* To send a CPU latency QoS request, first call ``rte_power_create_qos_request()``
+  to create the request, then update the CPU latency value by calling
+  ``rte_power_qos_update_request()``. Call ``rte_power_release_qos_request()``
+  to release the request when the process exits.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 14826ea08f..b5be724133 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -196,6 +196,10 @@  New Features
   Added DMA producer mode to measure performance of ``OP_FORWARD`` mode
   of event DMA adapter.
 
+* **Added CPU latency PM QoS support.**
+
+  Added an interface to query the system-wide CPU latency PM QoS limit and
+  an interface to send a CPU latency QoS request in the power library.
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index b8426589b2..8222e178b0 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -23,12 +23,14 @@  sources = files(
         'rte_power.c',
         'rte_power_uncore.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_qos.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_guest_channel.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_uncore.h',
+        'rte_power_qos.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
new file mode 100644
index 0000000000..d2b55923a0
--- /dev/null
+++ b/lib/power/rte_power_qos.c
@@ -0,0 +1,98 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#include <errno.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <rte_log.h>
+
+#include "power_common.h"
+#include "rte_power_qos.h"
+
+#define QOS_CPU_DMA_LATENCY_DEV "/dev/cpu_dma_latency"
+
+struct rte_power_qos_info {
+	/*
+	 * Keep the file descriptor open to update the QoS request until the
+	 * request is no longer needed.
+	 */
+	int fd;
+	int cur_cpu_latency; /* unit microseconds */
+};
+
+static struct rte_power_qos_info g_qos = {
+	.fd = -1,
+	.cur_cpu_latency = -1,
+};
+
+int
+rte_power_qos_get_curr_cpu_latency(int *latency)
+{
+	int fd, ret;
+
+	fd = open(QOS_CPU_DMA_LATENCY_DEV, O_RDONLY);
+	if (fd < 0) {
+		POWER_LOG(ERR, "Failed to open %s", QOS_CPU_DMA_LATENCY_DEV);
+		return -1;
+	}
+
+	ret = read(fd, latency, sizeof(*latency));
+	close(fd);
+	if (ret != (int)sizeof(*latency)) {
+		POWER_LOG(ERR, "Failed to read %s", QOS_CPU_DMA_LATENCY_DEV);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_power_qos_update_request(int latency)
+{
+	int ret;
+
+	if (g_qos.fd == -1) {
+		POWER_LOG(ERR, "Please create a QoS request first.");
+		return -EINVAL;
+	}
+
+	if (latency < 0) {
+		POWER_LOG(ERR, "Latency should be a non-negative number.");
+		return -EINVAL;
+	}
+
+	if (g_qos.cur_cpu_latency != -1 && latency == g_qos.cur_cpu_latency)
+		return 0;
+
+	ret = write(g_qos.fd, &latency, sizeof(latency));
+	if (ret != (int)sizeof(latency)) {
+		POWER_LOG(ERR, "Failed to write %s", QOS_CPU_DMA_LATENCY_DEV);
+		return -1;
+	}
+	g_qos.cur_cpu_latency = latency;
+
+	return 0;
+}
+
+int
+rte_power_create_qos_request(void)
+{
+	g_qos.fd = open(QOS_CPU_DMA_LATENCY_DEV, O_WRONLY);
+	if (g_qos.fd < 0) {
+		POWER_LOG(ERR, "Failed to open %s.", QOS_CPU_DMA_LATENCY_DEV);
+		return -1;
+	}
+
+	return 0;
+}
+
+void
+rte_power_release_qos_request(void)
+{
+	if (g_qos.fd != -1) {
+		close(g_qos.fd);
+		g_qos.fd = -1;
+	}
+}
diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
new file mode 100644
index 0000000000..d39f5d0c0f
--- /dev/null
+++ b/lib/power/rte_power_qos.h
@@ -0,0 +1,101 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2024 HiSilicon Limited
+ */
+
+#ifndef RTE_POWER_QOS_H
+#define RTE_POWER_QOS_H
+
+#include <rte_compat.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @file rte_power_qos.h
+ *
+ * PM QoS API.
+ *
+ * The system-wide CPU latency QoS limit has a direct effect on the idle
+ * state selection of the cpuidle governor.
+ *
+ * Linux exposes a cpu_dma_latency device under '/dev' through which userspace
+ * can read the system-wide CPU latency QoS limit and send its own QoS request.
+ * Please see the PM QoS framework in the following link:
+ * https://docs.kernel.org/power/pm_qos_interface.html?highlight=qos
+ *
+ * The deeper the idle state, the lower the power consumption, but the longer
+ * the resume time. Some services are delay sensitive and expect a low
+ * resume time, for example interrupt packet receiving mode.
+ *
+ * So this PM QoS API makes it easy to obtain the system CPU latency limit and
+ * to send a CPU latency QoS request for applications that need one.
+ *
+ * The recommended usage is as follows:
+ * 1) an application process first creates a QoS request.
+ * 2) update the CPU latency request to zero when needed.
+ * 3) restore the default value @see PM_QOS_CPU_LATENCY_DEFAULT_VALUE when
+ *    it is no longer needed (this step is optional).
+ * 4) release the QoS request when the process exits.
+ */
+
+#define QOS_USEC_PER_SEC                        1000000
+#define PM_QOS_CPU_LATENCY_DEFAULT_VALUE        (2000 * QOS_USEC_PER_SEC)
+#define PM_QOS_STRICT_LATENCY_VALUE             0
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Create a CPU latency QoS request. Release it with
+ * @see rte_power_release_qos_request.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_create_qos_request(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Release the CPU latency QoS request.
+ */
+__rte_experimental
+void rte_power_release_qos_request(void);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the current system-wide CPU latency QoS limit, in microseconds.
+ * The default value in kernel is @see PM_QOS_CPU_LATENCY_DEFAULT_VALUE.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_get_curr_cpu_latency(int *latency);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Update the CPU latency QoS request.
+ * Note: a QoS request must be created before calling this API.
+ *
+ * @param latency
+ *   The latency in microseconds; must be greater than or equal to zero.
+ *
+ * @return
+ *   0 on success. Otherwise negative value is returned.
+ */
+__rte_experimental
+int rte_power_qos_update_request(int latency);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_POWER_QOS_H */
diff --git a/lib/power/version.map b/lib/power/version.map
index ad92a65f91..42770762b1 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -51,4 +51,8 @@  EXPERIMENTAL {
 	rte_power_set_uncore_env;
 	rte_power_uncore_freqs;
 	rte_power_unset_uncore_env;
+	rte_power_create_qos_request;
+	rte_power_release_qos_request;
+	rte_power_qos_get_curr_cpu_latency;
+	rte_power_qos_update_request;
 };
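
For comparison with rte_power_qos_get_curr_cpu_latency() in the patch above,
here is a sketch of a read path that treats both a negative return and a
short read as failures and closes the descriptor on every path. It is an
illustration under stated assumptions, not part of the patch: the function
name qos_read_cpu_latency and the dev_path parameter are hypothetical, and
the 32-bit value format is assumed from the kernel PM QoS documentation.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Read the current latency limit (microseconds) from dev_path. */
static int
qos_read_cpu_latency(const char *dev_path, int32_t *latency_us)
{
	int32_t value;
	ssize_t n;
	int fd, err;

	if (latency_us == NULL)
		return -EINVAL;

	fd = open(dev_path, O_RDONLY);
	if (fd < 0)
		return -errno;

	n = read(fd, &value, sizeof(value));
	err = errno; /* save errno before close() can overwrite it */
	close(fd);

	if (n < 0)
		return -err;
	/* Anything other than a full 32-bit value is treated as an error. */
	if (n != (ssize_t)sizeof(value))
		return -EIO;

	*latency_us = value;
	return 0;
}
```

Returning a negative errno value keeps the failure reason available to the
caller; the output argument is only written on success.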