[v9,1/3] power: add Intel uncore frequency control API to power library

Message ID 20221006093803.2076768-2-tadhg.kearney@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Series add Intel uncore api to be called through l3fwd-power

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Tadhg Kearney Oct. 6, 2022, 9:38 a.m. UTC
Add API to allow uncore frequency adjustment. Uncore is a
term used by Intel to describe the functions of a microprocessor
that are closely connected to the core to achieve high
performance. This is done through manipulating related
uncore frequency control sysfs entries to adjust the
minimum and maximum uncore frequency values and works
on Linux for Intel hardware.

Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
Reviewed-by: David Hunt <david.hunt@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/power_man.rst    |  54 +++
 doc/guides/rel_notes/release_22_11.rst |   6 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_intel_uncore.c     | 451 +++++++++++++++++++++++++
 lib/power/rte_power_intel_uncore.h     | 194 +++++++++++
 lib/power/version.map                  |  11 +
 6 files changed, 718 insertions(+)
 create mode 100644 lib/power/rte_power_intel_uncore.c
 create mode 100644 lib/power/rte_power_intel_uncore.h
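
As a rough illustration of the sysfs entries the commit message refers to, the sketch below builds the per-die path using the same format string as POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ in the patch. The helper name uncore_max_freq_path and the macro name UNCORE_MAX_FREQ_FMT are hypothetical and are not part of the patch; the code only formats the path and does not touch sysfs.

```c
#include <stdio.h>

/* Same format string as POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ in the patch. */
#define UNCORE_MAX_FREQ_FMT \
	"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"

/*
 * Hypothetical helper (not in the patch): format the max_freq_khz path
 * for a given package and die.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 */
static int
uncore_max_freq_path(char *buf, size_t len, unsigned int pkg, unsigned int die)
{
	int n = snprintf(buf, len, UNCORE_MAX_FREQ_FMT, pkg, die);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

For package 0, die 1 this yields /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_01/max_freq_khz.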

Comments

Stephen Hemminger Oct. 6, 2022, 5:32 p.m. UTC | #1
On Thu,  6 Oct 2022 09:38:01 +0000
Tadhg Kearney <tadhg.kearney@intel.com> wrote:

> Add API to allow uncore frequency adjustment. Uncore is a
> term used by Intel to describe the functions of a microprocessor
> that are closely connected to the core to achieve high
> performance. This is done through manipulating related
> uncore frequency control sysfs entries to adjust the
> minimum and maximum uncore frequency values and works
> on Linux for Intel hardware.
> 
> Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
> Reviewed-by: David Hunt <david.hunt@intel.com>
> Acked-by: David Hunt <david.hunt@intel.com>

Looks like this is missing an opportunity for a more general
long term solution in DPDK.

Shouldn't this be a general thing like the Linux kernel scheduler?
Uncore is Intel specific, but there are already big/little cores
on many ARM platforms.
Hunt, David Oct. 7, 2022, 10:30 a.m. UTC | #2
On 06/10/2022 18:32, Stephen Hemminger wrote:
> On Thu,  6 Oct 2022 09:38:01 +0000
> Tadhg Kearney <tadhg.kearney@intel.com> wrote:
>
>> Add API to allow uncore frequency adjustment. Uncore is a
>> term used by Intel to describe the functions of a microprocessor
>> that are closely connected to the core to achieve high
>> performance. This is done through manipulating related
>> uncore frequency control sysfs entries to adjust the
>> minimum and maximum uncore frequency values and works
>> on Linux for Intel hardware.
>>
>> Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
>> Reviewed-by: David Hunt <david.hunt@intel.com>
>> Acked-by: David Hunt <david.hunt@intel.com>


Hi Stephen,

> Looks like this is missing an opportunity for a more general
> long term solution in DPDK.


We're hoping that this is the first step along the path to that
long-term solution. It's like the power library frequency control for
cores, which was initially Intel only, and then more architectures were
added over time. The APIs are experimental, so they can be adapted if needed.


> Shouldn't this be a general thing like the Linux kernel scheduler?

I don't think the kernel scheduler has any concept of uncore busyness.
The uncore frequency is typically controlled by hardware, and if there's
enough polling going on, the frequency of the uncore will remain high
(if uncore frequency scaling is enabled). This patch addresses that:
if the application realises that it's not processing a lot of packets
even though most of its cores are polling, it can tell the hardware to
scale down the uncore to save power.


> Uncore is Intel specific, but there are already big/little cores
> on many ARM platforms.

I don't think big/little is related to uncore frequency scaling. The 
big/little cores still need to communicate via that architecture's 
communications bus (called uncore in Intel's case, though I've seen that 
term used on other architectures also). Where an architecture can scale 
the frequency of this communications bus, this patch set's functionality 
can be extended in the future to cover this.

Regards,
Dave
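
To make the idea above concrete, here is a minimal sketch of how an application might turn a busyness estimate into an uncore frequency index. The policy, the function name pick_uncore_index and the busy_pct input are all assumptions for illustration; the patch itself only provides the mechanism. The sketch relies on the patch's convention that index 0 is the highest frequency and nb_freqs - 1 the lowest.

```c
#include <stdint.h>

/*
 * Hypothetical policy (not in the patch): map a packet-processing
 * busyness percentage (0..100) linearly onto the frequency index range.
 * 100% busy -> index 0 (highest frequency),
 *   0% busy -> index nb_freqs - 1 (lowest frequency).
 */
static uint32_t
pick_uncore_index(uint32_t busy_pct, uint32_t nb_freqs)
{
	if (nb_freqs == 0)
		return 0;
	if (busy_pct > 100)
		busy_pct = 100;

	return ((100 - busy_pct) * (nb_freqs - 1)) / 100;
}
```

The returned index would then be handed to rte_power_set_uncore_freq() for the chosen package and die, outside the fast path.
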
Thomas Monjalon Oct. 10, 2022, 12:46 p.m. UTC | #3
06/10/2022 11:38, Tadhg Kearney:
> +API Overview for Intel Uncore
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Overview of each function in the Intel Uncore API, with explanation of what
> +they do. Each function should not be called in the fast path.
> +
> +* **Uncore Power Init**
> +    Initialize uncore power, populate frequency array and record
> +    original min & max for die on pkg.
> +
> +* **Uncore Power Exit**
> +    Exit uncore power, restoring original min & max for die on pkg.
> +
> +* **Get Uncore Power Freq**
> +    Get current uncore freq index for die on pkg.
> +
> +* **Set Uncore Power Freq**
> +    Set min & max uncore freq index for die on pkg to specified index value
> +    (min and max will be the same).
> +
> +* **Uncore Power Max**
> +    Set min & max uncore freq to maximum frequency index for die on pkg
> +    (min and max will be the same).
> +
> +* **Uncore Power Min**
> +    Set min & max uncore freq to minimum frequency index for die on pkg
> +    (min and max will be the same).
> +
> +* **Get Num Freqs**
> +    Get the number of frequencies in the index array.
> +
> +* **Get Num Pkgs**
> +    Get the number of packages (CPUs) on the system.
> +
> +* **Get Num Dies**
> +    Get the number of dies on a given package.

It is the role of the doxygen documentation to explain the API,
so I don't understand why this is here.
Anyway, I've converted it into a definition list,
which is the proper RST syntax for what you are doing.
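
For readers following the index-based calls in the overview quoted above, a small sketch of how the patch derives the frequency array may help. It mirrors the arithmetic in power_get_available_uncore_freqs() (100000 kHz steps counting down from the initial maximum). The function names and the example limits of 2400000 kHz and 1200000 kHz are hypothetical.

```c
#include <stdint.h>

/* Same step size as BUS_FREQ in the patch, in kHz. */
#define BUS_FREQ_KHZ 100000u

/* Number of entries in the frequency array, both limits included. */
static uint32_t
uncore_num_freqs(uint32_t init_max_khz, uint32_t init_min_khz)
{
	return (init_max_khz - init_min_khz) / BUS_FREQ_KHZ + 1;
}

/* Frequency stored at a given index; index 0 is the initial maximum. */
static uint32_t
uncore_index_to_khz(uint32_t init_max_khz, uint32_t idx)
{
	return init_max_khz - idx * BUS_FREQ_KHZ;
}
```

With those example limits there are 13 entries; index 0 maps to 2400000 kHz and index 12 to 1200000 kHz.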

>          'rte_power.c',
>          'rte_power_empty_poll.c',
>          'rte_power_pmd_mgmt.c',
> +        'rte_power_intel_uncore.c',

In general, we keep such list in alphabetical order.

[...]
> +struct uncore_power_info {
> +	unsigned int die;                    /** Core die id */
> +	unsigned int pkg;                    /** Package id */
> +	uint32_t freqs[MAX_UNCORE_FREQS];/** Frequency array */
> +	uint32_t nb_freqs;                   /** Number of available freqs */
> +	FILE *f_cur_min;                     /** FD of scaling_min */
> +	FILE *f_cur_max;                     /** FD of scaling_max */
> +	uint32_t curr_idx;                   /** Freq index in freqs array */
> +	uint32_t org_min_freq;               /** Original min freq of uncore */
> +	uint32_t org_max_freq;               /** Original max freq of uncore */
> +	uint32_t init_max_freq;              /** System max uncore freq */
> +	uint32_t init_min_freq;              /** System min uncore freq */
> +} __rte_cache_aligned;

There is no need for doxygen syntax in a .c file.
And one alignment is wrong.

[...]
> +		RTE_LOG(DEBUG, POWER, "Invalid uncore frequency index %u, which "
> +				"should be less than %u\n", idx, ui->nb_freqs);

When you have time, it would be good to switch to dynamic logging.
See RTE_LOG_REGISTER_DEFAULT


[...]
> +#ifndef _RTE_POWER_INTEL_UNCORE_H
> +#define _RTE_POWER_INTEL_UNCORE_H

A leading underscore followed by an uppercase letter is reserved for the implementation, so it should not be used in an include guard.

[...]
> +/**
> + * Exit uncore frequency management on a specific die on a package. It will restore uncore min and

Screen width is limited, while scrolling bar is infinite,
so don't hesitate to break your lines shorter when possible,
like at the end of a sentence.

Patch

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index 98cfd3c1f3..89bc23aa9d 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -276,6 +276,60 @@  API Overview for Ethernet PMD Power Management
 * **Set Scaling Max Freq**: Set the maximum frequency (kHz) to be used in Frequency
   Scaling mode.
 
+Intel Uncore API
+----------------
+
+Abstract
+~~~~~~~~
+
+Uncore is a term used by Intel to describe the functions of a microprocessor that are
+not in the core, but which must be closely connected to the core to achieve high performance;
+L3 cache, on-die memory controller, etc.
+Significant power savings can be achieved by reducing the uncore frequency to its lowest value.
+
+The Linux kernel provides the driver "intel-uncore-frequency" to control the uncore frequency limits
+for x86 platforms. The driver is available from kernel version 5.6 and above.
+The kernel option CONFIG_INTEL_UNCORE_FREQ_CONTROL, also added in 5.6, must be enabled.
+The driver manipulates the contents of MSR 0x620, which sets min/max of the uncore for the SKU.
+
+
+API Overview for Intel Uncore
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Overview of each function in the Intel Uncore API, with explanation of what
+they do. Each function should not be called in the fast path.
+
+* **Uncore Power Init**
+    Initialize uncore power, populate frequency array and record
+    original min & max for die on pkg.
+
+* **Uncore Power Exit**
+    Exit uncore power, restoring original min & max for die on pkg.
+
+* **Get Uncore Power Freq**
+    Get current uncore freq index for die on pkg.
+
+* **Set Uncore Power Freq**
+    Set min & max uncore freq index for die on pkg to specified index value
+    (min and max will be the same).
+
+* **Uncore Power Max**
+    Set min & max uncore freq to maximum frequency index for die on pkg
+    (min and max will be the same).
+
+* **Uncore Power Min**
+    Set min & max uncore freq to minimum frequency index for die on pkg
+    (min and max will be the same).
+
+* **Get Num Freqs**
+    Get the number of frequencies in the index array.
+
+* **Get Num Pkgs**
+    Get the number of packages (CPUs) on the system.
+
+* **Get Num Dies**
+    Get the number of dies on a given package.
+
 References
 ----------
 
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index ac67e7e710..3f21a6126b 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -123,6 +123,12 @@  New Features
   into single event containing ``rte_event_vector``
   whose event type is ``RTE_EVENT_TYPE_CRYPTODEV_VECTOR``.
 
+* **Added Intel uncore frequency control API to the power library.**
+
+  Add API to allow uncore frequency adjustment. This is done through
+  manipulating related uncore frequency control sysfs entries to
+  adjust the minimum and maximum uncore frequency values, which works on
+  Linux with Intel hardware only.
 
 Removed Items
 -------------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index ba8d66074b..c8de545b6c 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,12 +21,14 @@  sources = files(
         'rte_power.c',
         'rte_power_empty_poll.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_intel_uncore.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_empty_poll.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
+        'rte_power_intel_uncore.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
diff --git a/lib/power/rte_power_intel_uncore.c b/lib/power/rte_power_intel_uncore.c
new file mode 100644
index 0000000000..4ba1ed7b4e
--- /dev/null
+++ b/lib/power/rte_power_intel_uncore.c
@@ -0,0 +1,451 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Intel Corporation
+ */
+
+#include <errno.h>
+#include <dirent.h>
+#include <fnmatch.h>
+
+#include <rte_memcpy.h>
+
+#include "rte_power_intel_uncore.h"
+#include "power_common.h"
+
+#define MAX_UNCORE_FREQS 32
+#define MAX_NUMA_DIE 8
+#define BUS_FREQ     100000
+#define FILTER_LENGTH 18
+#define PACKAGE_FILTER "package_%02u_die_*"
+#define DIE_FILTER "package_%02u_die_%02u"
+#define INTEL_UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency"
+#define POWER_GOVERNOR_PERF "performance"
+#define POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"
+#define POWER_INTEL_UNCORE_SYSFILE_MIN_FREQ  \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/min_freq_khz"
+#define POWER_INTEL_UNCORE_SYSFILE_BASE_MAX_FREQ \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_max_freq_khz"
+#define POWER_INTEL_UNCORE_SYSFILE_BASE_MIN_FREQ  \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_min_freq_khz"
+
+
+struct uncore_power_info {
+	unsigned int die;                    /** Core die id */
+	unsigned int pkg;                    /** Package id */
+	uint32_t freqs[MAX_UNCORE_FREQS];/** Frequency array */
+	uint32_t nb_freqs;                   /** Number of available freqs */
+	FILE *f_cur_min;                     /** FD of scaling_min */
+	FILE *f_cur_max;                     /** FD of scaling_max */
+	uint32_t curr_idx;                   /** Freq index in freqs array */
+	uint32_t org_min_freq;               /** Original min freq of uncore */
+	uint32_t org_max_freq;               /** Original max freq of uncore */
+	uint32_t init_max_freq;              /** System max uncore freq */
+	uint32_t init_min_freq;              /** System min uncore freq */
+} __rte_cache_aligned;
+
+static struct uncore_power_info uncore_info[RTE_MAX_NUMA_NODES][MAX_NUMA_DIE];
+
+static int
+set_uncore_freq_internal(struct uncore_power_info *ui, uint32_t idx)
+{
+	uint32_t target_uncore_freq, curr_max_freq;
+	int ret;
+
+	if (idx >= MAX_UNCORE_FREQS || idx >= ui->nb_freqs) {
+		RTE_LOG(DEBUG, POWER, "Invalid uncore frequency index %u, which "
+				"should be less than %u\n", idx, ui->nb_freqs);
+		return -1;
+	}
+
+	target_uncore_freq = ui->freqs[idx];
+
+	/* check current max freq, so that the value to be flushed first
+	 * can be accurately recorded
+	 */
+	open_core_sysfs_file(&ui->f_cur_max, "r+", POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ,
+			ui->pkg, ui->die);
+	if (ui->f_cur_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ);
+		return -1;
+	}
+	ret = read_core_sysfs_u32(ui->f_cur_max, &curr_max_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ);
+		fclose(ui->f_cur_max);
+		return -1;
+	}
+
+	/* check this value first before fprintf value to f_cur_max, so value isn't overwritten */
+	if (fprintf(ui->f_cur_min, "%u", target_uncore_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Fail to write new uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	if (fprintf(ui->f_cur_max, "%u", target_uncore_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Fail to write new uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	POWER_DEBUG_TRACE("Uncore frequency '%u' to be set for pkg %02u die %02u\n",
+				target_uncore_freq, ui->pkg, ui->die);
+
+
+	/* write the minimum value first if the target freq is less than current max */
+	if (target_uncore_freq <= curr_max_freq) {
+		fflush(ui->f_cur_min);
+		fflush(ui->f_cur_max);
+	} else {
+		fflush(ui->f_cur_max);
+		fflush(ui->f_cur_min);
+	}
+	ui->curr_idx = idx;
+
+	return 0;
+}
+
+/**
+ * Fopen the sys file for the future setting of the uncore die frequency.
+ */
+static int
+power_init_for_setting_uncore_freq(struct uncore_power_info *ui)
+{
+	FILE *f_base_min = NULL, *f_base_max = NULL, *f_min = NULL, *f_max = NULL;
+	uint32_t base_min_freq = 0, base_max_freq = 0, min_freq = 0, max_freq = 0;
+	int ret;
+
+	/* open and read all uncore sys files */
+	/* Base max */
+	open_core_sysfs_file(&f_base_max, "r", POWER_INTEL_UNCORE_SYSFILE_BASE_MAX_FREQ,
+			ui->pkg, ui->die);
+	if (f_base_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_BASE_MAX_FREQ);
+		goto err;
+	}
+	ret = read_core_sysfs_u32(f_base_max, &base_max_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_BASE_MAX_FREQ);
+		goto err;
+	}
+
+	/* Base min */
+	open_core_sysfs_file(&f_base_min, "r", POWER_INTEL_UNCORE_SYSFILE_BASE_MIN_FREQ,
+		ui->pkg, ui->die);
+	if (f_base_min == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_BASE_MIN_FREQ);
+		goto err;
+	}
+	if (f_base_min != NULL) {
+		ret = read_core_sysfs_u32(f_base_min, &base_min_freq);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+					POWER_INTEL_UNCORE_SYSFILE_BASE_MIN_FREQ);
+			goto err;
+		}
+	}
+
+	/* Curr min */
+	open_core_sysfs_file(&f_min, "r+", POWER_INTEL_UNCORE_SYSFILE_MIN_FREQ,
+			ui->pkg, ui->die);
+	if (f_min == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_MIN_FREQ);
+		goto err;
+	}
+	if (f_min != NULL) {
+		ret = read_core_sysfs_u32(f_min, &min_freq);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+					POWER_INTEL_UNCORE_SYSFILE_MIN_FREQ);
+			goto err;
+		}
+	}
+
+	/* Curr max */
+	open_core_sysfs_file(&f_max, "r+", POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ,
+			ui->pkg, ui->die);
+	if (f_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ);
+		goto err;
+	}
+	if (f_max != NULL) {
+		ret = read_core_sysfs_u32(f_max, &max_freq);
+		if (ret < 0) {
+			RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+					POWER_INTEL_UNCORE_SYSFILE_MAX_FREQ);
+			goto err;
+		}
+	}
+
+	/* assign file handles */
+	ui->f_cur_min = f_min;
+	ui->f_cur_max = f_max;
+	/* save current min + max freq's so that they can be restored on exit */
+	ui->org_min_freq = min_freq;
+	ui->org_max_freq = max_freq;
+	ui->init_max_freq = base_max_freq;
+	ui->init_min_freq = base_min_freq;
+
+	return 0;
+
+err:
+	if (f_base_min != NULL)
+		fclose(f_base_min);
+	if (f_base_max != NULL)
+		fclose(f_base_max);
+	if (f_min != NULL)
+		fclose(f_min);
+	if (f_max != NULL)
+		fclose(f_max);
+	return -1;
+}
+
+/**
+ * Get the available uncore frequencies of the specific die by reading the
+ * sys file.
+ */
+static int
+power_get_available_uncore_freqs(struct uncore_power_info *ui)
+{
+	int ret = -1;
+	uint32_t i, num_uncore_freqs = 0;
+
+	num_uncore_freqs = (ui->init_max_freq - ui->init_min_freq) / BUS_FREQ + 1;
+	if (num_uncore_freqs >= MAX_UNCORE_FREQS) {
+		RTE_LOG(ERR, POWER, "Too many available uncore frequencies: %u\n",
+				num_uncore_freqs);
+		goto out;
+	}
+
+	/* Generate the uncore freq bucket array. */
+	for (i = 0; i < num_uncore_freqs; i++)
+		ui->freqs[i] = ui->init_max_freq - (i) * BUS_FREQ;
+
+	ui->nb_freqs = num_uncore_freqs;
+
+	ret = 0;
+
+	POWER_DEBUG_TRACE("%u frequency(s) of pkg %02u die %02u are available\n",
+			num_uncore_freqs, ui->pkg, ui->die);
+
+out:
+	return ret;
+}
+
+static int
+check_pkg_die_values(unsigned int pkg, unsigned int die)
+{
+	unsigned int max_pkgs, max_dies;
+	max_pkgs = rte_power_uncore_get_num_pkgs();
+	if (max_pkgs == 0)
+		return -1;
+	if (pkg >= max_pkgs) {
+		RTE_LOG(DEBUG, POWER, "Package number %02u can not exceed %u\n",
+				pkg, max_pkgs);
+		return -1;
+	}
+
+	max_dies = rte_power_uncore_get_num_dies(pkg);
+	if (max_dies == 0)
+		return -1;
+	if (die >= max_dies) {
+		RTE_LOG(DEBUG, POWER, "Die number %02u can not exceed %u\n",
+				die, max_dies);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_power_uncore_init(unsigned int pkg, unsigned int die)
+{
+	struct uncore_power_info *ui;
+
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	ui = &uncore_info[pkg][die];
+	ui->die = die;
+	ui->pkg = pkg;
+
+	/* Init for setting uncore die frequency */
+	if (power_init_for_setting_uncore_freq(ui) < 0) {
+		RTE_LOG(DEBUG, POWER, "Cannot init for setting uncore frequency for "
+				"pkg %02u die %02u\n", pkg, die);
+		return -1;
+	}
+
+	/* Get the available frequencies */
+	if (power_get_available_uncore_freqs(ui) < 0) {
+		RTE_LOG(DEBUG, POWER, "Cannot get available uncore frequencies of "
+				"pkg %02u die %02u\n", pkg, die);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_power_uncore_exit(unsigned int pkg, unsigned int die)
+{
+	struct uncore_power_info *ui;
+
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	ui = &uncore_info[pkg][die];
+
+	if (fprintf(ui->f_cur_min, "%u", ui->org_min_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Fail to write original uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	if (fprintf(ui->f_cur_max, "%u", ui->org_max_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Fail to write original uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	fflush(ui->f_cur_min);
+	fflush(ui->f_cur_max);
+
+	/* Close FD of setting freq */
+	fclose(ui->f_cur_min);
+	fclose(ui->f_cur_max);
+	ui->f_cur_min = NULL;
+	ui->f_cur_max = NULL;
+
+	return 0;
+}
+
+uint32_t
+rte_power_get_uncore_freq(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return uncore_info[pkg][die].curr_idx;
+}
+
+int
+rte_power_set_uncore_freq(unsigned int pkg, unsigned int die, uint32_t index)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return set_uncore_freq_internal(&(uncore_info[pkg][die]), index);
+}
+
+int
+rte_power_uncore_freq_max(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return set_uncore_freq_internal(&(uncore_info[pkg][die]), 0);
+}
+
+
+int
+rte_power_uncore_freq_min(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	struct uncore_power_info *ui = &uncore_info[pkg][die];
+
+	return set_uncore_freq_internal(&(uncore_info[pkg][die]), ui->nb_freqs - 1);
+}
+
+int
+rte_power_uncore_get_num_freqs(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return uncore_info[pkg][die].nb_freqs;
+}
+
+unsigned int
+rte_power_uncore_get_num_pkgs(void)
+{
+	DIR *d;
+	struct dirent *dir;
+	unsigned int count = 0;
+	char filter[FILTER_LENGTH];
+
+	d = opendir(INTEL_UNCORE_FREQUENCY_DIR);
+	if (d == NULL) {
+		RTE_LOG(ERR, POWER,
+		"Uncore frequency management not supported/enabled on this kernel. "
+		"Please enable CONFIG_INTEL_UNCORE_FREQ_CONTROL if on x86 with linux kernel"
+		" >= 5.6\n");
+		return 0;
+	}
+
+	/* search by incrementing file name for max pkg file value */
+	while ((dir = readdir(d)) != NULL) {
+		snprintf(filter, FILTER_LENGTH, PACKAGE_FILTER, count);
+		/* make sure filter string is in file name (don't include hidden files) */
+		if (fnmatch(filter, dir->d_name, 0) == 0)
+			count++;
+	}
+
+	closedir(d);
+
+	return count;
+}
+
+unsigned int
+rte_power_uncore_get_num_dies(unsigned int pkg)
+{
+	DIR *d;
+	struct dirent *dir;
+	unsigned int count = 0, max_pkgs;
+	char filter[FILTER_LENGTH];
+
+	max_pkgs = rte_power_uncore_get_num_pkgs();
+	if (max_pkgs == 0)
+		return 0;
+	if (pkg >= max_pkgs) {
+		RTE_LOG(DEBUG, POWER, "Invalid package number\n");
+		return 0;
+	}
+
+	d = opendir(INTEL_UNCORE_FREQUENCY_DIR);
+	if (d == NULL) {
+		RTE_LOG(ERR, POWER,
+		"Uncore frequency management not supported/enabled on this kernel. "
+		"Please enable CONFIG_INTEL_UNCORE_FREQ_CONTROL if on x86 with linux kernel"
+		" >= 5.6\n");
+		return 0;
+	}
+
+	/* search by incrementing file name for max die file value */
+	while ((dir = readdir(d)) != NULL) {
+		snprintf(filter, FILTER_LENGTH, DIE_FILTER, pkg, count);
+		/* make sure filter string is in file name (don't include hidden files) */
+		if (fnmatch(filter, dir->d_name, 0) == 0)
+			count++;
+	}
+
+	closedir(d);
+
+	return count;
+}
diff --git a/lib/power/rte_power_intel_uncore.h b/lib/power/rte_power_intel_uncore.h
new file mode 100644
index 0000000000..3e891f4001
--- /dev/null
+++ b/lib/power/rte_power_intel_uncore.h
@@ -0,0 +1,194 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Intel Corporation
+ */
+
+#ifndef _RTE_POWER_INTEL_UNCORE_H
+#define _RTE_POWER_INTEL_UNCORE_H
+
+/**
+ * @file
+ * RTE Intel Uncore Frequency Management
+ */
+
+#include "rte_power.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Initialize uncore frequency management for specific die on a package. It will get the available
+ * frequencies and prepare to set new die frequencies.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_init(unsigned int pkg, unsigned int die);
+
+/**
+ * Exit uncore frequency management on a specific die on a package. It will restore uncore min and
+ * max values to previous values before initialization of API.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_exit(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the current index of available frequencies of a specific die on a package.
+ * The caller must protect calls to this function to make them thread-safe.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  The current index of available frequencies.
+ *  If error, it will return 'RTE_POWER_INVALID_FREQ_INDEX = (~0)'.
+ */
+__rte_experimental
+uint32_t
+rte_power_get_uncore_freq(unsigned int pkg, unsigned int die);
+
+/**
+ * Set minimum and maximum uncore frequency for specified die on a package to specified
+ * index value.
+ * The caller must protect calls to this function to make them thread-safe.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ * @param index
+ *  The index of available frequencies.
+ *
+ * @return
+ *  - 0 on success
+ *    (returned whether or not the frequency value changed).
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_set_uncore_freq(unsigned int pkg, unsigned int die, uint32_t index);
+
+/**
+ * Set minimum and maximum uncore frequency for specified die on a package to maximum
+ * value according to the available frequencies.
+ * The caller must protect calls to this function to make them thread-safe.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  - 0 on success
+ *    (returned whether or not the frequency value changed).
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_freq_max(unsigned int pkg, unsigned int die);
+
+/**
+ * Set minimum and maximum uncore frequency for specified die on a package to minimum
+ * value according to the available frequencies.
+ * The caller must protect calls to this function to make them thread-safe.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  - 0 on success
+ *    (returned whether or not the frequency value changed).
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_freq_min(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the list length of available frequencies in the index array.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ * @param die
+ *  Die number. Each package can have several dies connected together via the uncore mesh.
+ *
+ * @return
+ *  - The number of available indexes in the frequency array.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_get_num_freqs(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the number of packages (CPUs) on a system by parsing the uncore
+ * sysfs directory.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @return
+ *  - Zero on error.
+ *  - Number of packages on the system on success.
+ */
+__rte_experimental
+unsigned int
+rte_power_uncore_get_num_pkgs(void);
+
+/**
+ * Return the number of dies for the specified package (CPU) by parsing
+ * the uncore sysfs directory.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number. Each physical CPU in a system is referred to as a package.
+ *
+ * @return
+ *  - Zero on error.
+ *  - Number of dies for package on success.
+ */
+__rte_experimental
+unsigned int
+rte_power_uncore_get_num_dies(unsigned int pkg);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/power/version.map b/lib/power/version.map
index f9b2947adf..8fccbf20f7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -48,4 +48,15 @@  EXPERIMENTAL {
 	rte_power_pmd_mgmt_set_pause_duration;
 	rte_power_pmd_mgmt_set_scaling_freq_max;
 	rte_power_pmd_mgmt_set_scaling_freq_min;
+
+	# added in 22.11
+	rte_power_get_uncore_freq;
+	rte_power_set_uncore_freq;
+	rte_power_uncore_exit;
+	rte_power_uncore_freq_max;
+	rte_power_uncore_freq_min;
+	rte_power_uncore_get_num_dies;
+	rte_power_uncore_get_num_freqs;
+	rte_power_uncore_get_num_pkgs;
+	rte_power_uncore_init;
 };