[v7,1/3] power: add uncore frequency control API to the power library

Message ID 20220928133018.1583280-2-tadhg.kearney@intel.com
State Superseded, archived
Delegated to: Thomas Monjalon
Series add uncore api to be called through l3fwd-power

Checks

Context         Check    Description
ci/checkpatch   success  coding style OK

Commit Message

Tadhg Kearney Sept. 28, 2022, 1:30 p.m. UTC
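Add an API to allow uncore frequency adjustment on Intel x86 platforms
running Linux. "Uncore" is Intel's term for the parts of a processor that
are outside the cores but closely connected to them, such as the L3 cache
and the on-die memory controller; lowering the uncore frequency can save
significant power.
The API manipulates the sysfs entries of the Linux intel-uncore-frequency
driver (kernel 5.6 and later) to adjust the minimum and maximum uncore
frequency values.
Nine APIs are being added, all public and experimental.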

Signed-off-by: Tadhg Kearney <tadhg.kearney@intel.com>
Reviewed-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/power_man.rst    |  38 +++
 doc/guides/rel_notes/release_22_11.rst |   5 +
 lib/power/meson.build                  |   2 +
 lib/power/rte_power_uncore.c           | 451 +++++++++++++++++++++++++
 lib/power/rte_power_uncore.h           | 194 +++++++++++
 lib/power/version.map                  |  11 +
 6 files changed, 701 insertions(+)
 create mode 100644 lib/power/rte_power_uncore.c
 create mode 100644 lib/power/rte_power_uncore.h
  

Comments

Thomas Monjalon Oct. 4, 2022, 5:09 p.m. UTC | #1
28/09/2022 15:30, Tadhg Kearney:
> Add API to allow uncore frequency adjustment. This is done through
> manipulating related uncore frequency control sysfs entries to
> adjust the minimum and maximum uncore frequency values.
> Nine API's are being added that are all public and experimental.

You cannot introduce an API without explaining what it is about.
Maybe I'm an idiot but I don't know what "uncore" is.
I see it is explained in the documentation,
but a few words in the commit message would not be too much.
At least you must say it is for Linux on Intel,
and which feature it is controlling.

> +Uncore API
> +----------
> +
> +Abstract
> +~~~~~~~~
> +
> +Uncore is a term used by Intel to describe the functions of a microprocessor that are
> +not in the core, but which must be closely connected to the core to achieve high performance;
> +L3 cache, on-die memory controller, etc.
> +Significant power savings can be achieved by reducing the uncore frequency to its lowest value.

So this is an Intel thing.

> +
> +The Linux kernel provides the driver “intel-uncore-frequency" to control the uncore frequency limits
> +for x86 platform. The driver is available from kernel version 5.6 and above.
> +Also CONFIG_INTEL_UNCORE_FREQ_CONTROL will need to be enabled in the kernel, which was added in 5.6.
> +This manipulates the contest of MSR 0x620, which sets min/max of the uncore for the SKU.

It is correctly named "intel-uncore" in the Linux kernel.
Why not have "Intel" in the DPDK feature name?

> +
> +
> +API Overview for Uncore
> +~~~~~~~~~~~~~~~~~~~~~~~

A blank line is missing here.

> +* **Uncore Power Init**: Initialise uncore power, populate frequency array and record
> +  original min & max for pkg & die.
> +
> +* **Uncore Power Exit**: Exit uncore power, restoring original min & max for pkg & die.
> +
> +* **Get Uncore Power Freq**: Get current uncore freq index for pkg & die.
> +
> +* **Set Uncore Power Freq**: Set min & max uncore freq index for pkg & die (min and max will be the same).
> +
> +* **Uncore Power Max**: Set max uncore freq index for pkg & die.
> +
> +* **Uncore Power Min**: Set min uncore freq index for pkg & die.
> +
> +* **Get Num Freqs**: Get the number of frequencies in the index array.
> +
> +* **Get Num Pkgs**: Get the number of packages (CPUs) on the system.
> +
> +* **Get Num Dies**: Get the number of die's on a given package.

Not sure what you are listing here. Are they functions?
If you really want to keep a list, I suggest using a definition list
available in RST syntax.
If you want to provide an explanation easy to read,
full sentences connecting things together would be better.

> +
>  References
>  ----------
>  
> diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
> index cb7677fd3c..5d3f815b54 100644
> --- a/doc/guides/rel_notes/release_22_11.rst
> +++ b/doc/guides/rel_notes/release_22_11.rst
> @@ -75,6 +75,11 @@ New Features
>    * Added ``rte_event_eth_tx_adapter_instance_get`` to get Tx adapter
>      instance ID for specified ethernet device ID and Tx queue index.
>  
> +* **Added uncore frequency control API to the power library.**
> +
> +  Add api to allow uncore frequency adjustment. This is done through

s/api/API/

> +  manipulating related uncore frequency control sysfs entries to
> +  adjust the minimum and maximum uncore frequency values.

It is Linux-only, and for Intel hardware only.

> --- /dev/null
> +++ b/lib/power/rte_power_uncore.c

I would add "intel" in the filename.

[...]
> +#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency"
> +#define POWER_GOVERNOR_PERF "performance"
> +#define POWER_UNCORE_SYSFILE_MAX_FREQ \
> +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"
> +#define POWER_UNCORE_SYSFILE_MIN_FREQ  \
> +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/min_freq_khz"
> +#define POWER_UNCORE_SYSFILE_BASE_MAX_FREQ \
> +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_max_freq_khz"
> +#define POWER_UNCORE_SYSFILE_BASE_MIN_FREQ  \
> +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_min_freq_khz"

It is for Intel CPU only, right?

> + * This function should NOT be called in the fast path.
> + *
> + * @param pkg
> + *  Package number.
> + * @param die
> + *  Die number.

To me it is not clear what they are.
Is it possible to better explain "pkg" and "die" somewhere?
Is it related to NUMA nodes?
  
Tadhg Kearney Oct. 5, 2022, 10:50 a.m. UTC | #2
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday 4 October 2022 18:09
> To: Kearney, Tadhg <tadhg.kearney@intel.com>
> Cc: dev@dpdk.org; Hunt, David <david.hunt@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Pattan, Reshma <reshma.pattan@intel.com>
> Subject: Re: [PATCH v7 1/3] power: add uncore frequency control API to the
> power library
> 
> 28/09/2022 15:30, Tadhg Kearney:
> [...]
> > +Uncore is a term used by Intel to describe the functions of a
> > +microprocessor that are not in the core, but which must be closely
> > +connected to the core to achieve high performance;
> > +L3 cache, on-die memory controller, etc.
> > +Significant power savings can be achieved by reducing the uncore
> > +frequency to its lowest value.
> 
> So this is an Intel thing.

Yes, so far the uncore kernel implementation covers Intel hardware.

> [...]
> 
> > +* **Uncore Power Init**: Initialise uncore power, populate frequency array and record
> > +  original min & max for pkg & die.
> > +
> > +* **Uncore Power Exit**: Exit uncore power, restoring original min & max for pkg & die.
> > +
> > +* **Get Uncore Power Freq**: Get current uncore freq index for pkg & die.
> > +
> > +* **Set Uncore Power Freq**: Set min & max uncore freq index for pkg & die (min and max will be the same).
> > +
> > +* **Uncore Power Max**: Set max uncore freq index for pkg & die.
> > +
> > +* **Uncore Power Min**: Set min uncore freq index for pkg & die.
> > +
> > +* **Get Num Freqs**: Get the number of frequencies in the index array.
> > +
> > +* **Get Num Pkgs**: Get the number of packages (CPUs) on the system.
> > +
> > +* **Get Num Dies**: Get the number of die's on a given package.
> 
> Not sure what you are listing here. Are they functions?
> If you really want to keep a list, I suggest using a definition list available in RST
> syntax.
> If you want to provide an explanation easy to read, full sentences connecting
> things together would be better.

Agreed.

> [...]
> > +#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency"
> > +#define POWER_GOVERNOR_PERF "performance"
> > +#define POWER_UNCORE_SYSFILE_MAX_FREQ \
> > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"
> > +#define POWER_UNCORE_SYSFILE_MIN_FREQ  \
> > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/min_freq_khz"
> > +#define POWER_UNCORE_SYSFILE_BASE_MAX_FREQ \
> > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_max_freq_khz"
> > +#define POWER_UNCORE_SYSFILE_BASE_MIN_FREQ  \
> > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_min_freq_khz"
> 
> It is for Intel CPU only, right?

Currently only Intel CPUs are covered by these sysfs entries, but it is possible that other platforms will be included in the future.

> 
> > + * This function should NOT be called in the fast path.
> > + *
> > + * @param pkg
> > + *  Package number.
> > + * @param die
> > + *  Die number.
> 
> To me it is not clear what they are.
> Is it possible to better explain "pkg" and "die" somewhere?
> Is it related to NUMA nodes?

Each package can contain 1 or more dies. These dies are connected together in a package via the uncore mesh.
Dies may appear as separate NUMA nodes, or a group of dies on a package may appear as a single NUMA node, depending on the BIOS configuration.
Header descriptions will be changed to:
    * Package number. Each physical CPU in a system is referred to as a package.
    * Die number. Each package can have several dies connected together via the uncore mesh.

> 

Regards, Tadhg
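The package/die split described above is visible directly in the sysfs layout the patch reads: UNCORE_FREQUENCY_DIR holds one directory per package/die pair, named after the DIE_FILTER pattern package_%02u_die_%02u. Below is a minimal sketch of turning such a directory name back into numbers; parse_uncore_dir_name() is a hypothetical helper, not part of the patch.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: split an intel_uncore_frequency directory name of the
 * form "package_XX_die_YY" into its package and die numbers.
 * Returns false for any other entry (".", "..", unrelated names).
 */
static bool
parse_uncore_dir_name(const char *name, unsigned int *pkg, unsigned int *die)
{
	unsigned int p, d;
	int consumed = 0;

	/* %n records how many characters matched, so trailing junk is rejected */
	if (sscanf(name, "package_%2u_die_%2u%n", &p, &d, &consumed) != 2)
		return false;
	if (name[consumed] != '\0')
		return false;

	*pkg = p;
	*die = d;
	return true;
}
```

With two dies in the first package, the directory would contain package_00_die_00 and package_00_die_01, which parse to (0, 0) and (0, 1).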
  
Thomas Monjalon Oct. 5, 2022, 12:11 p.m. UTC | #3
05/10/2022 12:50, Kearney, Tadhg:
> [...]
> > > +#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency"
> > > +#define POWER_GOVERNOR_PERF "performance"
> > > +#define POWER_UNCORE_SYSFILE_MAX_FREQ \
> > > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"
> > > +#define POWER_UNCORE_SYSFILE_MIN_FREQ  \
> > > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/min_freq_khz"
> > > +#define POWER_UNCORE_SYSFILE_BASE_MAX_FREQ \
> > > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_max_freq_khz"
> > > +#define POWER_UNCORE_SYSFILE_BASE_MIN_FREQ  \
> > > +		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_min_freq_khz"
> > 
> > It is for Intel CPU only, right?
> 
> Currently only Intel CPUs are covered by these sysfs entries, but it is possible that other platforms will be included in the future.

No, these sysfs entries have "intel" in their names,
so they will never become something other than Intel.
At the very minimum, you should name the defines with _INTEL_.

> > > + * This function should NOT be called in the fast path.
> > > + *
> > > + * @param pkg
> > > + *  Package number.
> > > + * @param die
> > > + *  Die number.
> > 
> > To me it is not clear what they are.
> > Is it possible to better explain "pkg" and "die" somewhere?
> > Is it related to NUMA nodes?
> 
> Each package can contain 1 or more dies. These dies are connected together in a package via the uncore mesh.
> Dies may appear as separate NUMA nodes, or a group of dies on a package may appear as a single NUMA node, depending on the BIOS configuration.
> Header descriptions will be changed to:
>     * Package number. Each physical CPU in a system is referred to as a package.
>     * Die number. Each package can have several dies connected together via the uncore mesh.

OK better, thanks.
  

Patch

diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
index 98cfd3c1f3..cf2f5d8b69 100644
--- a/doc/guides/prog_guide/power_man.rst
+++ b/doc/guides/prog_guide/power_man.rst
@@ -276,6 +276,44 @@  API Overview for Ethernet PMD Power Management
 * **Set Scaling Max Freq**: Set the maximum frequency (kHz) to be used in Frequency
   Scaling mode.
 
+Uncore API
+----------
+
+Abstract
+~~~~~~~~
+
+Uncore is a term used by Intel to describe the functions of a microprocessor that are
+not in the core but must be closely connected to it to achieve high performance,
+such as the L3 cache and the on-die memory controller.
+Significant power savings can be achieved by reducing the uncore frequency to its lowest value.
+
+The Linux kernel provides the ``intel-uncore-frequency`` driver to control the uncore frequency
+limits on Intel x86 platforms. The driver is available from kernel version 5.6 onwards and requires
+``CONFIG_INTEL_UNCORE_FREQ_CONTROL`` to be enabled in the kernel configuration. The driver manipulates
+the contents of MSR 0x620, which sets the minimum and maximum uncore frequencies for the SKU.
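+
+API Overview for Uncore
+~~~~~~~~~~~~~~~~~~~~~~~
+
+* **Uncore Power Init** (``rte_power_uncore_init``): Initialise uncore power for a package and die,
+  build its frequency index array and record its original minimum and maximum frequencies.
+
+* **Uncore Power Exit** (``rte_power_uncore_exit``): Restore the original minimum and maximum
+  frequencies recorded at initialisation.
+
+* **Get Uncore Power Freq** (``rte_power_get_uncore_freq``): Get the frequency index last set
+  for a package and die.
+
+* **Set Uncore Power Freq** (``rte_power_set_uncore_freq``): Set both the minimum and maximum uncore
+  frequency of a package and die to the frequency at a given index.
+
+* **Uncore Power Max** (``rte_power_uncore_freq_max``): Set the uncore frequency of a package and die
+  to its highest available value (index 0).
+
+* **Uncore Power Min** (``rte_power_uncore_freq_min``): Set the uncore frequency of a package and die
+  to its lowest available value (last index).
+
+* **Get Num Freqs** (``rte_power_uncore_get_num_freqs``): Get the number of frequencies in the index
+  array of a package and die.
+
+* **Get Num Pkgs** (``rte_power_uncore_get_num_pkgs``): Get the number of packages (physical CPUs)
+  on the system.
+
+* **Get Num Dies** (``rte_power_uncore_get_num_dies``): Get the number of dies on a given package.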
+
 References
 ----------
 
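The API overview lists the calls without showing how they fit together. Below is a minimal sketch, assuming an application that wants every package and die at its lowest uncore frequency. The helper uncore_all_to_min() is hypothetical; the prototypes are copied from the function definitions in this v7 patch, and a real build would include rte_power_uncore.h instead of repeating them.

```c
#include <stdio.h>

/*
 * Prototypes copied from the definitions in this (v7) patch; a real
 * application would include rte_power_uncore.h instead.
 */
int rte_power_uncore_init(unsigned int pkg, unsigned int die);
int rte_power_uncore_freq_min(unsigned int pkg, unsigned int die);
unsigned int rte_power_uncore_get_num_pkgs(void);
unsigned int rte_power_uncore_get_num_dies(unsigned int pkg);

/*
 * Hypothetical application helper: initialise every package/die reported by
 * the library and drop it to its lowest uncore frequency.
 * Returns the number of dies that were successfully lowered.
 */
static unsigned int
uncore_all_to_min(void)
{
	unsigned int pkg, die, lowered = 0;
	unsigned int nb_pkgs = rte_power_uncore_get_num_pkgs();

	for (pkg = 0; pkg < nb_pkgs; pkg++) {
		unsigned int nb_dies = rte_power_uncore_get_num_dies(pkg);

		for (die = 0; die < nb_dies; die++) {
			/* init records the original limits so that exit can restore them */
			if (rte_power_uncore_init(pkg, die) < 0) {
				fprintf(stderr, "uncore init failed: pkg %u die %u\n",
						pkg, die);
				continue;
			}
			if (rte_power_uncore_freq_min(pkg, die) == 0)
				lowered++;
		}
	}
	return lowered;
}
```

Each package/die that was successfully initialised should later be passed to rte_power_uncore_exit(), which writes back the minimum and maximum recorded by rte_power_uncore_init().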
diff --git a/doc/guides/rel_notes/release_22_11.rst b/doc/guides/rel_notes/release_22_11.rst
index cb7677fd3c..5d3f815b54 100644
--- a/doc/guides/rel_notes/release_22_11.rst
+++ b/doc/guides/rel_notes/release_22_11.rst
@@ -75,6 +75,11 @@  New Features
   * Added ``rte_event_eth_tx_adapter_instance_get`` to get Tx adapter
     instance ID for specified ethernet device ID and Tx queue index.
 
+* **Added uncore frequency control API to the power library.**
+
+  Added an API to allow uncore frequency adjustment on Linux for Intel x86 platforms.
+  This is done by manipulating the related uncore frequency control sysfs entries
+  to adjust the minimum and maximum uncore frequency values.
 
 Removed Items
 -------------
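Adjusting the minimum and maximum together needs some care: set_uncore_freq_internal() in the patch flushes min_freq_khz first when the target is at or below the current maximum, and max_freq_khz first otherwise. Below is a sketch of that ordering rule as a standalone function; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Which sysfs limit to write first (hypothetical names). */
enum uncore_write_order {
	WRITE_MIN_FIRST,
	WRITE_MAX_FIRST,
};

/*
 * Pin both min_freq_khz and max_freq_khz to target_khz without passing
 * through a state where min > max:
 * - target <= current max: the new min fits under the old max, so write min first;
 * - target >  current max: the new max stays above the old min, so write max first.
 */
static enum uncore_write_order
pick_uncore_write_order(uint32_t target_khz, uint32_t cur_max_khz)
{
	return target_khz <= cur_max_khz ? WRITE_MIN_FIRST : WRITE_MAX_FIRST;
}
```

In both branches the pair never has a minimum above its maximum. The patch does not say whether the kernel driver would reject such a state; the ordering simply avoids relying on it.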
diff --git a/lib/power/meson.build b/lib/power/meson.build
index ba8d66074b..80cdeb72d4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,12 +21,14 @@  sources = files(
         'rte_power.c',
         'rte_power_empty_poll.c',
         'rte_power_pmd_mgmt.c',
+        'rte_power_uncore.c',
 )
 headers = files(
         'rte_power.h',
         'rte_power_empty_poll.h',
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
+        'rte_power_uncore.h',
 )
 if cc.has_argument('-Wno-cast-qual')
     cflags += '-Wno-cast-qual'
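The frequency indices used throughout rte_power_uncore.c are built by power_get_available_uncore_freqs(): it starts at initial_max_freq_khz and steps down by BUS_FREQ (100000 kHz, i.e. 100 MHz) towards initial_min_freq_khz, so index 0 is the highest frequency. Below is a standalone sketch of that table with hypothetical names; unlike the patch, it also rejects an inverted range and accepts exactly MAX_UNCORE_FREQS entries.

```c
#include <stdint.h>

#define UNCORE_STEP_KHZ  100000u /* BUS_FREQ in the patch: 100 MHz */
#define UNCORE_MAX_FREQS 32u     /* MAX_UNCORE_FREQS in the patch */

/*
 * Fill freqs[] from the highest to the lowest uncore frequency in 100 MHz
 * steps, so index 0 is the maximum and index n - 1 the minimum.
 * Returns the number of entries, or -1 if the range is invalid or too large.
 */
static int
build_uncore_freq_table(uint32_t init_min_khz, uint32_t init_max_khz,
		uint32_t freqs[UNCORE_MAX_FREQS])
{
	uint32_t i, n;

	if (init_max_khz < init_min_khz)
		return -1;

	n = (init_max_khz - init_min_khz) / UNCORE_STEP_KHZ + 1;
	if (n > UNCORE_MAX_FREQS)
		return -1;

	for (i = 0; i < n; i++)
		freqs[i] = init_max_khz - i * UNCORE_STEP_KHZ;

	return (int)n;
}
```

If the range is not a multiple of 100 MHz, the division rounds down and the last entry stays above the initial minimum.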
diff --git a/lib/power/rte_power_uncore.c b/lib/power/rte_power_uncore.c
new file mode 100644
index 0000000000..c9049e2b6c
--- /dev/null
+++ b/lib/power/rte_power_uncore.c
@@ -0,0 +1,451 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Intel Corporation
+ */
+
+#include <errno.h>
+#include <dirent.h>
+#include <fnmatch.h>
+
+#include <rte_memcpy.h>
+
+#include "rte_power_uncore.h"
+#include "power_common.h"
+
+#define MAX_UNCORE_FREQS 32
+#define MAX_NUMA_DIE 8
+#define BUS_FREQ     100000
+#define FILTER_LENGTH 18
+#define PACKAGE_FILTER "package_%02u_die_*"
+#define DIE_FILTER "package_%02u_die_%02u"
+#define UNCORE_FREQUENCY_DIR "/sys/devices/system/cpu/intel_uncore_frequency"
+#define POWER_GOVERNOR_PERF "performance"
+#define POWER_UNCORE_SYSFILE_MAX_FREQ \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/max_freq_khz"
+#define POWER_UNCORE_SYSFILE_MIN_FREQ  \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/min_freq_khz"
+#define POWER_UNCORE_SYSFILE_BASE_MAX_FREQ \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_max_freq_khz"
+#define POWER_UNCORE_SYSFILE_BASE_MIN_FREQ  \
+		"/sys/devices/system/cpu/intel_uncore_frequency/package_%02u_die_%02u/initial_min_freq_khz"
+
+
+struct uncore_power_info {
+	unsigned int die;                    /** Die id */
+	unsigned int pkg;                    /** Package id */
+	uint32_t freqs[MAX_UNCORE_FREQS];    /** Frequency array */
+	uint32_t nb_freqs;                   /** Number of available freqs */
+	FILE *f_cur_min;                     /** Handle of min_freq_khz */
+	FILE *f_cur_max;                     /** Handle of max_freq_khz */
+	uint32_t curr_idx;                   /** Freq index in freqs array */
+	uint32_t org_min_freq;               /** Original min freq of uncore */
+	uint32_t org_max_freq;               /** Original max freq of uncore */
+	uint32_t init_max_freq;              /** System max uncore freq */
+	uint32_t init_min_freq;              /** System min uncore freq */
+} __rte_cache_aligned;
+
+static struct uncore_power_info uncore_info[RTE_MAX_NUMA_NODES][MAX_NUMA_DIE];
+
+static int
+set_uncore_freq_internal(struct uncore_power_info *ui, uint32_t idx)
+{
+	uint32_t target_uncore_freq, curr_max_freq;
+	int ret;
+
+	if (idx >= MAX_UNCORE_FREQS || idx >= ui->nb_freqs) {
+		RTE_LOG(DEBUG, POWER, "Invalid uncore frequency index %u, which "
+				"should be less than %u\n", idx, ui->nb_freqs);
+		return -1;
+	}
+
+	target_uncore_freq = ui->freqs[idx];
+
+	/* the handles opened at init are closed by exit; refuse to write if they are gone */
+	if (ui->f_cur_min == NULL || ui->f_cur_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "Uncore power not initialised for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	/* re-read the current max freq from the start of the file opened at init
+	 * (reopening here would leak that handle), so that the value to be
+	 * flushed first can be accurately determined
+	 */
+	rewind(ui->f_cur_max);
+	ret = read_core_sysfs_u32(ui->f_cur_max, &curr_max_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_UNCORE_SYSFILE_MAX_FREQ);
+		return -1;
+	}
+	/* update streams must be repositioned between a read and a write */
+	rewind(ui->f_cur_min);
+	rewind(ui->f_cur_max);
+
+	/* both values are only buffered by fprintf() here; the flush order
+	 * below decides which limit reaches the kernel first
+	 */
+	if (fprintf(ui->f_cur_min, "%u", target_uncore_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Failed to write new uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	if (fprintf(ui->f_cur_max, "%u", target_uncore_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Failed to write new uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	POWER_DEBUG_TRACE("Uncore frequency '%u' to be set for pkg %02u die %02u\n",
+				target_uncore_freq, ui->pkg, ui->die);
+
+	/* flush the minimum first if the target freq is not above the current max */
+	if (target_uncore_freq <= curr_max_freq) {
+		fflush(ui->f_cur_min);
+		fflush(ui->f_cur_max);
+	} else {
+		fflush(ui->f_cur_max);
+		fflush(ui->f_cur_min);
+	}
+	ui->curr_idx = idx;
+
+	return 0;
+}
+
+/**
+ * Open the sysfs files used to set the uncore frequency of a die, and record
+ * its original and initial (system) frequency limits.
+ */
+static int
+power_init_for_setting_uncore_freq(struct uncore_power_info *ui)
+{
+	FILE *f_base_min = NULL, *f_base_max = NULL, *f_min = NULL, *f_max = NULL;
+	uint32_t base_min_freq = 0, base_max_freq = 0, min_freq = 0, max_freq = 0;
+	int ret;
+
+	/* open and read all uncore sys files */
+	/* Base max */
+	open_core_sysfs_file(&f_base_max, "r", POWER_UNCORE_SYSFILE_BASE_MAX_FREQ,
+			ui->pkg, ui->die);
+	if (f_base_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_UNCORE_SYSFILE_BASE_MAX_FREQ);
+		goto err;
+	}
+	ret = read_core_sysfs_u32(f_base_max, &base_max_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_UNCORE_SYSFILE_BASE_MAX_FREQ);
+		goto err;
+	}
+	/* only the value is needed, so close the file instead of leaking it */
+	fclose(f_base_max);
+	f_base_max = NULL;
+
+	/* Base min */
+	open_core_sysfs_file(&f_base_min, "r", POWER_UNCORE_SYSFILE_BASE_MIN_FREQ,
+			ui->pkg, ui->die);
+	if (f_base_min == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_UNCORE_SYSFILE_BASE_MIN_FREQ);
+		goto err;
+	}
+	ret = read_core_sysfs_u32(f_base_min, &base_min_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_UNCORE_SYSFILE_BASE_MIN_FREQ);
+		goto err;
+	}
+	fclose(f_base_min);
+	f_base_min = NULL;
+
+	/* Curr min: kept open for later writes */
+	open_core_sysfs_file(&f_min, "rw+", POWER_UNCORE_SYSFILE_MIN_FREQ,
+			ui->pkg, ui->die);
+	if (f_min == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_UNCORE_SYSFILE_MIN_FREQ);
+		goto err;
+	}
+	ret = read_core_sysfs_u32(f_min, &min_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_UNCORE_SYSFILE_MIN_FREQ);
+		goto err;
+	}
+
+	/* Curr max: kept open for later writes */
+	open_core_sysfs_file(&f_max, "rw+", POWER_UNCORE_SYSFILE_MAX_FREQ,
+			ui->pkg, ui->die);
+	if (f_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "failed to open %s\n",
+				POWER_UNCORE_SYSFILE_MAX_FREQ);
+		goto err;
+	}
+	ret = read_core_sysfs_u32(f_max, &max_freq);
+	if (ret < 0) {
+		RTE_LOG(DEBUG, POWER, "Failed to read %s\n",
+				POWER_UNCORE_SYSFILE_MAX_FREQ);
+		goto err;
+	}
+
+	/* assign file handles */
+	ui->f_cur_min = f_min;
+	ui->f_cur_max = f_max;
+	/* save current min + max freq's so that they can be restored on exit */
+	ui->org_min_freq = min_freq;
+	ui->org_max_freq = max_freq;
+	ui->init_max_freq = base_max_freq;
+	ui->init_min_freq = base_min_freq;
+
+	return 0;
+
+err:
+	if (f_base_min != NULL)
+		fclose(f_base_min);
+	if (f_base_max != NULL)
+		fclose(f_base_max);
+	if (f_min != NULL)
+		fclose(f_min);
+	if (f_max != NULL)
+		fclose(f_max);
+	return -1;
+}
+
+/**
+ * Get the available uncore frequencies of the specific die by reading the
+ * sys file.
+ */
+static int
+power_get_available_uncore_freqs(struct uncore_power_info *ui)
+{
+	int ret = -1;
+	uint32_t i, num_uncore_freqs = 0;
+
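+	if (ui->init_max_freq < ui->init_min_freq) {
+		RTE_LOG(ERR, POWER, "Invalid initial uncore frequency range\n");
+		goto out;
+	}
+	num_uncore_freqs = (ui->init_max_freq - ui->init_min_freq) / BUS_FREQ + 1;
+	/* freqs[] holds MAX_UNCORE_FREQS entries, so exactly that many still fit */
+	if (num_uncore_freqs > MAX_UNCORE_FREQS) {
+		RTE_LOG(ERR, POWER, "Too many available uncore frequencies: %u\n",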
+				num_uncore_freqs);
+		goto out;
+	}
+
+	/* Generate the uncore freq bucket array. */
+	for (i = 0; i < num_uncore_freqs; i++)
+		ui->freqs[i] = ui->init_max_freq - (i) * BUS_FREQ;
+
+	ui->nb_freqs = num_uncore_freqs;
+
+	ret = 0;
+
+	POWER_DEBUG_TRACE("%u frequencies of pkg %02u die %02u are available\n",
+			num_uncore_freqs, ui->pkg, ui->die);
+
+out:
+	return ret;
+}
+
+static int
+check_pkg_die_values(unsigned int pkg, unsigned int die)
+{
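+	unsigned int max_pkgs, max_dies;
+
+	/* pkg and die index the static uncore_info array */
+	if (pkg >= RTE_MAX_NUMA_NODES || die >= MAX_NUMA_DIE) {
+		RTE_LOG(DEBUG, POWER, "Package %u or die %u out of range\n", pkg, die);
+		return -1;
+	}
+	max_pkgs = rte_power_uncore_get_num_pkgs();
+	if (max_pkgs == 0)
+		return -1;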
+	if (pkg >= max_pkgs) {
+		RTE_LOG(DEBUG, POWER, "Package number %02u can not exceed %u\n",
+				pkg, max_pkgs);
+		return -1;
+	}
+
+	max_dies = rte_power_uncore_get_num_dies(pkg);
+	if (max_dies == 0)
+		return -1;
+	if (die >= max_dies) {
+		RTE_LOG(DEBUG, POWER, "Die number %02u can not exceed %u\n",
+				die, max_dies);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_power_uncore_init(unsigned int pkg, unsigned int die)
+{
+	struct uncore_power_info *ui;
+
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	ui = &uncore_info[pkg][die];
+	ui->die = die;
+	ui->pkg = pkg;
+
+	/* Init for setting uncore die frequency */
+	if (power_init_for_setting_uncore_freq(ui) < 0) {
+		RTE_LOG(DEBUG, POWER, "Cannot init for setting uncore frequency for "
+				"pkg %02u die %02u\n", pkg, die);
+		return -1;
+	}
+
+	/* Get the available frequencies */
+	if (power_get_available_uncore_freqs(ui) < 0) {
+		RTE_LOG(DEBUG, POWER, "Cannot get available uncore frequencies of "
+				"pkg %02u die %02u\n", pkg, die);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_power_uncore_exit(unsigned int pkg, unsigned int die)
+{
+	struct uncore_power_info *ui;
+
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
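+	ui = &uncore_info[pkg][die];
+	if (ui->f_cur_min == NULL || ui->f_cur_max == NULL) {
+		RTE_LOG(DEBUG, POWER, "Uncore power not initialised for "
+				"pkg %02u die %02u\n", pkg, die);
+		return -1;
+	}
+
+	/* update streams must be repositioned before switching to writing */
+	rewind(ui->f_cur_min);
+	rewind(ui->f_cur_max);
+
+	if (fprintf(ui->f_cur_min, "%u", ui->org_min_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Failed to write original uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}
+
+	if (fprintf(ui->f_cur_max, "%u", ui->org_max_freq) < 0) {
+		RTE_LOG(ERR, POWER, "Failed to write original uncore frequency for "
+				"pkg %02u die %02u\n", ui->pkg, ui->die);
+		return -1;
+	}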
+
+	fflush(ui->f_cur_min);
+	fflush(ui->f_cur_max);
+
+	/* Close FD of setting freq */
+	fclose(ui->f_cur_min);
+	fclose(ui->f_cur_max);
+	ui->f_cur_min = NULL;
+	ui->f_cur_max = NULL;
+
+	return 0;
+}
+
+uint32_t
+rte_power_get_uncore_freq(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return uncore_info[pkg][die].curr_idx;
+}
+
+int
+rte_power_set_uncore_freq(unsigned int pkg, unsigned int die, uint32_t index)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return set_uncore_freq_internal(&(uncore_info[pkg][die]), index);
+}
+
+int
+rte_power_uncore_freq_max(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return set_uncore_freq_internal(&(uncore_info[pkg][die]), 0);
+}
+
+
+int
+rte_power_uncore_freq_min(unsigned int pkg, unsigned int die)
+{
+	struct uncore_power_info *ui;
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	ui = &uncore_info[pkg][die];
+	/* the last index holds the lowest frequency */
+	return set_uncore_freq_internal(ui, ui->nb_freqs - 1);
+}
+
+int
+rte_power_uncore_get_num_freqs(unsigned int pkg, unsigned int die)
+{
+	int ret = check_pkg_die_values(pkg, die);
+	if (ret < 0)
+		return -1;
+
+	return uncore_info[pkg][die].nb_freqs;
+}
+
+unsigned int
+rte_power_uncore_get_num_pkgs(void)
+{
+	DIR *d;
+	struct dirent *dir;
+	unsigned int count = 0;
+	char filter[FILTER_LENGTH];
+
+	d = opendir(UNCORE_FREQUENCY_DIR);
+	if (d == NULL) {
+		RTE_LOG(ERR, POWER,
+		"Uncore frequency management not supported/enabled on this kernel. "
+		"Please enable CONFIG_INTEL_UNCORE_FREQ_CONTROL if on x86 with Linux kernel"
+		" >= 5.6\n");
+		return 0;
+	}
+
+	/* count packages 00, 01, ...; relies on readdir() listing them in ascending order */
+	while ((dir = readdir(d)) != NULL) {
+		snprintf(filter, FILTER_LENGTH, PACKAGE_FILTER, count);
+		/* make sure filter string is in file name (don't include hidden files) */
+		if (fnmatch(filter, dir->d_name, 0) == 0)
+			count++;
+	}
+
+	closedir(d);
+
+	return count;
+}
+
+unsigned int
+rte_power_uncore_get_num_dies(unsigned int pkg)
+{
+	DIR *d;
+	struct dirent *dir;
+	unsigned int count = 0, max_pkgs;
+	char filter[FILTER_LENGTH];
+
+	max_pkgs = rte_power_uncore_get_num_pkgs();
+	if (max_pkgs == 0)
+		return 0;
+	if (pkg >= max_pkgs) {
+		RTE_LOG(DEBUG, POWER, "Invalid package number\n");
+		return 0;
+	}
+
+	d = opendir(UNCORE_FREQUENCY_DIR);
+	if (d == NULL) {
+		RTE_LOG(ERR, POWER,
+		"Uncore frequency management not supported/enabled on this kernel. "
+		"Please enable CONFIG_INTEL_UNCORE_FREQ_CONTROL if on x86 with Linux kernel"
+		" >= 5.6\n");
+		return 0;
+	}
+
+	/* count dies 00, 01, ... of pkg; relies on readdir() listing them in ascending order */
+	while ((dir = readdir(d)) != NULL) {
+		snprintf(filter, FILTER_LENGTH, DIE_FILTER, pkg, count);
+		/* make sure filter string is in file name (don't include hidden files) */
+		if (fnmatch(filter, dir->d_name, 0) == 0)
+			count++;
+	}
+
+	closedir(d);
+
+	return count;
+}
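The counting loops in rte_power_uncore_get_num_pkgs() and rte_power_uncore_get_num_dies() above only advance `count` when readdir() returns the entry for the next expected number, so their result depends on the order in which directory entries are returned, which POSIX leaves unspecified. The following is a minimal, order-independent sketch, assuming entries named `package_XX_die_YY` as created by the Linux intel_uncore_frequency driver. The helper uncore_count_pkgs_dies() and the UNCORE_MAX_IDS limit are hypothetical illustrations, not part of DPDK.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <stdio.h>

/* Assumption: two decimal digits in "package_XX_die_YY" allow ids 00..99. */
#define UNCORE_MAX_IDS 100

/*
 * Hypothetical helper (not a DPDK API): count the distinct packages found in
 * an uncore sysfs-style directory and the dies belonging to package 'pkg'.
 * Every entry is inspected, so the result does not depend on readdir() order.
 * Returns 0 on success, -1 if the directory cannot be opened.
 */
static int
uncore_count_pkgs_dies(const char *path, unsigned int pkg,
		unsigned int *num_pkgs, unsigned int *num_dies)
{
	unsigned char seen_pkg[UNCORE_MAX_IDS] = {0};
	struct dirent *ent;
	DIR *d;

	*num_pkgs = 0;
	*num_dies = 0;

	d = opendir(path);
	if (d == NULL)
		return -1;

	while ((ent = readdir(d)) != NULL) {
		unsigned int p, die;
		char extra;

		/* Require exactly two conversions: skips ".", ".." and names with trailing text. */
		if (sscanf(ent->d_name, "package_%2u_die_%2u%c", &p, &die, &extra) != 2)
			continue;
		if (p >= UNCORE_MAX_IDS || die >= UNCORE_MAX_IDS)
			continue;

		/* Count each package number once, however many dies it has. */
		if (!seen_pkg[p]) {
			seen_pkg[p] = 1;
			(*num_pkgs)++;
		}
		/* Each die of the requested package is a separate entry. */
		if (p == pkg)
			(*num_dies)++;
	}

	closedir(d);
	return 0;
}
```

Because every entry is visited and package numbers are de-duplicated, unsorted or non-contiguous entries do not cut the count short.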
diff --git a/lib/power/rte_power_uncore.h b/lib/power/rte_power_uncore.h
new file mode 100644
index 0000000000..2be6546f49
--- /dev/null
+++ b/lib/power/rte_power_uncore.h
@@ -0,0 +1,194 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2022 Intel Corporation
+ */
+
+#ifndef _RTE_POWER_UNCORE_H
+#define _RTE_POWER_UNCORE_H
+
+/**
+ * @file
+ * RTE Uncore Frequency Management
+ */
+
+#include "rte_power.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Initialize uncore frequency management for a specific die on a package. This reads the
+ * available frequencies and prepares to set new frequencies for the die.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_init(unsigned int pkg, unsigned int die);
+
+/**
+ * Exit uncore frequency management on a specific die on a package. This restores the uncore
+ * min and max frequencies to the values they had before rte_power_uncore_init() was called.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  - 0 on success.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_exit(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the current index into the available frequencies of a specific die on a package.
+ * This function is not thread-safe; callers must provide their own synchronization.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  The current index of available frequencies.
+ *  On error, RTE_POWER_INVALID_FREQ_INDEX (~0) is returned.
+ */
+__rte_experimental
+uint32_t
+rte_power_get_uncore_freq(unsigned int pkg, unsigned int die);
+
+/**
+ * Set a new frequency for a specific die on a package, selected by its index in the
+ * list of available frequencies.
+ * This function is not thread-safe; callers must provide their own synchronization.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ * @param index
+ *  The index of available frequencies.
+ *
+ * @return
+ *  - 1 on success, if the frequency was changed.
+ *  - 0 on success, if the frequency was unchanged.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_set_uncore_freq(unsigned int pkg, unsigned int die, uint32_t index);
+
+/**
+ * Scale up the frequency of a specific die on a package to the highest available
+ * frequency.
+ * This function is not thread-safe; callers must provide their own synchronization.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  - 1 on success, if the frequency was changed.
+ *  - 0 on success, if the frequency was unchanged.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_freq_max(unsigned int pkg, unsigned int die);
+
+/**
+ * Scale down the frequency of a specific die on a package to the lowest available
+ * frequency.
+ * This function is not thread-safe; callers must provide their own synchronization.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  - 1 on success, if the frequency was changed.
+ *  - 0 on success, if the frequency was unchanged.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_freq_min(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the number of available frequencies, i.e. the length of the frequency index array.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ * @param die
+ *  Die number.
+ *
+ * @return
+ *  - The number of available indexes in the frequency array.
+ *  - Negative on error.
+ */
+__rte_experimental
+int
+rte_power_uncore_get_num_freqs(unsigned int pkg, unsigned int die);
+
+/**
+ * Return the number of packages (CPUs) on a system by parsing the uncore
+ * sysfs directory.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @return
+ *  - Zero on error.
+ *  - Number of packages on the system on success.
+ */
+__rte_experimental
+unsigned int
+rte_power_uncore_get_num_pkgs(void);
+
+/**
+ * Return the number of dies in the specified package (CPU), found by parsing
+ * the uncore sysfs directory.
+ *
+ * This function should NOT be called in the fast path.
+ *
+ * @param pkg
+ *  Package number.
+ *
+ * @return
+ *  - Zero on error.
+ *  - Number of dies in the package on success.
+ */
+__rte_experimental
+unsigned int
+rte_power_uncore_get_num_dies(unsigned int pkg);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
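rte_power_uncore_exit() is documented above as restoring the original min and max limits. Because those values are written through buffered stdio streams, a successful fprintf() only means the text was buffered; a value rejected by the kernel is typically reported only when fflush() performs the actual write(). The following is a minimal sketch of a checked write, assuming an already-open writable stream and a value in kHz (matching the kernel's min_freq_khz/max_freq_khz attributes). uncore_write_freq() is a hypothetical helper, not a DPDK function.

```c
#define _POSIX_C_SOURCE 200809L

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a DPDK API): write one frequency value to an
 * already-open, writable stream and report whether it reached the file.
 * stdio buffers the text, so a value rejected by the kernel normally shows
 * up as an fflush() failure rather than an fprintf() failure.
 * Returns 0 on success, -1 on any failure.
 */
static int
uncore_write_freq(FILE *f, uint32_t freq_khz)
{
	if (f == NULL)
		return -1;
	/* Rewind so that each value is written from the start of the file. */
	if (fseek(f, 0, SEEK_SET) != 0)
		return -1;
	/* Only formats the value into the stdio buffer. */
	if (fprintf(f, "%" PRIu32, freq_khz) < 0)
		return -1;
	/* Performs the real write(); errors from the kernel surface here. */
	if (fflush(f) != 0)
		return -1;
	return 0;
}
```

Checking fflush() as well as fprintf() is what lets a caller detect that restoring a limit actually failed.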
diff --git a/lib/power/version.map b/lib/power/version.map
index f9b2947adf..8fccbf20f7 100644
--- a/lib/power/version.map
+++ b/lib/power/version.map
@@ -48,4 +48,15 @@  EXPERIMENTAL {
 	rte_power_pmd_mgmt_set_pause_duration;
 	rte_power_pmd_mgmt_set_scaling_freq_max;
 	rte_power_pmd_mgmt_set_scaling_freq_min;
+
+	# added in 22.11
+	rte_power_get_uncore_freq;
+	rte_power_set_uncore_freq;
+	rte_power_uncore_exit;
+	rte_power_uncore_freq_max;
+	rte_power_uncore_freq_min;
+	rte_power_uncore_get_num_dies;
+	rte_power_uncore_get_num_freqs;
+	rte_power_uncore_get_num_pkgs;
+	rte_power_uncore_init;
 };