From patchwork Thu Oct 15 12:04:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80888 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 28B42A04DB; Thu, 15 Oct 2020 14:04:27 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 7EBB31E53B; Thu, 15 Oct 2020 14:04:24 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 161121DEC8 for ; Thu, 15 Oct 2020 14:04:22 +0200 (CEST) IronPort-SDR: a9vfz0JF3xw9E2H2qGe9zJ0CjruxvdLw5R1e5RkFKFcqX4e7LJGjLT03Z/XuHUULw9lXmPD6EV CjuknC+7VQ2w== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="165545795" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="165545795" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:19 -0700 IronPort-SDR: +3VsTcxapUkl6mHPD2fSXA9TxawYPL461Jak+m17c2dfAht7j4omKlmhaLn/0QNQxNmRMeE44D Hk3G3pXM2ecg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485560" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:16 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Bruce Richardson , Konstantin Ananyev , david.hunt@intel.com, jerinjacobk@gmail.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:06 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 01/10] eal: add new x86 cpuid support for WAITPKG X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a new CPUID flag indicating processor support for UMONITOR/UMWAIT and TPAUSE instructions instruction. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v6: - Fix typos - Better commit message lib/librte_eal/x86/include/rte_cpuflags.h | 1 + lib/librte_eal/x86/rte_cpuflags.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/lib/librte_eal/x86/include/rte_cpuflags.h b/lib/librte_eal/x86/include/rte_cpuflags.h index c1d20364d1..848ba9cbfb 100644 --- a/lib/librte_eal/x86/include/rte_cpuflags.h +++ b/lib/librte_eal/x86/include/rte_cpuflags.h @@ -132,6 +132,7 @@ enum rte_cpu_flag_t { RTE_CPUFLAG_MOVDIR64B, /**< Direct Store Instructions 64B */ RTE_CPUFLAG_AVX512VP2INTERSECT, /**< AVX512 Two Register Intersection */ + RTE_CPUFLAG_WAITPKG, /**< UMONITOR/UMWAIT/TPAUSE */ /* The last item */ RTE_CPUFLAG_NUMFLAGS, /**< This should always be the last! 
*/ }; diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c index 30439e7951..0325c4b93b 100644 --- a/lib/librte_eal/x86/rte_cpuflags.c +++ b/lib/librte_eal/x86/rte_cpuflags.c @@ -110,6 +110,8 @@ const struct feature_entry rte_cpu_feature_table[] = { FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16) FEAT_DEF(RDSEED, 0x00000007, 0, RTE_REG_EBX, 18) + FEAT_DEF(WAITPKG, 0x00000007, 0, RTE_REG_ECX, 5) + FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX, 0) FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX, 4) From patchwork Thu Oct 15 12:04:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80889 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id DF9C0A04DB; Thu, 15 Oct 2020 14:04:45 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 671C31E546; Thu, 15 Oct 2020 14:04:30 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id B78A21E544 for ; Thu, 15 Oct 2020 14:04:25 +0200 (CEST) IronPort-SDR: Ho4CFQXtrnOHGVUEbZRAAPG5AMv1LlKLkK64l7pjpWyEyfzdx89BpgSZ2PAMYageNGxCehC89w omWdEey9dvFg== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="165545844" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="165545844" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:24 -0700 IronPort-SDR: pyfrJKgRaRqc+3sOD0dJVd2EJw2I/76wz075nJOHydlXgYA7nYxTOWFvOEuES8MPEPdsrFTyMy rFjYZtU0Dzyw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485573" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:19 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Jan Viktorin , Ruifeng Wang , David Christensen , Bruce Richardson , Konstantin Ananyev , david.hunt@intel.com, jerinjacobk@gmail.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:07 +0100 Message-Id: <7d5724730715ccfbf55baceae42b91a2351020f8.1602763439.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 02/10] eal: add power management intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add two new power management intrinsics, and provide an implementation in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions are implemented as raw byte opcodes because there is not yet widespread compiler support for these instructions. The power management instructions provide an architecture-specific function to either wait until a specified TSC timestamp is reached, or optionally wait until either a TSC timestamp is reached or a memory location is written to. 
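To illustrate the intended use of these intrinsics, a minimal application-side sketch follows (hypothetical code, not part of this patch; the helper name is made up, and only the signatures declared in the generic header added below are assumed, on a WAITPKG-capable CPU):

#include <stdint.h>
#include <rte_cycles.h>
#include <rte_power_intrinsics.h>

/* illustrative sketch: wait on a flag for at most ~1 ms using the new intrinsics */
static void
wait_for_flag(volatile uint64_t *flag)
{
	/* wake up after at most ~1 ms even if nothing writes to *flag */
	const uint64_t deadline = rte_rdtsc() + rte_get_tsc_hz() / 1000;

	/*
	 * Sleep until *flag is written to, the deadline passes, or another
	 * wakeup occurs.  Because the mask is non-zero, the sleep is skipped
	 * if bit 0 of *flag is already set.
	 */
	rte_power_monitor(flag, 1, 1, deadline, sizeof(*flag));

	if ((*flag & 1) == 0)
		/* plain timed sleep with no monitored address */
		rte_power_pause(deadline);
}
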
The monitor function also provides an optional comparison, to avoid sleeping when the expected write has already happened, and no more writes are expected. For more details, please refer to Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 2. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Christensen Acked-by: Jerin Jacob Acked-by: Konstantin Ananyev Acked-by: Ruifeng Wang --- Notes: v7: - Fix code style and other nitpicks (Konstantin) v6: - Add spinlock-enabled version to allow pthread-wait-like constructs with umwait - Clarify comments - Added experimental tags to intrinsics - Added endianness support v5: - Removed return values - Simplified intrinsics and hardcoded C0.2 state - Added other arch stubs lib/librte_eal/arm/include/meson.build | 1 + .../arm/include/rte_power_intrinsics.h | 60 ++++++++ .../include/generic/rte_power_intrinsics.h | 111 ++++++++++++++ lib/librte_eal/include/meson.build | 1 + lib/librte_eal/ppc/include/meson.build | 1 + .../ppc/include/rte_power_intrinsics.h | 60 ++++++++ lib/librte_eal/x86/include/meson.build | 1 + .../x86/include/rte_power_intrinsics.h | 135 ++++++++++++++++++ 8 files changed, 370 insertions(+) create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h diff --git a/lib/librte_eal/arm/include/meson.build b/lib/librte_eal/arm/include/meson.build index 73b750a18f..c6a9f70d73 100644 --- a/lib/librte_eal/arm/include/meson.build +++ b/lib/librte_eal/arm/include/meson.build @@ -20,6 +20,7 @@ arch_headers = files( 'rte_pause_32.h', 'rte_pause_64.h', 'rte_pause.h', + 'rte_power_intrinsics.h', 'rte_prefetch_32.h', 'rte_prefetch_64.h', 'rte_prefetch.h', diff --git a/lib/librte_eal/arm/include/rte_power_intrinsics.h b/lib/librte_eal/arm/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..a4a1bc1159 --- /dev/null +++ b/lib/librte_eal/arm/include/rte_power_intrinsics.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_ARM_H_ +#define _RTE_POWER_INTRINSIC_ARM_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +/** + * This function is not supported on ARM. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. 
+ */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_ARM_H_ */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h new file mode 100644 index 0000000000..fb897d9060 --- /dev/null +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -0,0 +1,111 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_H_ +#define _RTE_POWER_INTRINSIC_H_ + +#include + +#include +#include + +/** + * @file + * Advanced power management operations. + * + * This file define APIs for advanced power management, + * which are architecture-dependent. + */ + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * @param p + * Address to monitor for changes. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * @param data_sz + * Data size (in bytes) that will be used to compare expected value with the + * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead + * to undefined result. + */ +__rte_experimental +static inline void rte_power_monitor(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint64_t tsc_timestamp, const uint8_t data_sz); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * This call will also lock a spinlock on entering sleep, and release it on + * waking up the CPU. + * + * @param p + * Address to monitor for changes. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. 
+ * @param data_sz + * Data size (in bytes) that will be used to compare expected value with the + * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead + * to undefined result. + * @param lck + * A spinlock that must be locked before entering the function, will be + * unlocked while the CPU is sleeping, and will be locked again once the CPU + * wakes up. + */ +__rte_experimental +static inline void rte_power_monitor_sync(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint64_t tsc_timestamp, const uint8_t data_sz, + rte_spinlock_t *lck); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enter an architecture-defined optimized power state until a certain TSC + * timestamp is reached. + * + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + */ +__rte_experimental +static inline void rte_power_pause(const uint64_t tsc_timestamp); + +#endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/include/meson.build b/lib/librte_eal/include/meson.build index cd09027958..3a12e87e19 100644 --- a/lib/librte_eal/include/meson.build +++ b/lib/librte_eal/include/meson.build @@ -60,6 +60,7 @@ generic_headers = files( 'generic/rte_memcpy.h', 'generic/rte_pause.h', 'generic/rte_prefetch.h', + 'generic/rte_power_intrinsics.h', 'generic/rte_rwlock.h', 'generic/rte_spinlock.h', 'generic/rte_ticketlock.h', diff --git a/lib/librte_eal/ppc/include/meson.build b/lib/librte_eal/ppc/include/meson.build index ab4bd28092..0873b2aecb 100644 --- a/lib/librte_eal/ppc/include/meson.build +++ b/lib/librte_eal/ppc/include/meson.build @@ -10,6 +10,7 @@ arch_headers = files( 'rte_io.h', 'rte_memcpy.h', 'rte_pause.h', + 'rte_power_intrinsics.h', 'rte_prefetch.h', 'rte_rwlock.h', 'rte_spinlock.h', diff --git a/lib/librte_eal/ppc/include/rte_power_intrinsics.h b/lib/librte_eal/ppc/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..4ed03d521f --- /dev/null +++ b/lib/librte_eal/ppc/include/rte_power_intrinsics.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_PPC_H_ +#define _RTE_POWER_INTRINSIC_PPC_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +/** + * This function is not supported on PPC64. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. 
+ */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_PPC_H_ */ diff --git a/lib/librte_eal/x86/include/meson.build b/lib/librte_eal/x86/include/meson.build index f0e998c2fe..494a8142a2 100644 --- a/lib/librte_eal/x86/include/meson.build +++ b/lib/librte_eal/x86/include/meson.build @@ -13,6 +13,7 @@ arch_headers = files( 'rte_io.h', 'rte_memcpy.h', 'rte_prefetch.h', + 'rte_power_intrinsics.h', 'rte_pause.h', 'rte_rtm.h', 'rte_rwlock.h', diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..f9b761d796 --- /dev/null +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h @@ -0,0 +1,135 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_X86_H_ +#define _RTE_POWER_INTRINSIC_X86_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +static inline uint64_t +__get_umwait_val(const volatile void *p, const uint8_t sz) +{ + switch (sz) { + case sizeof(uint8_t): + return *(const volatile uint8_t *)p; + case sizeof(uint16_t): + return *(const volatile uint16_t *)p; + case sizeof(uint32_t): + return *(const volatile uint32_t *)p; + case sizeof(uint64_t): + return *(const volatile uint64_t *)p; + default: + /* this is an intrinsic, so we can't have any error handling */ + RTE_ASSERT(0); + return 0; + } +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. 
+ */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + rte_spinlock_unlock(lck); + + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); + + rte_spinlock_lock(lck); +} + +/** + * This function uses TPAUSE instruction and will enter C0.2 state. For more + * information about usage of this instruction, please refer to Intel(R) 64 and + * IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* execute TPAUSE */ + asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_X86_H_ */ From patchwork Thu Oct 15 12:04:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80890 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 7E286A04DB; Thu, 15 Oct 2020 14:05:09 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 371101E54D; Thu, 15 Oct 2020 14:04:36 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 53F091DA34 for ; Thu, 15 Oct 2020 14:04:30 +0200 (CEST) IronPort-SDR: VQfM6M85hu/RQajom0kXN8RTmSNn7PH8vCFYledSz80DIs+zSz+FwBWya0z3wMO7zJMaCuCDj7 8i0lE3b6HeOA== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="165545884" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="165545884" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:29 -0700 IronPort-SDR: VbVFBs6TS6WVlhM2bTnx5L9c0nIwEUOzTAp+n8CyNpW0jJy1qgBzSHNg3SZtAdXcV2x4skY7ad NcBss0phEjHA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485587" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:24 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Jan Viktorin , Ruifeng Wang , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , david.hunt@intel.com, liang.j.ma@intel.com, jerinjacobk@gmail.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:08 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 03/10] eal: add intrinsics support check infrastructure X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 
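As an illustration of the check infrastructure added by the patch below, a minimal sketch (hypothetical application code, not part of the patch; the helper name is made up, and only names introduced by patches 1 and 3 of this series are assumed):

#include <stdbool.h>
#include <rte_cpuflags.h>

/* gate use of rte_power_monitor()/rte_power_pause() on runtime support */
static bool
power_intrinsics_supported(void)
{
	struct rte_cpu_intrinsics intr;

	/* generic, architecture-independent check added by the patch below */
	rte_cpu_get_intrinsics_support(&intr);

	/*
	 * On x86 this corresponds to the WAITPKG CPUID flag added in patch 1,
	 * i.e. rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG).
	 */
	return intr.power_monitor && intr.power_pause;
}
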
Currently, it is not possible to check support for intrinsics that are platform-specific, cannot be abstracted in a generic way, or do not have support on all architectures. The CPUID flags can be used to some extent, but they are only defined for their platform, while intrinsics will be available to all code as they are in generic headers. This patch introduces infrastructure to check support for certain platform-specific intrinsics, and adds support for checking support for IA power management-related intrinsics for UMWAIT/UMONITOR and TPAUSE. Signed-off-by: Anatoly Burakov Acked-by: David Christensen Acked-by: Jerin Jacob Acked-by: Ruifeng Wang Acked-by: Ray Kinsella --- Notes: v6: - Fix the comments lib/librte_eal/arm/rte_cpuflags.c | 6 +++++ lib/librte_eal/include/generic/rte_cpuflags.h | 26 +++++++++++++++++++ .../include/generic/rte_power_intrinsics.h | 12 +++++++++ lib/librte_eal/ppc/rte_cpuflags.c | 7 +++++ lib/librte_eal/rte_eal_version.map | 1 + lib/librte_eal/x86/rte_cpuflags.c | 12 +++++++++ 6 files changed, 64 insertions(+) diff --git a/lib/librte_eal/arm/rte_cpuflags.c b/lib/librte_eal/arm/rte_cpuflags.c index 7b257b7873..e3a53bcece 100644 --- a/lib/librte_eal/arm/rte_cpuflags.c +++ b/lib/librte_eal/arm/rte_cpuflags.c @@ -151,3 +151,9 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); +} diff --git a/lib/librte_eal/include/generic/rte_cpuflags.h b/lib/librte_eal/include/generic/rte_cpuflags.h index 872f0ebe3e..28a5aecde8 100644 --- a/lib/librte_eal/include/generic/rte_cpuflags.h +++ b/lib/librte_eal/include/generic/rte_cpuflags.h @@ -13,6 +13,32 @@ #include "rte_common.h" #include +#include + +/** + * Structure used to describe platform-specific intrinsics that may or may not + * be supported at runtime. + */ +struct rte_cpu_intrinsics { + uint32_t power_monitor : 1; + /**< indicates support for rte_power_monitor function */ + uint32_t power_pause : 1; + /**< indicates support for rte_power_pause function */ +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Check CPU support for various intrinsics at runtime. + * + * @param intrinsics + * Pointer to a structure to be filled. + */ +__rte_experimental +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics); + /** * Enumeration of all CPU features supported */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index fb897d9060..03a326f076 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -32,6 +32,10 @@ * checked against the expected value, and if they match, the entering of * optimized power state may be aborted. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. Failing to do + * so may result in an illegal CPU instruction error. + * * @param p * Address to monitor for changes. * @param expected_value @@ -69,6 +73,10 @@ static inline void rte_power_monitor(const volatile void *p, * This call will also lock a spinlock on entering sleep, and release it on * waking up the CPU. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. 
Failing to do + * so may result in an illegal CPU instruction error. + * * @param p * Address to monitor for changes. * @param expected_value @@ -101,6 +109,10 @@ static inline void rte_power_monitor_sync(const volatile void *p, * Enter an architecture-defined optimized power state until a certain TSC * timestamp is reached. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. Failing to do + * so may result in an illegal CPU instruction error. + * * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. diff --git a/lib/librte_eal/ppc/rte_cpuflags.c b/lib/librte_eal/ppc/rte_cpuflags.c index 3bb7563ce9..61db5c216d 100644 --- a/lib/librte_eal/ppc/rte_cpuflags.c +++ b/lib/librte_eal/ppc/rte_cpuflags.c @@ -8,6 +8,7 @@ #include #include #include +#include #include /* Symbolic values for the entries in the auxiliary table */ @@ -108,3 +109,9 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); +} diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map index a93dea9fe6..ed944f2bd4 100644 --- a/lib/librte_eal/rte_eal_version.map +++ b/lib/librte_eal/rte_eal_version.map @@ -400,6 +400,7 @@ EXPERIMENTAL { # added in 20.11 __rte_eal_trace_generic_size_t; rte_service_lcore_may_be_active; + rte_cpu_get_intrinsics_support; }; INTERNAL { diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c index 0325c4b93b..a96312ff7f 100644 --- a/lib/librte_eal/x86/rte_cpuflags.c +++ b/lib/librte_eal/x86/rte_cpuflags.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "rte_cpuid.h" @@ -179,3 +180,14 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); + + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG)) { + intrinsics->power_monitor = 1; + intrinsics->power_pause = 1; + } +} From patchwork Thu Oct 15 12:04:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80891 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B71A7A04DB; Thu, 15 Oct 2020 14:05:36 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 368A81E567; Thu, 15 Oct 2020 14:04:44 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id C697A1E551 for ; Thu, 15 Oct 2020 14:04:36 +0200 (CEST) IronPort-SDR: aNhCXdhl3Ja7q2oCqdDdyOiVZGLmGJH9Cz1qbas4W2R736sXDTOvbH2yFO8x1Ku+DY+Lt6IzK9 QMG7XkhMvLtw== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977179" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977179" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:34 -0700 IronPort-SDR: 
bVEK1/WsU80gNw0PxKvdNCNAv5r6/tHZttwB+cbS46VdVDIe8ibb0RtaeatDt7seSrEtv96Gde ixM8kM12v+RA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485600" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:30 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Thomas Monjalon , Ferruh Yigit , Andrew Rybchenko , Ray Kinsella , Neil Horman , david.hunt@intel.com, konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:09 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 04/10] ethdev: add simple power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple API to allow getting address of next RX descriptor from the PMD, as well as release notes information. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v7: - Fixed queue ID validation - Fixed documentation v6: - Rebase on top of latest main - Ensure the API checks queue ID (Konstantin) - Removed accidental inclusion of unrelated release notes v5: - Bring function format in line with other functions in the file - Ensure the API is supported by the driver before calling it (Konstantin) doc/guides/rel_notes/release_20_11.rst | 8 ++++++- lib/librte_ethdev/rte_ethdev.c | 23 +++++++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 28 ++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_driver.h | 28 ++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_version.map | 1 + 5 files changed, 87 insertions(+), 1 deletion(-) diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index c61d7fcf67..4c6a615ce9 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -71,7 +71,13 @@ New Features * **Added the FEC API, for a generic FEC query and config.** Added the FEC API which provides functions for query FEC capabilities and - current FEC mode from device. Also, API for configuring FEC mode is also provided. + current FEC mode from device. Also, API for configuring FEC mode is also + provided. 
+ +* **ethdev: add 1 new EXPERIMENTAL API for PMD power management.** + + * ``rte_eth_get_wake_addr()`` + * add new eth_dev_ops ``get_wake_addr`` * **Updated Broadcom bnxt driver.** diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 59beb8aec2..d972a3a656 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -4844,6 +4844,29 @@ rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode)); } +int +rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_wake_addr, -ENOTSUP); + + if (queue_id >= dev->data->nb_rx_queues) { + RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id); + return -EINVAL; + } + + return eth_err(port_id, + dev->dev_ops->get_wake_addr(dev->data->rx_queues[queue_id], + wake_addr, expected, mask, data_sz)); +} + int rte_eth_dev_set_mc_addr_list(uint16_t port_id, struct rte_ether_addr *mc_addr_set, diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index 3a31f94367..008bc3cce0 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -4119,6 +4119,34 @@ __rte_experimental int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, struct rte_eth_burst_mode *mode); +/** + * Retrieve the wake up address for the receive queue. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The Rx queue on the Ethernet device for which information will be + * retrieved. + * @param wake_addr + * The pointer to the address which will be monitored. + * @param expected + * The pointer to value to be expected when descriptor is set. + * @param mask + * The pointer to comparison bitmask for the expected value. + * @param data_sz + * The pointer to data size for the expected value and comparison bitmask. + * + * @return + * - 0: Success. + * -ENOTSUP: Operation not supported. + * -EINVAL: Invalid parameters. + * -ENODEV: Invalid port ID. + */ +__rte_experimental +int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz); + /** * Retrieve device registers and register attributes (number of registers and * register size) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index 35cc4fb186..76b179de42 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -655,6 +655,32 @@ typedef int (*eth_fec_get_t)(struct rte_eth_dev *dev, */ typedef int (*eth_fec_set_t)(struct rte_eth_dev *dev, uint32_t fec_capa); +/** + * @internal + * Get address of memory location whose contents will change whenever there is + * new data to be received on an RX queue. + * + * @param rxq + * Ethdev queue pointer. + * @param tail_desc_addr + * The pointer point to where the address will be stored. + * @param expected + * The pointer point to value to be expected when descriptor is set. + * @param mask + * The pointer point to comparison bitmask for the expected value. + * @param data_sz + * Data size for the expected value (can be 1, 2, 4, or 8 bytes) + * @return + * Negative errno value on error, 0 on success. 
+ * + * @retval 0 + * Success + * @retval -EINVAL + * Invalid parameters + */ +typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -801,6 +827,8 @@ struct eth_dev_ops { /**< Get Forward Error Correction(FEC) mode. */ eth_fec_set_t fec_set; /**< Set Forward Error Correction(FEC) mode. */ + eth_get_wake_addr_t get_wake_addr; + /**< Get next RX queue ring entry address. */ }; /** diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map index f8a0945812..6c2ea5996d 100644 --- a/lib/librte_ethdev/rte_ethdev_version.map +++ b/lib/librte_ethdev/rte_ethdev_version.map @@ -232,6 +232,7 @@ EXPERIMENTAL { rte_eth_fec_get_capability; rte_eth_fec_get; rte_eth_fec_set; + rte_eth_get_wake_addr; }; INTERNAL { From patchwork Thu Oct 15 12:04:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80892 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 700F8A04DB; Thu, 15 Oct 2020 14:05:59 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id F34DE1E56A; Thu, 15 Oct 2020 14:04:45 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 761D71E55F for ; Thu, 15 Oct 2020 14:04:39 +0200 (CEST) IronPort-SDR: RkcJdhcNBhbt31IwLz2HthF+o7cfi6K+q+QPiglCzsNAS7S3j62EIavCNi6bO7IXK6xkKQDGUy OmTaNO6x9NfQ== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977187" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977187" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:39 -0700 IronPort-SDR: MiTw/sgaCSNFSOXe1hN0xIaocxm2BhCVLQvPGOs5cnxZl3TeLdcivjp4oJX2Z1Mk1bqFYNOtic YoQgc+xT+SSg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485611" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:34 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , Ray Kinsella , Neil Horman , konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:10 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 05/10] power: add PMD power management API and callback X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple on/off switch that will enable saving power when no packets are arriving. 
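A minimal sketch of how an application might use this switch (hypothetical code, not part of the patch; the helper name is made up, and only the API declared in rte_power_pmd_mgmt.h below is assumed):

#include <errno.h>
#include <stdint.h>
#include <rte_lcore.h>
#include <rte_power_pmd_mgmt.h>

/*
 * Illustrative sketch: enable the UMWAIT-based scheme for one RX queue polled
 * by the current lcore, falling back to the pause-based scheme if monitoring
 * is not supported by the CPU or the PMD.
 */
static int
enable_rx_power_mgmt(uint16_t port_id, uint16_t queue_id)
{
	const unsigned int lcore_id = rte_lcore_id();
	int ret;

	ret = rte_power_pmd_mgmt_queue_enable(lcore_id, port_id, queue_id,
			RTE_POWER_MGMT_TYPE_WAIT);
	if (ret == -ENOTSUP)
		ret = rte_power_pmd_mgmt_queue_enable(lcore_id, port_id,
				queue_id, RTE_POWER_MGMT_TYPE_PAUSE);

	return ret;
}
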
It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that will either wait until a TSC timestamp expires, or when packets arrive. This API mandates a core-to-single-queue mapping (that is, multiple queued per device are supported, but they have to be polled on different cores). This design is using PMD RX callbacks. 1. UMWAIT/UMONITOR: When a certain threshold of empty polls is reached, the core will go into a power optimized sleep while waiting on an address of next RX descriptor to be written to. 2. Pause instruction Instead of move the core into deeper C state, this method uses the pause instruction to avoid busy polling. 3. Frequency scaling Reuse existing DPDK power library to scale up/down core frequency depending on traffic volume. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt Acked-by: Konstantin Ananyev --- Notes: v7: - Fixed race condition (Konstantin) - Slight rework of the structure of monitor code - Added missing inline for wakeup v6: - Added wakeup mechanism for UMWAIT - Removed memory allocation (everything is now allocated statically) - Fixed various typos and comments - Check for invalid queue ID - Moved release notes to this patch v5: - Make error checking more robust - Prevent initializing scaling if ACPI or PSTATE env wasn't set - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr - Add some debug logging - Replace x86-specific code path to generic path using the intrinsic check doc/guides/rel_notes/release_20_11.rst | 11 + lib/librte_power/meson.build | 5 +- lib/librte_power/rte_power_pmd_mgmt.c | 320 +++++++++++++++++++++++++ lib/librte_power/rte_power_pmd_mgmt.h | 92 +++++++ lib/librte_power/rte_power_version.map | 4 + 5 files changed, 430 insertions(+), 2 deletions(-) create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 4c6a615ce9..a814c67d54 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -204,6 +204,17 @@ New Features * Added support to update subport rate dynamically. +* **Add PMD power management mechanism** + + 3 new Ethernet PMD power management mechanism is added through existing + RX callback infrastructure. 
+ + * Add power saving scheme based on UMWAIT instruction (x86 only) + * Add power saving scheme based on ``rte_pause()`` + * Add power saving scheme based on frequency scaling through the power library + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()`` + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()`` + Removed Items ------------- diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build index 78c031c943..cc3c7a8646 100644 --- a/lib/librte_power/meson.build +++ b/lib/librte_power/meson.build @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c', 'power_kvm_vm.c', 'guest_channel.c', 'rte_power_empty_poll.c', 'power_pstate_cpufreq.c', + 'rte_power_pmd_mgmt.c', 'power_common.c') -headers = files('rte_power.h','rte_power_empty_poll.h') -deps += ['timer'] +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h') +deps += ['timer' ,'ethdev'] diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c new file mode 100644 index 0000000000..344b12d5ff --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.c @@ -0,0 +1,320 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include "rte_power_pmd_mgmt.h" + +#define EMPTYPOLL_MAX 512 + +/** + * Possible power management states of an ethdev port. + */ +enum pmd_mgmt_state { + /** Device power management is disabled. */ + PMD_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + PMD_MGMT_ENABLED, +}; + +struct pmd_queue_cfg { + enum pmd_mgmt_state pwr_mgmt_state; + /**< State of power management for this queue */ + enum rte_power_pmd_mgmt_type cb_mode; + /**< Callback mode for this queue */ + const struct rte_eth_rxtx_callback *cur_cb; + /**< Callback instance */ + rte_spinlock_t umwait_lock; + /**< Per-queue status lock - used only for UMWAIT mode */ + volatile void *wait_addr; + /**< UMWAIT wakeup address */ + uint64_t empty_poll_stats; + /**< Number of empty polls */ +} __rte_cache_aligned; + +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; + +/* trigger a write to the cache line we're waiting on */ +static inline void +umwait_wakeup(volatile void *addr) +{ + uint64_t val; + + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED); + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED); +} + +static inline void +umwait_sleep(struct pmd_queue_cfg *q_conf, uint16_t port_id, uint16_t qidx) +{ + volatile void *target_addr; + uint64_t expected, mask; + uint8_t data_sz; + uint16_t ret; + + /* + * get wake up address fot this RX queue, as well as expected value, + * comparison mask, and data size. + */ + ret = rte_eth_get_wake_addr(port_id, qidx, &target_addr, + &expected, &mask, &data_sz); + + /* this should always succeed as all checks have been done already */ + if (unlikely(ret != 0)) + return; + + /* + * take out a spinlock to prevent control plane from concurrently + * modifying the wakeup data. + */ + rte_spinlock_lock(&q_conf->umwait_lock); + + /* have we been disabled by control plane? */ + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { + /* we're good to go */ + + /* + * store the wakeup address so that control plane can trigger a + * write to this address and wake us up. 
+ */ + q_conf->wait_addr = target_addr; + /* -1ULL is maximum value for TSC */ + rte_power_monitor_sync(target_addr, expected, mask, -1ULL, + data_sz, &q_conf->umwait_lock); + /* erase the address */ + q_conf->wait_addr = NULL; + } + rte_spinlock_unlock(&q_conf->umwait_lock); +} + +static uint16_t +clb_umwait(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + umwait_sleep(q_conf, port_id, qidx); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_pause(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + /* sleep for 1 microsecond */ + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + rte_delay_us(1); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_scale_freq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + } else { + q_conf->empty_poll_stats = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + + return nb_rx; +} + +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode) +{ + struct rte_eth_dev *dev; + struct pmd_queue_cfg *queue_cfg; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + dev = &rte_eth_devices[port_id]; + + /* check if queue id is valid */ + if (queue_id >= dev->data->nb_rx_queues || + queue_id >= RTE_MAX_QUEUES_PER_PORT) { + return -EINVAL; + } + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) { + ret = -EINVAL; + goto end; + } + + switch (mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + { + /* check if rte_power_monitor is supported */ + uint64_t dummy_expected, dummy_mask; + struct rte_cpu_intrinsics i; + volatile void *dummy_addr; + uint8_t dummy_sz; + + rte_cpu_get_intrinsics_support(&i); + + if (!i.power_monitor) { + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); + ret = -ENOTSUP; + goto end; + } + + /* check if the device supports the necessary PMD API */ + if (rte_eth_get_wake_addr(port_id, queue_id, + &dummy_addr, &dummy_expected, + &dummy_mask, &dummy_sz) == -ENOTSUP) { + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize UMWAIT spinlock */ + rte_spinlock_init(&queue_cfg->umwait_lock); + + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_umwait, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_SCALE: + { + enum power_management_env env; + /* only PSTATE and ACPI modes are supported */ + 
if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) && + !rte_power_check_env_supported( + PM_ENV_PSTATE_CPUFREQ)) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n"); + ret = -ENOTSUP; + goto end; + } + /* ensure we could initialize the power library */ + if (rte_power_init(lcore_id)) { + ret = -EINVAL; + goto end; + } + /* ensure we initialized the correct env */ + env = rte_power_get_env(); + if (env != PM_ENV_ACPI_CPUFREQ && + env != PM_ENV_PSTATE_CPUFREQ) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, + queue_id, clb_scale_freq, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_PAUSE: + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_pause, NULL); + break; + } + ret = 0; + +end: + return ret; +} + +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id) +{ + struct pmd_queue_cfg *queue_cfg; + int ret; + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) { + ret = -EINVAL; + goto end; + } + + switch (queue_cfg->cb_mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + rte_spinlock_lock(&queue_cfg->umwait_lock); + + /* wake up the core from UMWAIT sleep, if any */ + if (queue_cfg->wait_addr != NULL) + umwait_wakeup(queue_cfg->wait_addr); + /* + * we need to disable early as there might be callback currently + * spinning on a lock. + */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + + rte_spinlock_unlock(&queue_cfg->umwait_lock); + /* fall-through */ + case RTE_POWER_MGMT_TYPE_PAUSE: + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + break; + case RTE_POWER_MGMT_TYPE_SCALE: + rte_power_freq_max(lcore_id); + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + rte_power_exit(lcore_id); + break; + } + /* + * we don't free the RX callback here because it is unsafe to do so + * unless we know for a fact that all data plane threads have stopped. + */ + queue_cfg->cur_cb = NULL; + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + ret = 0; +end: + return ret; +} diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h new file mode 100644 index 0000000000..a7a3f98268 --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.h @@ -0,0 +1,92 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#ifndef _RTE_POWER_PMD_MGMT_H +#define _RTE_POWER_PMD_MGMT_H + +/** + * @file + * RTE PMD Power Management + */ +#include +#include + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * PMD Power Management Type + */ +enum rte_power_pmd_mgmt_type { + /** WAIT callback mode. */ + RTE_POWER_MGMT_TYPE_WAIT = 1, + /** PAUSE callback mode. */ + RTE_POWER_MGMT_TYPE_PAUSE, + /** Freq Scaling callback mode. */ + RTE_POWER_MGMT_TYPE_SCALE, +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Setup per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. 
+ * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @param mode + * The power management callback function type. + + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Remove per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id); +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map index 69ca9af616..3f2f6cd6f6 100644 --- a/lib/librte_power/rte_power_version.map +++ b/lib/librte_power/rte_power_version.map @@ -34,4 +34,8 @@ EXPERIMENTAL { rte_power_guest_channel_receive_msg; rte_power_poll_stat_fetch; rte_power_poll_stat_update; + # added in 20.11 + rte_power_pmd_mgmt_queue_enable; + rte_power_pmd_mgmt_queue_disable; + }; From patchwork Thu Oct 15 12:04:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80893 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id DC577A04DB; Thu, 15 Oct 2020 14:06:25 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 296511E862; Thu, 15 Oct 2020 14:04:55 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 8F0691E566 for ; Thu, 15 Oct 2020 14:04:43 +0200 (CEST) IronPort-SDR: Mun+hH9/qJ8QPgyUTsYfBAvNeoM0/PakUR89b/Yh/6yhwKGsSIXw8oxyTCn2HEt2uj1BvTe+Ed 0Jix8QMi5vJg== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977197" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977197" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:43 -0700 IronPort-SDR: hOYgdTEyVwMGoC2gpuEHmD8EJrvsdvtKrH1qz+NiUli2lv5WCZYelXZIqr0jfYz7xTSvZL4NCT rRbDVIZfMJJw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485626" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:39 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Jeff Guo , Haiyue Wang , david.hunt@intel.com, konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:11 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 06/10] net/ixgbe: implement power management API X-BeenThere: dev@dpdk.org 
X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: Konstantin Ananyev --- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/ixgbe/ixgbe_rxtx.h | 2 ++ 3 files changed, 28 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 0b98e210e7..30b3f416d4 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -588,6 +588,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, .tx_done_cleanup = ixgbe_dev_tx_done_cleanup, + .get_wake_addr = ixgbe_get_wake_addr, }; /* diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 29d385c062..b1d656d270 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1366,6 +1366,31 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ixgbe_adv_rx_desc *rxdp; + struct ixgbe_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.upper.status_error; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + *mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + + /* the registers are 32-bit */ + *data_sz = 4; + + return 0; +} + /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. 
*/ static inline uint32_t ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 0b5589ef4d..6b9afec655 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.h +++ b/drivers/net/ixgbe/ixgbe_rxtx.h @@ -299,5 +299,7 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev); +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #endif /* _IXGBE_RXTX_H_ */ From patchwork Thu Oct 15 12:04:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80894 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 690E9A04DB; Thu, 15 Oct 2020 14:06:43 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id CF7FC1E86B; Thu, 15 Oct 2020 14:04:56 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id B879F1E553 for ; Thu, 15 Oct 2020 14:04:47 +0200 (CEST) IronPort-SDR: VZlwZMv4I8zjtWyxmqv8aLFylvo4ijjYzEFdyUlCgeiwcgSIwHGRbvEVmmeXVi0Jdpj0VjJDaJ +Phz9aGeOQyg== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977204" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977204" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:47 -0700 IronPort-SDR: t60p2DWBnJNkRQ4PvBOKvMlc4Z/HVw/rdNP2a9P7Layy0pORd6P7M8gTbZE1EcXkladdrrBejn 8IXMlaxTl2pg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485637" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:43 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Beilei Xing , Jeff Guo , david.hunt@intel.com, konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:12 +0100 Message-Id: X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 07/10] net/i40e: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. 
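Taken together with the ixgbe patch above, the callback always fills in four outputs: the address of the descriptor status word to watch, the expected value, the bit mask, and the access width. The sketch below is purely illustrative and shows how a caller could interpret those outputs; the helper name wait_for_rx_write() and the busy-wait loop are assumptions, since the series itself hands these values to the UMWAIT-based monitoring intrinsics rather than spinning.

/*
 * Illustrative only: consume the outputs of a driver's get_wake_addr()
 * callback by re-reading the reported descriptor word until the masked
 * bits match the expected value (i.e. the DD bit is set). Not part of
 * the patch; the real series uses the power management intrinsics.
 */
#include <stdint.h>

typedef int (*get_wake_addr_t)(void *rx_queue, volatile void **addr,
		uint64_t *expected, uint64_t *mask, uint8_t *data_sz);

static int
wait_for_rx_write(void *rx_queue, get_wake_addr_t get_wake_addr)
{
	volatile void *addr;
	uint64_t expected, mask, cur;
	uint8_t data_sz;

	if (get_wake_addr(rx_queue, &addr, &expected, &mask, &data_sz) < 0)
		return -1;

	do {
		/* re-read the status word with the width the driver reported */
		if (data_sz == 2)
			cur = *(const volatile uint16_t *)addr;
		else if (data_sz == 4)
			cur = *(const volatile uint32_t *)addr;
		else
			cur = *(const volatile uint64_t *)addr;
	} while ((cur & mask) != expected);

	return 0; /* descriptor was written back, there is work to do */
}
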
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Acked-by: Jeff Guo --- drivers/net/i40e/i40e_ethdev.c | 1 + drivers/net/i40e/i40e_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/i40e/i40e_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 943cfe71dc..cab86f8ec9 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -513,6 +513,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, .tx_done_cleanup = i40e_tx_done_cleanup, + .get_wake_addr = i40e_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index f2844d3f74..cdb1cd494b 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -71,6 +71,32 @@ #define I40E_TX_OFFLOAD_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK) +int +i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + struct i40e_rx_queue *rxq = rx_queue; + volatile union i40e_rx_desc *rxdp; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.qword1.status_error_len; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + *mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + + /* registers are 64-bit */ + *data_sz = 8; + + return 0; +} + static inline void i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) { diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 57d7b4160b..5826cf1099 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -248,6 +248,8 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *value, uint8_t *data_sz); /* For each value it means, datasheet of hardware can tell more details * From patchwork Thu Oct 15 12:04:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80895 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id B829EA04DB; Thu, 15 Oct 2020 14:07:09 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B6B961E877; Thu, 15 Oct 2020 14:05:10 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 6E6F01E877 for ; Thu, 15 Oct 2020 14:05:07 +0200 (CEST) IronPort-SDR: CKGSvBjDJ2lLpv09vVbftO+FYkKmKZq6JpNHriV4tAR2k6JUbfv/oBud8OLvJkhEBqCLVKVW7F 7AQnA5Gr9Gdg== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977214" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977214" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by 
orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:51 -0700 IronPort-SDR: 0c9X5aMspbiwbc2rpLNrL0Di2v7SOmKL94Px1aAlATAtG0K267Muvz72aOXaJd2k4A4i5nvKRh DRHYlhc6Uaew== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485646" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:47 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Qiming Yang , Qi Zhang , david.hunt@intel.com, konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:13 +0100 Message-Id: <095c61d91e71b4e9c3d67c5e21be4edc5cb3481d.1602763439.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 08/10] net/ice: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/ice/ice_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index d8ce09d28f..260de5dfd7 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -216,6 +216,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, .tx_done_cleanup = ice_tx_done_cleanup, + .get_wake_addr = ice_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 79e6df11f4..7c0f963d96 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -25,6 +25,32 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ice_rx_flex_desc *rxdp; + struct ice_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.status_error0; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. 
+ */ + *expected = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + *mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + + /* register is 16-bit */ + *data_sz = 2; + + return 0; +} + + static inline uint8_t ice_proto_xtr_type_to_rxdid(uint8_t xtr_type) { diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h index 1c23c7541e..7eeb8d467e 100644 --- a/drivers/net/ice/ice_rxtx.h +++ b/drivers/net/ice/ice_rxtx.h @@ -250,6 +250,8 @@ uint16_t ice_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc); int ice_tx_done_cleanup(void *txq, uint32_t free_cnt); +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \ int i; \ From patchwork Thu Oct 15 12:04:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80897 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9D15FA04DB; Thu, 15 Oct 2020 14:07:58 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 1ED551E895; Thu, 15 Oct 2020 14:05:19 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 674F61E877 for ; Thu, 15 Oct 2020 14:05:09 +0200 (CEST) IronPort-SDR: Fp8/xlFFmgF8feP9+pbdTWHdPaqVUj4NKW6S4UqvlGHLCfiMMsb77qjabnKP4QwXhDVc23HgxQ uv8mKEXT854g== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977228" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977228" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:55 -0700 IronPort-SDR: nd0T87eheTKR6X/nF264LGnsZSyVI37NUhIDv6ThVXUKH99AQXhB9sACppDuoJK2dDCvlc/3iH rljSDILmLfiQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485662" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:51 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:14 +0100 Message-Id: <6bb55b9d5310881e0d8d47321dc7f99d1f030c59.1602763439.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 09/10] examples/l3fwd-power: enable PMD power mgmt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add PMD power management feature support to l3fwd-power sample app. 
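The calls the sample app adds boil down to enabling a power scheme per lcore/port/queue before the main loop starts and disabling it again at shutdown. A minimal sketch follows, assuming the header name rte_power_pmd_mgmt.h (the include name is not visible in the diff below) and the frequency-scaling mode the sample selects:

/*
 * Minimal usage sketch of the per-queue PMD power management API as driven
 * by l3fwd-power. The header name rte_power_pmd_mgmt.h is an assumption;
 * RTE_POWER_MGMT_TYPE_SCALE selects the frequency-scaling scheme.
 */
#include <stdint.h>
#include <rte_power_pmd_mgmt.h>

static int
queue_power_mgmt_start(unsigned int lcore_id, uint16_t port_id,
		uint16_t queue_id)
{
	/* lcore_id is the lcore that will poll this port/queue pair */
	return rte_power_pmd_mgmt_queue_enable(lcore_id, port_id, queue_id,
			RTE_POWER_MGMT_TYPE_SCALE);
}

static void
queue_power_mgmt_stop(unsigned int lcore_id, uint16_t port_id,
		uint16_t queue_id)
{
	/* remove the per-queue callback again, e.g. at application shutdown */
	rte_power_pmd_mgmt_queue_disable(lcore_id, port_id, queue_id);
}
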
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v6: - Fixed typos in documentation .../sample_app_ug/l3_forward_power_man.rst | 13 ++++++ examples/l3fwd-power/main.c | 41 ++++++++++++++++++- 2 files changed, 53 insertions(+), 1 deletion(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 0cc6f2e62e..2767fb91aa 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -109,6 +109,8 @@ where, * --telemetry: Telemetry mode. +* --pmd-mgmt: PMD power management mode. + See :doc:`l3_forward` for details. The L3fwd-power example reuses the L3fwd command line options. @@ -459,3 +461,14 @@ reference cycles and accordingly busy rate is set to either 0% or The new stats ``empty_poll`` , ``full_poll`` and ``busy_percent`` can be viewed by running the script ``/usertools/dpdk-telemetry-client.py`` and selecting the menu option ``Send for global Metrics``. + +PMD power management Mode +------------------------- + +The PMD power management mode support for ``l3fwd-power`` is a standalone mode, in this mode +``l3fwd-power`` does simple l3fwding along with enable the power saving scheme on specific +port/queue/lcore. Main purpose for this mode is to demonstrate how to use the PMD power management API. + +.. code-block:: console + + ./build/examples/dpdk-l3fwd-power -l 1-3 -- --pmd-mgmt -p 0x0f --config="(0,0,2),(0,1,3)" diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index d0e6c9bd77..af64dd521f 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -199,7 +200,8 @@ enum appmode { APP_MODE_LEGACY, APP_MODE_EMPTY_POLL, APP_MODE_TELEMETRY, - APP_MODE_INTERRUPT + APP_MODE_INTERRUPT, + APP_MODE_PMD_MGMT }; enum appmode app_mode; @@ -1750,6 +1752,7 @@ parse_ep_config(const char *q_arg) #define CMD_LINE_OPT_EMPTY_POLL "empty-poll" #define CMD_LINE_OPT_INTERRUPT_ONLY "interrupt-only" #define CMD_LINE_OPT_TELEMETRY "telemetry" +#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt" /* Parse the argument given in the command line of the application */ static int @@ -1771,6 +1774,7 @@ parse_args(int argc, char **argv) {CMD_LINE_OPT_LEGACY, 0, 0, 0}, {CMD_LINE_OPT_TELEMETRY, 0, 0, 0}, {CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0}, + {CMD_LINE_OPT_PMD_MGMT, 0, 0, 0}, {NULL, 0, 0, 0} }; @@ -1881,6 +1885,16 @@ parse_args(int argc, char **argv) printf("telemetry mode is enabled\n"); } + if (!strncmp(lgopts[option_index].name, + CMD_LINE_OPT_PMD_MGMT, + sizeof(CMD_LINE_OPT_PMD_MGMT))) { + if (app_mode != APP_MODE_DEFAULT) { + printf(" power mgmt mode is mutually exclusive with other modes\n"); + return -1; + } + app_mode = APP_MODE_PMD_MGMT; + printf("PMD power mgmt mode is enabled\n"); + } if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_INTERRUPT_ONLY, sizeof(CMD_LINE_OPT_INTERRUPT_ONLY))) { @@ -2437,6 +2451,8 @@ mode_to_str(enum appmode mode) return "telemetry"; case APP_MODE_INTERRUPT: return "interrupt-only"; + case APP_MODE_PMD_MGMT: + return "pmd mgmt"; default: return "invalid"; } @@ -2705,6 +2721,12 @@ main(int argc, char **argv) } else if (!check_ptype(portid)) rte_exit(EXIT_FAILURE, "PMD can not provide needed ptypes\n"); + if (app_mode == APP_MODE_PMD_MGMT) { + rte_power_pmd_mgmt_queue_enable(lcore_id, + portid, queueid, + RTE_POWER_MGMT_TYPE_SCALE); + + } } } @@ -2790,6 +2812,9 @@ main(int argc, char **argv) 
SKIP_MASTER); } else if (app_mode == APP_MODE_INTERRUPT) { rte_eal_mp_remote_launch(main_intr_loop, NULL, CALL_MASTER); + } else if (app_mode == APP_MODE_PMD_MGMT) { + rte_eal_mp_remote_launch(main_telemetry_loop, NULL, + CALL_MASTER); } if (app_mode == APP_MODE_EMPTY_POLL || app_mode == APP_MODE_TELEMETRY) @@ -2812,6 +2837,20 @@ main(int argc, char **argv) if (app_mode == APP_MODE_EMPTY_POLL) rte_power_empty_poll_stat_free(); + if (app_mode == APP_MODE_PMD_MGMT) { + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_is_enabled(lcore_id) == 0) + continue; + qconf = &lcore_conf[lcore_id]; + for (queue = 0; queue < qconf->n_rx_queue; ++queue) { + portid = qconf->rx_queue_list[queue].port_id; + queueid = qconf->rx_queue_list[queue].queue_id; + rte_power_pmd_mgmt_queue_disable(lcore_id, + portid, queueid); + } + } + } + if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) && deinit_power_library()) rte_exit(EXIT_FAILURE, "deinit_power_library failed\n"); From patchwork Thu Oct 15 12:04:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 80896 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2C95BA04DB; Thu, 15 Oct 2020 14:07:36 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 93DFB1E556; Thu, 15 Oct 2020 14:05:17 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id 952341E559 for ; Thu, 15 Oct 2020 14:05:09 +0200 (CEST) IronPort-SDR: Z2EjlfoQo2oT98jvAtXvlOAyW8bVluO16Y2RTfBKj4Y2RzbcsLOfs1K6crARMaO+lxxKI2onsC oFTEnYvtKJ0w== X-IronPort-AV: E=McAfee;i="6000,8403,9774"; a="227977241" X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="227977241" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2020 05:04:58 -0700 IronPort-SDR: fAHcWKu0to8ZF6Dmpl1GDnadcizF3OgS2UB+B1Mhd+na30YJYByVSQzdSBGI4ml/d+IldTHF+u 6nJ5XtUxFARA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,378,1596524400"; d="scan'208";a="314485696" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.52]) by orsmga003.jf.intel.com with ESMTP; 15 Oct 2020 05:04:55 -0700 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , konstantin.ananyev@intel.com, jerinjacobk@gmail.com, bruce.richardson@intel.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, chris.macnamara@intel.com Date: Thu, 15 Oct 2020 13:04:15 +0100 Message-Id: <1b4dd60949bbada468f9fe9fcdc3f5357aaa1ec8.1602763439.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: References: In-Reply-To: References: Subject: [dpdk-dev] [PATCH v7 10/10] doc: update programmer's guide for power library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Update programmer's guide to document PMD power management usage. 
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v5: - Moved l3fwd-power update to the l3fwd-power-related commit - Some rewordings and clarifications doc/guides/prog_guide/power_man.rst | 42 +++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0a3755a901..38c64d31e4 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,45 @@ User Cases ---------- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +PMD Power Management API +------------------------ + +Abstract +~~~~~~~~ +Existing power management mechanisms require developers to change application +design or change code to make use of it. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever empty poll count reaches a certain number. + + * UMWAIT/UMONITOR + + This power saving scheme will put the CPU into optimized power state and use + the UMWAIT/UMONITOR instructions to monitor the Ethernet PMD RX descriptor + address, and wake the CPU up whenever there's new traffic. + + * Pause + + This power saving scheme will use the `rte_pause` function to avoid busy + polling. + + * Frequency scaling + + This power saving scheme will use existing power library functionality to + scale the core frequency up/down depending on traffic volume. + + +.. note:: + + Currently, this power management API is limited to mandatory mapping of 1 + queue to 1 core (multiple queues are supported, but they must be polled from + different cores). + +API Overview for PMD Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* **Queue Enable**: Enable specific power scheme for certain queue/port/core + +* **Queue Disable**: Disable power scheme for certain queue/port/core + References ---------- @@ -200,3 +239,6 @@ References * The :doc:`../sample_app_ug/vm_power_management` chapter in the :doc:`../sample_app_ug/index` section. + +* The :doc:`../sample_app_ug/rxtx_callbacks` + chapter in the :doc:`../sample_app_ug/index` section.
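
To make the callback-based mechanism described in the guide concrete, the sketch below shows a pause-style RX callback of the kind the library installs: it counts empty polls and backs off with rte_pause() once a threshold is crossed. The threshold value and the callback body are illustrative assumptions, not the library's implementation, which also covers the UMWAIT/UMONITOR and frequency-scaling schemes.

/*
 * Illustrative sketch of the "pause" scheme: an Ethernet RX callback that
 * counts empty polls and backs off with rte_pause() once a threshold is
 * crossed. The threshold and the callback body are assumptions.
 */
#include <rte_ethdev.h>
#include <rte_pause.h>

#define EMPTY_POLL_THRESHOLD 512 /* assumed value, for illustration only */

static uint16_t
pause_rx_callback(uint16_t port_id __rte_unused, uint16_t queue_id __rte_unused,
		struct rte_mbuf *pkts[] __rte_unused, uint16_t nb_rx,
		uint16_t max_pkts __rte_unused, void *user_param)
{
	unsigned int *empty_polls = user_param;

	if (nb_rx == 0) {
		/* nothing received: count it, back off once over the threshold */
		if (++(*empty_polls) > EMPTY_POLL_THRESHOLD)
			rte_pause();
	} else {
		/* traffic resumed, reset the empty poll counter */
		*empty_polls = 0;
	}
	return nb_rx;
}

Such a callback would be attached to the monitored queue with rte_eth_add_rx_callback(), the same Ethernet PMD RX callback hook the guide refers to and that the rxtx_callbacks sample application demonstrates.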