From patchwork Thu Jan 14 14:46:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86625 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 52B70A0A02; Thu, 14 Jan 2021 15:46:26 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 712B0141300; Thu, 14 Jan 2021 15:46:21 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id C82641412F6 for ; Thu, 14 Jan 2021 15:46:19 +0100 (CET) IronPort-SDR: hWYQQ6Bn9bCIJLjoSi2KYBXyyThJE4FzRfhX7kJvyW6vx3rvUsdy/OXxPsef93sXzDBK9c1FhR cF1FqFsy8HVA== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870224" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870224" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:19 -0800 IronPort-SDR: 6bw7ph0IqZHEmNhRE8psaRqjhBGE1EL6e8YXDGB6ZJJ+HtORaPQ0g4+QCEJE/zTp7gKecy4XH9 wW1LmA38aRhQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271283" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:16 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:03 +0000 Message-Id: <7fd091ecf4480a6fdf84bc34a8e1700eaf793e13.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 01/11] eal: uninline power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, power intrinsics are inline functions. Make them part of the ABI so that we can have various internal data associated with them without exposing said data to the outside world. 
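To illustrate the shape of the change for one intrinsic (a simplified sketch; the exact declarations, per-architecture definitions and version.map entries are in the diff below):

/* Before: the generic header declared (and each arch header defined)
 * a static inline function, compiled into every caller. */
static inline void rte_power_pause(const uint64_t tsc_timestamp);

/* After: the generic header only declares an exported symbol... */
__rte_experimental
void rte_power_pause(const uint64_t tsc_timestamp);

/* ...each architecture defines it in its own rte_power_intrinsics.c,
 * and lib/librte_eal/version.map exports it under EXPERIMENTAL, so the
 * implementation (and any internal state added later) stays out of
 * application binaries. */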
Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v14: - Fix compile issues on ARM and PPC64 by moving implementations to .c files .../arm/include/rte_power_intrinsics.h | 40 ------ lib/librte_eal/arm/meson.build | 1 + lib/librte_eal/arm/rte_power_intrinsics.c | 45 +++++++ .../include/generic/rte_power_intrinsics.h | 6 +- .../ppc/include/rte_power_intrinsics.h | 40 ------ lib/librte_eal/ppc/meson.build | 1 + lib/librte_eal/ppc/rte_power_intrinsics.c | 45 +++++++ lib/librte_eal/version.map | 3 + .../x86/include/rte_power_intrinsics.h | 115 ----------------- lib/librte_eal/x86/meson.build | 1 + lib/librte_eal/x86/rte_power_intrinsics.c | 120 ++++++++++++++++++ 11 files changed, 219 insertions(+), 198 deletions(-) create mode 100644 lib/librte_eal/arm/rte_power_intrinsics.c create mode 100644 lib/librte_eal/ppc/rte_power_intrinsics.c create mode 100644 lib/librte_eal/x86/rte_power_intrinsics.c diff --git a/lib/librte_eal/arm/include/rte_power_intrinsics.h b/lib/librte_eal/arm/include/rte_power_intrinsics.h index a4a1bc1159..9e498e9ebf 100644 --- a/lib/librte_eal/arm/include/rte_power_intrinsics.h +++ b/lib/librte_eal/arm/include/rte_power_intrinsics.h @@ -13,46 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - RTE_SET_USED(tsc_timestamp); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/arm/meson.build b/lib/librte_eal/arm/meson.build index d62875ebae..6ec53ea03a 100644 --- a/lib/librte_eal/arm/meson.build +++ b/lib/librte_eal/arm/meson.build @@ -7,4 +7,5 @@ sources += files( 'rte_cpuflags.c', 'rte_cycles.c', 'rte_hypervisor.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c new file mode 100644 index 0000000000..ab1f44f611 --- /dev/null +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +/** + * This function is not supported on ARM. + */ +void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. 
+ */ +void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. + */ +void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index dd520d90fa..67977bd511 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -52,7 +52,7 @@ * to undefined result. */ __rte_experimental -static inline void rte_power_monitor(const volatile void *p, +void rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz); @@ -97,7 +97,7 @@ static inline void rte_power_monitor(const volatile void *p, * wakes up. */ __rte_experimental -static inline void rte_power_monitor_sync(const volatile void *p, +void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck); @@ -118,6 +118,6 @@ static inline void rte_power_monitor_sync(const volatile void *p, * architecture-dependent. */ __rte_experimental -static inline void rte_power_pause(const uint64_t tsc_timestamp); +void rte_power_pause(const uint64_t tsc_timestamp); #endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/ppc/include/rte_power_intrinsics.h b/lib/librte_eal/ppc/include/rte_power_intrinsics.h index 4ed03d521f..c0e9ac279f 100644 --- a/lib/librte_eal/ppc/include/rte_power_intrinsics.h +++ b/lib/librte_eal/ppc/include/rte_power_intrinsics.h @@ -13,46 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -/** - * This function is not supported on PPC64. - */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on PPC64. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on PPC64. 
- */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - RTE_SET_USED(tsc_timestamp); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/ppc/meson.build b/lib/librte_eal/ppc/meson.build index f4b6d95c42..43c46542fb 100644 --- a/lib/librte_eal/ppc/meson.build +++ b/lib/librte_eal/ppc/meson.build @@ -7,4 +7,5 @@ sources += files( 'rte_cpuflags.c', 'rte_cycles.c', 'rte_hypervisor.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c new file mode 100644 index 0000000000..84340ca2a4 --- /dev/null +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +/** + * This function is not supported on PPC64. + */ +void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index b1db7ec795..32eceb8869 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -405,6 +405,9 @@ EXPERIMENTAL { rte_vect_set_max_simd_bitwidth; # added in 21.02 + rte_power_monitor; + rte_power_monitor_sync; + rte_power_pause; rte_thread_tls_key_create; rte_thread_tls_key_delete; rte_thread_tls_value_get; diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h index c7d790c854..e4c2b87f73 100644 --- a/lib/librte_eal/x86/include/rte_power_intrinsics.h +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h @@ -13,121 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -static inline uint64_t -__rte_power_get_umwait_val(const volatile void *p, const uint8_t sz) -{ - switch (sz) { - case sizeof(uint8_t): - return *(const volatile uint8_t *)p; - case sizeof(uint16_t): - return *(const volatile uint16_t *)p; - case sizeof(uint32_t): - return *(const volatile uint32_t *)p; - case sizeof(uint64_t): - return *(const volatile uint64_t *)p; - default: - /* this is an intrinsic, so we can't have any error handling */ - RTE_ASSERT(0); - return 0; - } -} - -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. 
- */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. - */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(p)); - - if (value_mask) { - const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; - - /* if the masked value is already matching, abort */ - if (masked == expected_value) - return; - } - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); -} - -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. - */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(p)); - - if (value_mask) { - const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; - - /* if the masked value is already matching, abort */ - if (masked == expected_value) - return; - } - rte_spinlock_unlock(lck); - - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); - - rte_spinlock_lock(lck); -} - -/** - * This function uses TPAUSE instruction and will enter C0.2 state. For more - * information about usage of this instruction, please refer to Intel(R) 64 and - * IA-32 Architectures Software Developer's Manual. 
- */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - - /* execute TPAUSE */ - asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/x86/meson.build b/lib/librte_eal/x86/meson.build index e78f29002e..dfd42dee0c 100644 --- a/lib/librte_eal/x86/meson.build +++ b/lib/librte_eal/x86/meson.build @@ -8,4 +8,5 @@ sources += files( 'rte_cycles.c', 'rte_hypervisor.c', 'rte_spinlock.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c new file mode 100644 index 0000000000..34c5fd9c3e --- /dev/null +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -0,0 +1,120 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +static inline uint64_t +__get_umwait_val(const volatile void *p, const uint8_t sz) +{ + switch (sz) { + case sizeof(uint8_t): + return *(const volatile uint8_t *)p; + case sizeof(uint16_t): + return *(const volatile uint16_t *)p; + case sizeof(uint32_t): + return *(const volatile uint32_t *)p; + case sizeof(uint64_t): + return *(const volatile uint64_t *)p; + default: + /* this is an intrinsic, so we can't have any error handling */ + RTE_ASSERT(0); + return 0; + } +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. 
+ */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + rte_spinlock_unlock(lck); + + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); + + rte_spinlock_lock(lck); +} + +/** + * This function uses TPAUSE instruction and will enter C0.2 state. For more + * information about usage of this instruction, please refer to Intel(R) 64 and + * IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_pause(const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* execute TPAUSE */ + asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} From patchwork Thu Jan 14 14:46:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86626 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D3B8BA0A02; Thu, 14 Jan 2021 15:46:34 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id AF6BC141307; Thu, 14 Jan 2021 15:46:24 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 978C6141303 for ; Thu, 14 Jan 2021 15:46:22 +0100 (CET) IronPort-SDR: wmFt0+KhHAbJGC7YoItawwVnIQ9k1Lq9G3YmyNsJWoEyNZkxS7vIgeo5GQ3ZIoIXSRpvhwTnA+ weXSmQ6tR7Qg== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870236" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870236" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:22 -0800 IronPort-SDR: r66jECUlno0KEckhpCXY+VGkjTl3WSue//0nRldX3oGVHGUdUEXIrcEKkCJwbxd09NgR1MVBRY Ehd6sLfxYcdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271293" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:19 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jan Viktorin , Ruifeng Wang , Jerin Jacob , David Christensen , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:04 +0000 Message-Id: <59fdd1679b9a65fc7097251b41195679f0d370fa.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 02/11] eal: avoid invalid API usage in power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, the API documentation mandates that if the user wants to use the power management intrinsics, they need to call the 
`rte_cpu_get_intrinsics_support` API and check support for specific intrinsics. However, if the user does not do that, it is possible to get illegal instruction error because we're using raw instruction opcodes, which may or may not be supported at runtime. Now that we have everything in a C file, we can check for support at startup and prevent the user from possibly encountering illegal instruction errors. We also add return values to the API's as well, because why not. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v16: - Add return values and proper error handling to the API v15: - Remove accidental whitespace changes v14: - Replace uint8_t with bool v14: - Replace uint8_t with bool lib/librte_eal/arm/rte_power_intrinsics.c | 12 +++- .../include/generic/rte_power_intrinsics.h | 24 +++++-- lib/librte_eal/ppc/rte_power_intrinsics.c | 12 +++- lib/librte_eal/x86/rte_power_intrinsics.c | 64 +++++++++++++++++-- 4 files changed, 94 insertions(+), 18 deletions(-) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index ab1f44f611..7e7552fa8a 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -7,7 +7,7 @@ /** * This function is not supported on ARM. */ -void +int rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz) @@ -17,12 +17,14 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, RTE_SET_USED(value_mask); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(data_sz); + + return -ENOTSUP; } /** * This function is not supported on ARM. */ -void +int rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck) @@ -33,13 +35,17 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); RTE_SET_USED(data_sz); + + return -ENOTSUP; } /** * This function is not supported on ARM. */ -void +int rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; } diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 67977bd511..37e4ec0414 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -34,7 +34,6 @@ * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * Failing to do so may result in an illegal CPU instruction error. * * @param p * Address to monitor for changes. @@ -50,9 +49,14 @@ * Data size (in bytes) that will be used to compare expected value with the * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead * to undefined result. + * + * @return + * 0 on success + * -EINVAL on invalid parameters + * -ENOTSUP if unsupported */ __rte_experimental -void rte_power_monitor(const volatile void *p, +int rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz); @@ -75,7 +79,6 @@ void rte_power_monitor(const volatile void *p, * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. 
- * Failing to do so may result in an illegal CPU instruction error. * * @param p * Address to monitor for changes. @@ -95,9 +98,14 @@ void rte_power_monitor(const volatile void *p, * A spinlock that must be locked before entering the function, will be * unlocked while the CPU is sleeping, and will be locked again once the CPU * wakes up. + * + * @return + * 0 on success + * -EINVAL on invalid parameters + * -ENOTSUP if unsupported */ __rte_experimental -void rte_power_monitor_sync(const volatile void *p, +int rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck); @@ -111,13 +119,17 @@ void rte_power_monitor_sync(const volatile void *p, * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * Failing to do so may result in an illegal CPU instruction error. * * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. + * + * @return + * 0 on success + * -EINVAL on invalid parameters + * -ENOTSUP if unsupported */ __rte_experimental -void rte_power_pause(const uint64_t tsc_timestamp); +int rte_power_pause(const uint64_t tsc_timestamp); #endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 84340ca2a4..929e0611b0 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -7,7 +7,7 @@ /** * This function is not supported on PPC64. */ -void +int rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz) @@ -17,12 +17,14 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, RTE_SET_USED(value_mask); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(data_sz); + + return -ENOTSUP; } /** * This function is not supported on PPC64. */ -void +int rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck) @@ -33,13 +35,17 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); RTE_SET_USED(data_sz); + + return -ENOTSUP; } /** * This function is not supported on PPC64. 
*/ -void +int rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); + + return -ENOTSUP; } diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 34c5fd9c3e..2a38440bec 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -4,6 +4,8 @@ #include "rte_power_intrinsics.h" +static bool wait_supported; + static inline uint64_t __get_umwait_val(const volatile void *p, const uint8_t sz) { @@ -17,24 +19,47 @@ __get_umwait_val(const volatile void *p, const uint8_t sz) case sizeof(uint64_t): return *(const volatile uint64_t *)p; default: - /* this is an intrinsic, so we can't have any error handling */ + /* shouldn't happen */ RTE_ASSERT(0); return 0; } } +static inline int +__check_val_size(const uint8_t sz) +{ + switch (sz) { + case sizeof(uint8_t): /* fall-through */ + case sizeof(uint16_t): /* fall-through */ + case sizeof(uint32_t): /* fall-through */ + case sizeof(uint64_t): /* fall-through */ + return 0; + default: + /* unexpected size */ + return -1; + } +} + /** * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. * For more information about usage of these instructions, please refer to * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. */ -void +int rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return -ENOTSUP; + + if (__check_val_size(data_sz) < 0) + return -EINVAL; + /* * we're using raw byte codes for now as only the newest compiler * versions support this instruction natively. @@ -51,13 +76,15 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, /* if the masked value is already matching, abort */ if (masked == expected_value) - return; + return 0; } /* execute UMWAIT */ asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" : /* ignore rflags */ : "D"(0), /* enter C0.2 */ "a"(tsc_l), "d"(tsc_h)); + + return 0; } /** @@ -65,13 +92,21 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, * For more information about usage of these instructions, please refer to * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. */ -void +int rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return -ENOTSUP; + + if (__check_val_size(data_sz) < 0) + return -EINVAL; + /* * we're using raw byte codes for now as only the newest compiler * versions support this instruction natively. 
@@ -88,7 +123,7 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, /* if the masked value is already matching, abort */ if (masked == expected_value) - return; + return 0; } rte_spinlock_unlock(lck); @@ -99,6 +134,8 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, "a"(tsc_l), "d"(tsc_h)); rte_spinlock_lock(lck); + + return 0; } /** @@ -106,15 +143,30 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, * information about usage of this instruction, please refer to Intel(R) 64 and * IA-32 Architectures Software Developer's Manual. */ -void +int rte_power_pause(const uint64_t tsc_timestamp) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return -ENOTSUP; + /* execute TPAUSE */ asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" : /* ignore rflags */ : "D"(0), /* enter C0.2 */ "a"(tsc_l), "d"(tsc_h)); + + return 0; +} + +RTE_INIT(rte_power_intrinsics_init) { + struct rte_cpu_intrinsics i; + + rte_cpu_get_intrinsics_support(&i); + + if (i.power_monitor && i.power_pause) + wait_supported = 1; } From patchwork Thu Jan 14 14:46:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86627 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 39560A0A02; Thu, 14 Jan 2021 15:46:46 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id EA90614130D; Thu, 14 Jan 2021 15:46:26 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 6687B14130B for ; Thu, 14 Jan 2021 15:46:25 +0100 (CET) IronPort-SDR: hkUoUTPyqpxBU3yA0KE8neBx4Szpq8TqTq+wsbKJNxXyc28njURnmudg7vuRcEnYW37n+r5j/J EVQHQ0BoKQPA== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870251" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870251" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:25 -0800 IronPort-SDR: BBR4iomTkSE7fTToWEqgqTbFm9vDz/ooR6bP1kp9L25AFEW3d1zJdii3wcI0F3gcEqStiy/y2w 8efV3Yvcxvgw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271307" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:22 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Timothy McDaniel , Jan Viktorin , Ruifeng Wang , Jerin Jacob , David Christensen , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, david.hunt@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:05 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 03/11] eal: change API of power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Instead of passing around pointers and integers, collect everything into struct. 
This makes API design around these intrinsics much easier. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v16: - Add error handling drivers/event/dlb/dlb.c | 10 ++-- drivers/event/dlb2/dlb2.c | 10 ++-- lib/librte_eal/arm/rte_power_intrinsics.c | 20 +++----- .../include/generic/rte_power_intrinsics.h | 50 ++++++++----------- lib/librte_eal/ppc/rte_power_intrinsics.c | 20 +++----- lib/librte_eal/x86/rte_power_intrinsics.c | 42 +++++++++------- 6 files changed, 70 insertions(+), 82 deletions(-) diff --git a/drivers/event/dlb/dlb.c b/drivers/event/dlb/dlb.c index 0c95c4793d..d2f2026291 100644 --- a/drivers/event/dlb/dlb.c +++ b/drivers/event/dlb/dlb.c @@ -3161,6 +3161,7 @@ dlb_dequeue_wait(struct dlb_eventdev *dlb, /* Interrupts not supported by PF PMD */ return 1; } else if (dlb->umwait_allowed) { + struct rte_power_monitor_cond pmc; volatile struct dlb_dequeue_qe *cq_base; union { uint64_t raw_qe[2]; @@ -3181,9 +3182,12 @@ dlb_dequeue_wait(struct dlb_eventdev *dlb, else expected_value = 0; - rte_power_monitor(monitor_addr, expected_value, - qe_mask.raw_qe[1], timeout + start_ticks, - sizeof(uint64_t)); + pmc.addr = monitor_addr; + pmc.val = expected_value; + pmc.mask = qe_mask.raw_qe[1]; + pmc.data_sz = sizeof(uint64_t); + + rte_power_monitor(&pmc, timeout + start_ticks); DLB_INC_STAT(ev_port->stats.traffic.rx_umonitor_umwait, 1); } else { diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index 86724863f2..c9a8a02278 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -2870,6 +2870,7 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, if (elapsed_ticks >= timeout) { return 1; } else if (dlb2->umwait_allowed) { + struct rte_power_monitor_cond pmc; volatile struct dlb2_dequeue_qe *cq_base; union { uint64_t raw_qe[2]; @@ -2890,9 +2891,12 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, else expected_value = 0; - rte_power_monitor(monitor_addr, expected_value, - qe_mask.raw_qe[1], timeout + start_ticks, - sizeof(uint64_t)); + pmc.addr = monitor_addr; + pmc.val = expected_value; + pmc.mask = qe_mask.raw_qe[1]; + pmc.data_sz = sizeof(uint64_t); + + rte_power_monitor(&pmc, timeout + start_ticks); DLB2_INC_STAT(ev_port->stats.traffic.rx_umonitor_umwait, 1); } else { diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index 7e7552fa8a..5f1caaf25b 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -8,15 +8,11 @@ * This function is not supported on ARM. */ int -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); return -ENOTSUP; } @@ -25,16 +21,12 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, * This function is not supported on ARM. 
*/ int -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); - RTE_SET_USED(data_sz); return -ENOTSUP; } diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 37e4ec0414..3ad53068d5 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -18,6 +18,18 @@ * which are architecture-dependent. */ +struct rte_power_monitor_cond { + volatile void *addr; /**< Address to monitor for changes */ + uint64_t val; /**< Before attempting the monitoring, the address + * may be read and compared against this value. + **/ + uint64_t mask; /**< 64-bit mask to extract current value from addr */ + uint8_t data_sz; /**< Data size (in bytes) that will be used to compare + * expected value with the memory address. Can be 1, + * 2, 4, or 8. Supplying any other value will lead to + * undefined result. */ +}; + /** * @warning * @b EXPERIMENTAL: this API may change without prior notice @@ -35,20 +47,11 @@ * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. * - * @param p - * Address to monitor for changes. - * @param expected_value - * Before attempting the monitoring, the `p` address may be read and compared - * against this value. If `value_mask` is zero, this step will be skipped. - * @param value_mask - * The 64-bit mask to use to extract current value from `p`. + * @param pmc + * The monitoring condition structure. * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. - * @param data_sz - * Data size (in bytes) that will be used to compare expected value with the - * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead - * to undefined result. * * @return * 0 on success @@ -56,10 +59,8 @@ * -ENOTSUP if unsupported */ __rte_experimental -int rte_power_monitor(const volatile void *p, - const uint64_t expected_value, const uint64_t value_mask, - const uint64_t tsc_timestamp, const uint8_t data_sz); - +int rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp); /** * @warning * @b EXPERIMENTAL: this API may change without prior notice @@ -80,20 +81,11 @@ int rte_power_monitor(const volatile void *p, * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. * - * @param p - * Address to monitor for changes. - * @param expected_value - * Before attempting the monitoring, the `p` address may be read and compared - * against this value. If `value_mask` is zero, this step will be skipped. - * @param value_mask - * The 64-bit mask to use to extract current value from `p`. + * @param pmc + * The monitoring condition structure. * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. - * @param data_sz - * Data size (in bytes) that will be used to compare expected value with the - * memory address. Can be 1, 2, 4 or 8. 
Supplying any other value will lead - * to undefined result. * @param lck * A spinlock that must be locked before entering the function, will be * unlocked while the CPU is sleeping, and will be locked again once the CPU @@ -105,10 +97,8 @@ int rte_power_monitor(const volatile void *p, * -ENOTSUP if unsupported */ __rte_experimental -int rte_power_monitor_sync(const volatile void *p, - const uint64_t expected_value, const uint64_t value_mask, - const uint64_t tsc_timestamp, const uint8_t data_sz, - rte_spinlock_t *lck); +int rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck); /** * @warning diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 929e0611b0..5e5a1fff5a 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -8,15 +8,11 @@ * This function is not supported on PPC64. */ int -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); return -ENOTSUP; } @@ -25,16 +21,12 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, * This function is not supported on PPC64. */ int -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); - RTE_SET_USED(data_sz); return -ENOTSUP; } diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 2a38440bec..6be5c8b9f1 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -46,9 +46,8 @@ __check_val_size(const uint8_t sz) * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. 
*/ int -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); @@ -57,7 +56,10 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, if (!wait_supported) return -ENOTSUP; - if (__check_val_size(data_sz) < 0) + if (pmc == NULL) + return -EINVAL; + + if (__check_val_size(pmc->data_sz) < 0) return -EINVAL; /* @@ -68,14 +70,15 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, /* set address for UMONITOR */ asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" : - : "D"(p)); + : "D"(pmc->addr)); - if (value_mask) { - const uint64_t cur_value = __get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; + if (pmc->mask) { + const uint64_t cur_value = __get_umwait_val( + pmc->addr, pmc->data_sz); + const uint64_t masked = cur_value & pmc->mask; /* if the masked value is already matching, abort */ - if (masked == expected_value) + if (masked == pmc->val) return 0; } /* execute UMWAIT */ @@ -93,9 +96,8 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. */ int -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); @@ -104,7 +106,10 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, if (!wait_supported) return -ENOTSUP; - if (__check_val_size(data_sz) < 0) + if (pmc == NULL || lck == NULL) + return -EINVAL; + + if (__check_val_size(pmc->data_sz) < 0) return -EINVAL; /* @@ -115,14 +120,15 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, /* set address for UMONITOR */ asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" : - : "D"(p)); + : "D"(pmc->addr)); - if (value_mask) { - const uint64_t cur_value = __get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; + if (pmc->mask) { + const uint64_t cur_value = __get_umwait_val( + pmc->addr, pmc->data_sz); + const uint64_t masked = cur_value & pmc->mask; /* if the masked value is already matching, abort */ - if (masked == expected_value) + if (masked == pmc->val) return 0; } rte_spinlock_unlock(lck); From patchwork Thu Jan 14 14:46:06 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86628 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 67174A0A02; Thu, 14 Jan 2021 15:46:59 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 7853E14130B; Thu, 14 Jan 2021 15:46:30 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 6AF26141316 for ; Thu, 14 Jan 2021 15:46:28 +0100 (CET) 
IronPort-SDR: eGJVBc2qwhHUY1NI6SuaJ3hbX5rxh89ADaXPHyGd20+hF7u85ZYDdwiovIgpoL7BACvXFsAaTR hWcti3eOHvzA== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870265" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870265" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:28 -0800 IronPort-SDR: UteqMUgCscXe+OvfW88Busz3qviDTkDppOAorFaT7b4h+rjjhOh6VOsVBJ4rx3yKSeLVILrzx/ t8EKCAe9ru5g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271320" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:25 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jan Viktorin , Ruifeng Wang , Jerin Jacob , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:06 +0000 Message-Id: <8fa37c3dbeb27434e9b5dac57112c2d0ba8144e7.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 04/11] eal: remove sync version of power monitor X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, the "sync" version of power monitor intrinsic is supposed to be used for purposes of waking up a sleeping core. However, there are better ways to achieve the same result, so remove the unneeded function. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- lib/librte_eal/arm/rte_power_intrinsics.c | 14 ----- .../include/generic/rte_power_intrinsics.h | 38 ------------- lib/librte_eal/ppc/rte_power_intrinsics.c | 14 ----- lib/librte_eal/version.map | 1 - lib/librte_eal/x86/rte_power_intrinsics.c | 54 ------------------- 5 files changed, 121 deletions(-) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index 5f1caaf25b..8d271dc0c1 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -17,20 +17,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, return -ENOTSUP; } -/** - * This function is not supported on ARM. - */ -int -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - RTE_SET_USED(pmc); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - - return -ENOTSUP; -} - /** * This function is not supported on ARM. */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 3ad53068d5..85343bc9eb 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -61,44 +61,6 @@ struct rte_power_monitor_cond { __rte_experimental int rte_power_monitor(const struct rte_power_monitor_cond *pmc, const uint64_t tsc_timestamp); -/** - * @warning - * @b EXPERIMENTAL: this API may change without prior notice - * - * Monitor specific address for changes. 
This will cause the CPU to enter an - * architecture-defined optimized power state until either the specified - * memory address is written to, a certain TSC timestamp is reached, or other - * reasons cause the CPU to wake up. - * - * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If - * mask is non-zero, the current value pointed to by the `p` pointer will be - * checked against the expected value, and if they match, the entering of - * optimized power state may be aborted. - * - * This call will also lock a spinlock on entering sleep, and release it on - * waking up the CPU. - * - * @warning It is responsibility of the user to check if this function is - * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * - * @param pmc - * The monitoring condition structure. - * @param tsc_timestamp - * Maximum TSC timestamp to wait for. Note that the wait behavior is - * architecture-dependent. - * @param lck - * A spinlock that must be locked before entering the function, will be - * unlocked while the CPU is sleeping, and will be locked again once the CPU - * wakes up. - * - * @return - * 0 on success - * -EINVAL on invalid parameters - * -ENOTSUP if unsupported - */ -__rte_experimental -int rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck); /** * @warning diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 5e5a1fff5a..f7862ea324 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -17,20 +17,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, return -ENOTSUP; } -/** - * This function is not supported on PPC64. - */ -int -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - RTE_SET_USED(pmc); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - - return -ENOTSUP; -} - /** * This function is not supported on PPC64. */ diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index 32eceb8869..1fcd1d3bed 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -406,7 +406,6 @@ EXPERIMENTAL { # added in 21.02 rte_power_monitor; - rte_power_monitor_sync; rte_power_pause; rte_thread_tls_key_create; rte_thread_tls_key_delete; diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 6be5c8b9f1..29247d8638 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -90,60 +90,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, return 0; } -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. - */ -int -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - - /* prevent user from running this instruction if it's not supported */ - if (!wait_supported) - return -ENOTSUP; - - if (pmc == NULL || lck == NULL) - return -EINVAL; - - if (__check_val_size(pmc->data_sz) < 0) - return -EINVAL; - - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. 
- */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(pmc->addr)); - - if (pmc->mask) { - const uint64_t cur_value = __get_umwait_val( - pmc->addr, pmc->data_sz); - const uint64_t masked = cur_value & pmc->mask; - - /* if the masked value is already matching, abort */ - if (masked == pmc->val) - return 0; - } - rte_spinlock_unlock(lck); - - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); - - rte_spinlock_lock(lck); - - return 0; -} - /** * This function uses TPAUSE instruction and will enter C0.2 state. For more * information about usage of this instruction, please refer to Intel(R) 64 and From patchwork Thu Jan 14 14:46:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86629 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 25C67A0A02; Thu, 14 Jan 2021 15:47:08 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id B8654141319; Thu, 14 Jan 2021 15:46:32 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 58FE6141312 for ; Thu, 14 Jan 2021 15:46:31 +0100 (CET) IronPort-SDR: 4Z43UBw9YxD9Cq53u/KqLII3XEPNQvB/ZQVqP2CaZFVvIoHf33QYKkMgmX2U0qfKxkDKW7vE7G TyvqxTma9qSQ== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870276" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870276" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:31 -0800 IronPort-SDR: OnumARjXZiuXzXr+gSeJt+OYrSWqnL3uiH+g8KHeWgHq1nm/+HfeEFcFmWJz1ZdNVHM7/IH9j3 XiqFO3+sIPJQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271332" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:28 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jan Viktorin , Ruifeng Wang , Jerin Jacob , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:07 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 05/11] eal: add monitor wakeup function X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Now that we have everything in a C file, we can store the information about our sleep, and have a native mechanism to wake up the sleeping core. This mechanism would however only wake up a core that's sleeping while monitoring - waking up from `rte_power_pause` won't work. Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v17: - Improve code readability with a goto (the horror!) 
- Fix the compile issues for non-x86 archs v16: - Improve error handling - Take a lock before UMONITOR v13: - Add comments around wakeup code to explain what it does - Add lcore_id parameter checking to prevent buffer overrun v15: - Fix check in UMWAIT callback v13: - Rework the synchronization mechanism to not require locking - Add more parameter checking - Rework n_rx_queues access to not go through internal PMD structures and use public API instead v13: - Rework the synchronization mechanism to not require locking - Add more parameter checking - Rework n_rx_queues access to not go through internal PMD structures and use public API instead lib/librte_eal/arm/rte_power_intrinsics.c | 11 +++ .../include/generic/rte_power_intrinsics.h | 16 ++++ lib/librte_eal/ppc/rte_power_intrinsics.c | 11 +++ lib/librte_eal/version.map | 1 + lib/librte_eal/x86/rte_power_intrinsics.c | 93 ++++++++++++++++++- 5 files changed, 131 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index 8d271dc0c1..e83f04072a 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -27,3 +27,14 @@ rte_power_pause(const uint64_t tsc_timestamp) return -ENOTSUP; } + +/** + * This function is not supported on ARM. + */ +int +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + RTE_SET_USED(lcore_id); + + return -ENOTSUP; +} diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 85343bc9eb..6109d28faa 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -62,6 +62,22 @@ __rte_experimental int rte_power_monitor(const struct rte_power_monitor_cond *pmc, const uint64_t tsc_timestamp); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Wake up a specific lcore that is in a power optimized state and is monitoring + * an address. + * + * @note This function will *not* wake up a core that is in a power optimized + * state due to calling `rte_power_pause`. + * + * @param lcore_id + * Lcore ID of a sleeping thread. + */ +__rte_experimental +int rte_power_monitor_wakeup(const unsigned int lcore_id); + /** * @warning * @b EXPERIMENTAL: this API may change without prior notice diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index f7862ea324..7fc9586da7 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -27,3 +27,14 @@ rte_power_pause(const uint64_t tsc_timestamp) return -ENOTSUP; } + +/** + * This function is not supported on PPC64. 
+ */ +int +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + RTE_SET_USED(lcore_id); + + return -ENOTSUP; +} diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index 1fcd1d3bed..fce90a112f 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -406,6 +406,7 @@ EXPERIMENTAL { # added in 21.02 rte_power_monitor; + rte_power_monitor_wakeup; rte_power_pause; rte_thread_tls_key_create; rte_thread_tls_key_delete; diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 29247d8638..af3ae3237c 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -2,8 +2,31 @@ * Copyright(c) 2020 Intel Corporation */ +#include +#include +#include + #include "rte_power_intrinsics.h" +/* + * Per-lcore structure holding current status of C0.2 sleeps. + */ +static struct power_wait_status { + rte_spinlock_t lock; + volatile void *monitor_addr; /**< NULL if not currently sleeping */ +} __rte_cache_aligned wait_status[RTE_MAX_LCORE]; + +static inline void +__umwait_wakeup(volatile void *addr) +{ + uint64_t val; + + /* trigger a write but don't change the value */ + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED); + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED); +} + static bool wait_supported; static inline uint64_t @@ -51,17 +74,29 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + const unsigned int lcore_id = rte_lcore_id(); + struct power_wait_status *s; /* prevent user from running this instruction if it's not supported */ if (!wait_supported) return -ENOTSUP; + /* prevent non-EAL thread from using this API */ + if (lcore_id >= RTE_MAX_LCORE) + return -EINVAL; + if (pmc == NULL) return -EINVAL; if (__check_val_size(pmc->data_sz) < 0) return -EINVAL; + s = &wait_status[lcore_id]; + + /* update sleep address */ + rte_spinlock_lock(&s->lock); + s->monitor_addr = pmc->addr; + /* * we're using raw byte codes for now as only the newest compiler * versions support this instruction natively. 
@@ -72,6 +107,10 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, : : "D"(pmc->addr)); + /* now that we've put this address into monitor, we can unlock */ + rte_spinlock_unlock(&s->lock); + + /* if we have a comparison mask, we might not need to sleep at all */ if (pmc->mask) { const uint64_t cur_value = __get_umwait_val( pmc->addr, pmc->data_sz); @@ -79,14 +118,21 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, /* if the masked value is already matching, abort */ if (masked == pmc->val) - return 0; + goto end; } + /* execute UMWAIT */ asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" : /* ignore rflags */ : "D"(0), /* enter C0.2 */ "a"(tsc_l), "d"(tsc_h)); +end: + /* erase sleep address */ + rte_spinlock_lock(&s->lock); + s->monitor_addr = NULL; + rte_spinlock_unlock(&s->lock); + return 0; } @@ -122,3 +168,48 @@ RTE_INIT(rte_power_intrinsics_init) { if (i.power_monitor && i.power_pause) wait_supported = 1; } + +int +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + struct power_wait_status *s; + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return -ENOTSUP; + + /* prevent buffer overrun */ + if (lcore_id >= RTE_MAX_LCORE) + return -EINVAL; + + s = &wait_status[lcore_id]; + + /* + * There is a race condition between sleep, wakeup and locking, but we + * don't need to handle it. + * + * Possible situations: + * + * 1. T1 locks, sets address, unlocks + * 2. T2 locks, triggers wakeup, unlocks + * 3. T1 sleeps + * + * In this case, because T1 has already set the address for monitoring, + * we will wake up immediately even if T2 triggers wakeup before T1 + * goes to sleep. + * + * 1. T1 locks, sets address, unlocks, goes to sleep, and wakes up + * 2. T2 locks, triggers wakeup, and unlocks + * 3. T1 locks, erases address, and unlocks + * + * In this case, since we've already woken up, the "wakeup" was + * unneeded, and since T1 is still waiting on T2 releasing the lock, the + * wakeup address is still valid so it's perfectly safe to write it. 
+ */ + rte_spinlock_lock(&s->lock); + if (s->monitor_addr != NULL) + __umwait_wakeup(s->monitor_addr); + rte_spinlock_unlock(&s->lock); + + return 0; +} From patchwork Thu Jan 14 14:46:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86630 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id A488CA0A02; Thu, 14 Jan 2021 15:47:20 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5F729141324; Thu, 14 Jan 2021 15:46:35 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 3EC3214131D for ; Thu, 14 Jan 2021 15:46:34 +0100 (CET) IronPort-SDR: 7bvdvYR5fYasNEoRqDFQ7S0TPXVRiGuUd5BbVT7/2cookfZiKb9hxtGl2Isc5YWPz3Ql5ilMNb ruoVZ/UgNYjw== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870295" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870295" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:33 -0800 IronPort-SDR: VGlcytSEYntk31Od4AUzPCiExRBlIBteBzS4xwf3DrcZAGno0KjcstrqjN8e/QZ7n4ys8sSv67 LqGjXKSs1Fqg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271345" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:31 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Ray Kinsella , Neil Horman , Thomas Monjalon , Ferruh Yigit , Andrew Rybchenko , konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:08 +0000 Message-Id: <299417d0dc7c063461ed4ebec4d5bc9d0cf5afbd.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 06/11] ethdev: add simple power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple API to allow getting the monitor conditions for power-optimized monitoring of the Rx queues from the PMD, as well as release notes information. 
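For illustration, an application could pair this new ethdev call with the power intrinsics added earlier in the series roughly as follows. This is a minimal sketch, not part of the patch: the helper name wait_on_rx_queue and the timeout are placeholders, and error handling is reduced to returning the errno.

    #include <rte_cycles.h>
    #include <rte_ethdev.h>
    #include <rte_power_intrinsics.h>

    /* Hypothetical helper: sleep on an Rx queue's monitor condition for up to
     * timeout_tsc TSC cycles, or until the NIC writes a new descriptor. */
    static int
    wait_on_rx_queue(uint16_t port_id, uint16_t queue_id, uint64_t timeout_tsc)
    {
            struct rte_power_monitor_cond pmc;
            int ret;

            ret = rte_eth_get_monitor_addr(port_id, queue_id, &pmc);
            if (ret < 0)
                    return ret; /* e.g. -ENOTSUP if the PMD has no callback */

            /* wakes on descriptor write-back, timeout expiry, or an explicit
             * rte_power_monitor_wakeup() from another lcore */
            return rte_power_monitor(&pmc, rte_rdtsc() + timeout_tsc);
    }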
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Andrew Rybchenko --- Notes: v17: - Added libabigail ignore for driver-only ABI in ethdev as suggested by David v13: - Fix typos and issues raised by Andrew devtools/libabigail.abignore | 3 +++ doc/guides/rel_notes/release_21_02.rst | 5 +++++ lib/librte_ethdev/rte_ethdev.c | 28 ++++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 25 +++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_driver.h | 22 ++++++++++++++++++++ lib/librte_ethdev/version.map | 3 +++ 6 files changed, 86 insertions(+) diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore index 025f2c01bc..1c16114dce 100644 --- a/devtools/libabigail.abignore +++ b/devtools/libabigail.abignore @@ -7,3 +7,6 @@ symbol_version = INTERNAL [suppress_variable] symbol_version = INTERNAL +; Explicit ignore for driver-only ABI +[suppress_type] + name = eth_dev_ops diff --git a/doc/guides/rel_notes/release_21_02.rst b/doc/guides/rel_notes/release_21_02.rst index 706cbf8f0c..ec9958a141 100644 --- a/doc/guides/rel_notes/release_21_02.rst +++ b/doc/guides/rel_notes/release_21_02.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **ethdev: added new API for PMD power management** + + * ``rte_eth_get_monitor_addr()``, to be used in conjunction with + ``rte_power_monitor()`` to enable automatic power management for PMD's. + Removed Items ------------- diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 17ddacc78d..e19dbd838b 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -5115,6 +5115,34 @@ rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode)); } +int +rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id, + struct rte_power_monitor_cond *pmc) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_monitor_addr, -ENOTSUP); + + if (queue_id >= dev->data->nb_rx_queues) { + RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u\n", queue_id); + return -EINVAL; + } + + if (pmc == NULL) { + RTE_ETHDEV_LOG(ERR, "Invalid power monitor condition=%p\n", + pmc); + return -EINVAL; + } + + return eth_err(port_id, + dev->dev_ops->get_monitor_addr(dev->data->rx_queues[queue_id], + pmc)); +} + int rte_eth_dev_set_mc_addr_list(uint16_t port_id, struct rte_ether_addr *mc_addr_set, diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index f5f8919186..ca0f91312e 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -157,6 +157,7 @@ extern "C" { #include #include #include +#include #include "rte_ethdev_trace_fp.h" #include "rte_dev_info.h" @@ -4334,6 +4335,30 @@ __rte_experimental int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, struct rte_eth_burst_mode *mode); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Retrieve the monitor condition for a given receive queue. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The Rx queue on the Ethernet device for which information + * will be retrieved. + * @param pmc + * The pointer point to power-optimized monitoring condition structure. + * + * @return + * - 0: Success. + * -ENOTSUP: Operation not supported. + * -EINVAL: Invalid parameters. 
+ * -ENODEV: Invalid port ID. + */ +__rte_experimental +int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id, + struct rte_power_monitor_cond *pmc); + /** * Retrieve device registers and register attributes (number of registers and * register size) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index 0eacfd8425..3b3b0ec1a0 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -763,6 +763,26 @@ typedef int (*eth_hairpin_queue_peer_unbind_t) (struct rte_eth_dev *dev, uint16_t cur_queue, uint32_t direction); /**< @internal Unbind peer queue from the current queue. */ +/** + * @internal + * Get address of memory location whose contents will change whenever there is + * new data to be received on an Rx queue. + * + * @param rxq + * Ethdev queue pointer. + * @param pmc + * The pointer to power-optimized monitoring condition structure. + * @return + * Negative errno value on error, 0 on success. + * + * @retval 0 + * Success + * @retval -EINVAL + * Invalid parameters + */ +typedef int (*eth_get_monitor_addr_t)(void *rxq, + struct rte_power_monitor_cond *pmc); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -917,6 +937,8 @@ struct eth_dev_ops { /**< Set up the connection between the pair of hairpin queues. */ eth_hairpin_queue_peer_unbind_t hairpin_queue_peer_unbind; /**< Disconnect the hairpin queues of a pair from each other. */ + eth_get_monitor_addr_t get_monitor_addr; + /**< Get power monitoring condition for Rx queue. */ }; /** diff --git a/lib/librte_ethdev/version.map b/lib/librte_ethdev/version.map index d3f5410806..a124e1e370 100644 --- a/lib/librte_ethdev/version.map +++ b/lib/librte_ethdev/version.map @@ -240,6 +240,9 @@ EXPERIMENTAL { rte_flow_get_restore_info; rte_flow_tunnel_action_decap_release; rte_flow_tunnel_item_release; + + # added in 21.02 + rte_eth_get_monitor_addr; }; INTERNAL { From patchwork Thu Jan 14 14:46:09 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86631 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 27DCFA0A02; Thu, 14 Jan 2021 15:47:31 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id AF6E514131F; Thu, 14 Jan 2021 15:46:38 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id EA65A14132A for ; Thu, 14 Jan 2021 15:46:36 +0100 (CET) IronPort-SDR: wFFSGqK15eQyYuicsWUvYeYXqChtInIv36y1D4v/yS6SLheRzNPtNWapdNgyKq7wMKJAn6m6an 63T4FTF6AA1w== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870308" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870308" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:36 -0800 IronPort-SDR: IEzxbTv2DZ6eftFhXBxhCdFoaOrlhrrRDbpyjjjFHfdwxxXIv8vdNnmHPNnoXMTAN3sAWJE/8C URoHCAAK7uDg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271360" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:34 -0800 From: Anatoly Burakov To: 
dev@dpdk.org Cc: Liang Ma , David Hunt , Ray Kinsella , Neil Horman , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:09 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 07/11] power: add PMD power management API and callback X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple on/off switch that will enable saving power when no packets are arriving. It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that will either wait until a TSC timestamp expires, or when packets arrive. This API mandates a core-to-single-queue mapping (that is, multiple queued per device are supported, but they have to be polled on different cores). This design is using PMD RX callbacks. 1. UMWAIT/UMONITOR: When a certain threshold of empty polls is reached, the core will go into a power optimized sleep while waiting on an address of next RX descriptor to be written to. 2. TPAUSE/Pause instruction This method uses the pause (or TPAUSE, if available) instruction to avoid busy polling. 3. Frequency scaling Reuse existing DPDK power library to scale up/down core frequency depending on traffic volume. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v17: - Added memory barriers suggested by Konstantin - Removed the BUSY state doc/guides/prog_guide/power_man.rst | 44 +++ doc/guides/rel_notes/release_21_02.rst | 10 + lib/librte_power/meson.build | 5 +- lib/librte_power/rte_power_pmd_mgmt.c | 364 +++++++++++++++++++++++++ lib/librte_power/rte_power_pmd_mgmt.h | 90 ++++++ lib/librte_power/version.map | 5 + 6 files changed, 516 insertions(+), 2 deletions(-) create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0a3755a901..02280dd689 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,47 @@ User Cases ---------- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +PMD Power Management API +------------------------ + +Abstract +~~~~~~~~ +Existing power management mechanisms require developers to change application +design or change code to make use of it. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever empty poll count reaches a certain number. + + * Monitor + + This power saving scheme will put the CPU into optimized power state and use + the ``rte_power_monitor()`` function to monitor the Ethernet PMD RX + descriptor address, and wake the CPU up whenever there's new traffic. + + * Pause + + This power saving scheme will avoid busy polling by either entering + power-optimized sleep state with ``rte_power_pause()`` function, or, if it's + not available, use ``rte_pause()``. + + * Frequency scaling + + This power saving scheme will use existing ``librte_power`` library + functionality to scale the core frequency up/down depending on traffic + volume. + + +.. 
note:: + + Currently, this power management API is limited to mandatory mapping of 1 + queue to 1 core (multiple queues are supported, but they must be polled from + different cores). + +API Overview for PMD Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* **Queue Enable**: Enable specific power scheme for certain queue/port/core + +* **Queue Disable**: Disable power scheme for certain queue/port/core + References ---------- @@ -200,3 +241,6 @@ References * The :doc:`../sample_app_ug/vm_power_management` chapter in the :doc:`../sample_app_ug/index` section. + +* The :doc:`../sample_app_ug/rxtx_callbacks` + chapter in the :doc:`../sample_app_ug/index` section. diff --git a/doc/guides/rel_notes/release_21_02.rst b/doc/guides/rel_notes/release_21_02.rst index ec9958a141..9cd8214e2d 100644 --- a/doc/guides/rel_notes/release_21_02.rst +++ b/doc/guides/rel_notes/release_21_02.rst @@ -60,6 +60,16 @@ New Features * ``rte_eth_get_monitor_addr()``, to be used in conjunction with ``rte_power_monitor()`` to enable automatic power management for PMD's. +* **Add PMD power management helper API** + + A new helper API has been added to make using Ethernet PMD power management + easier for the user: ``rte_power_pmd_mgmt_queue_enable()``. Three power + management schemes are supported initially: + + * Power saving based on UMWAIT instruction (x86 only) + * Power saving based on ``rte_pause()`` (generic) or TPAUSE instruction (x86 only) + * Power saving based on frequency scaling through the ``librte_power`` library + Removed Items ------------- diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build index 4b4cf1b90b..51a471b669 100644 --- a/lib/librte_power/meson.build +++ b/lib/librte_power/meson.build @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c', 'power_kvm_vm.c', 'guest_channel.c', 'rte_power_empty_poll.c', 'power_pstate_cpufreq.c', + 'rte_power_pmd_mgmt.c', 'power_common.c') -headers = files('rte_power.h','rte_power_empty_poll.h') -deps += ['timer'] +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h') +deps += ['timer' ,'ethdev'] diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c new file mode 100644 index 0000000000..3dd463d69a --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.c @@ -0,0 +1,364 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include "rte_power_pmd_mgmt.h" + +#define EMPTYPOLL_MAX 512 + +static struct pmd_conf_data { + struct rte_cpu_intrinsics intrinsics_support; + /**< what do we support? */ + uint64_t tsc_per_us; + /**< pre-calculated tsc diff for 1us */ + uint64_t pause_per_us; + /**< how many rte_pause can we fit in a microisecond? */ +} global_data; + +/** + * Possible power management states of an ethdev port. + */ +enum pmd_mgmt_state { + /** Device power management is disabled. */ + PMD_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + PMD_MGMT_ENABLED +}; + +struct pmd_queue_cfg { + volatile enum pmd_mgmt_state pwr_mgmt_state; + /**< State of power management for this queue */ + enum rte_power_pmd_mgmt_type cb_mode; + /**< Callback mode for this queue */ + const struct rte_eth_rxtx_callback *cur_cb; + /**< Callback instance */ + volatile bool umwait_in_progress; + /**< are we currently sleeping? 
*/ + uint64_t empty_poll_stats; + /**< Number of empty polls */ +} __rte_cache_aligned; + +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; + +static void +calc_tsc(void) +{ + const uint64_t hz = rte_get_timer_hz(); + const uint64_t tsc_per_us = hz / US_PER_S; /* 1us */ + + global_data.tsc_per_us = tsc_per_us; + + /* only do this if we don't have tpause */ + if (!global_data.intrinsics_support.power_pause) { + const uint64_t start = rte_rdtsc_precise(); + const uint32_t n_pauses = 10000; + double us, us_per_pause; + uint64_t end; + unsigned int i; + + /* estimate number of rte_pause() calls per us*/ + for (i = 0; i < n_pauses; i++) + rte_pause(); + + end = rte_rdtsc_precise(); + us = (end - start) / (double)tsc_per_us; + us_per_pause = us / n_pauses; + + global_data.pause_per_us = (uint64_t)(1.0 / us_per_pause); + } +} + +static uint16_t +clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, + uint16_t nb_rx, uint16_t max_pkts __rte_unused, + void *addr __rte_unused) +{ + + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) { + struct rte_power_monitor_cond pmc; + uint16_t ret; + + /* + * we might get a cancellation request while being + * inside the callback, in which case the wakeup + * wouldn't work because it would've arrived too early. + * + * to get around this, we notify the other thread that + * we're sleeping, so that it can spin until we're done. + * unsolicited wakeups are perfectly safe. + */ + q_conf->umwait_in_progress = true; + + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + + /* check if we need to cancel sleep */ + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { + /* use monitoring condition to sleep */ + ret = rte_eth_get_monitor_addr(port_id, qidx, + &pmc); + if (ret == 0) + rte_power_monitor(&pmc, -1ULL); + } + q_conf->umwait_in_progress = false; + + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + } + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, + uint16_t nb_rx, uint16_t max_pkts __rte_unused, + void *addr __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + /* sleep for 1 microsecond */ + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) { + /* use tpause if we have it */ + if (global_data.intrinsics_support.power_pause) { + const uint64_t cur = rte_rdtsc(); + const uint64_t wait_tsc = + cur + global_data.tsc_per_us; + rte_power_pause(wait_tsc); + } else { + uint64_t i; + for (i = 0; i < global_data.pause_per_us; i++) + rte_pause(); + } + } + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_scale_freq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + } else { + q_conf->empty_poll_stats = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + + return nb_rx; +} + +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, uint16_t port_id, + uint16_t queue_id, enum 
rte_power_pmd_mgmt_type mode) +{ + struct pmd_queue_cfg *queue_cfg; + struct rte_eth_dev_info info; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + + if (queue_id >= RTE_MAX_QUEUES_PER_PORT || lcore_id >= RTE_MAX_LCORE) { + ret = -EINVAL; + goto end; + } + + if (rte_eth_dev_info_get(port_id, &info) < 0) { + ret = -EINVAL; + goto end; + } + + /* check if queue id is valid */ + if (queue_id >= info.nb_rx_queues) { + ret = -EINVAL; + goto end; + } + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) { + ret = -EINVAL; + goto end; + } + + /* we need this in various places */ + rte_cpu_get_intrinsics_support(&global_data.intrinsics_support); + + switch (mode) { + case RTE_POWER_MGMT_TYPE_MONITOR: + { + struct rte_power_monitor_cond dummy; + + /* check if rte_power_monitor is supported */ + if (!global_data.intrinsics_support.power_monitor) { + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); + ret = -ENOTSUP; + goto end; + } + + /* check if the device supports the necessary PMD API */ + if (rte_eth_get_monitor_addr(port_id, queue_id, + &dummy) == -ENOTSUP) { + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->umwait_in_progress = false; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + /* ensure we update our state before callback starts */ + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_umwait, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_SCALE: + { + enum power_management_env env; + /* only PSTATE and ACPI modes are supported */ + if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) && + !rte_power_check_env_supported( + PM_ENV_PSTATE_CPUFREQ)) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n"); + ret = -ENOTSUP; + goto end; + } + /* ensure we could initialize the power library */ + if (rte_power_init(lcore_id)) { + ret = -EINVAL; + goto end; + } + /* ensure we initialized the correct env */ + env = rte_power_get_env(); + if (env != PM_ENV_ACPI_CPUFREQ && + env != PM_ENV_PSTATE_CPUFREQ) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + /* this is not necessary here, but do it anyway */ + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, + queue_id, clb_scale_freq, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_PAUSE: + /* figure out various time-to-tsc conversions */ + if (global_data.tsc_per_us == 0) + calc_tsc(); + + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + /* this is not necessary here, but do it anyway */ + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_pause, NULL); + break; + } + ret = 0; +end: + return ret; +} + +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id) +{ + struct pmd_queue_cfg *queue_cfg; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + + if (lcore_id >= RTE_MAX_LCORE || 
queue_id >= RTE_MAX_QUEUES_PER_PORT) + return -EINVAL; + + /* no need to check queue id as wrong queue id would not be enabled */ + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED) + return -EINVAL; + + /* stop any callbacks from progressing */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + + /* ensure we update our state before continuing */ + rte_atomic_thread_fence(__ATOMIC_SEQ_CST); + + switch (queue_cfg->cb_mode) { + case RTE_POWER_MGMT_TYPE_MONITOR: + { + bool exit = false; + do { + /* + * we may request cancellation while the other thread + * has just entered the callback but hasn't started + * sleeping yet, so keep waking it up until we know it's + * done sleeping. + */ + if (queue_cfg->umwait_in_progress) + rte_power_monitor_wakeup(lcore_id); + else + exit = true; + } while (!exit); + } + /* fall-through */ + case RTE_POWER_MGMT_TYPE_PAUSE: + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + break; + case RTE_POWER_MGMT_TYPE_SCALE: + rte_power_freq_max(lcore_id); + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + rte_power_exit(lcore_id); + break; + } + /* + * we don't free the RX callback here because it is unsafe to do so + * unless we know for a fact that all data plane threads have stopped. + */ + queue_cfg->cur_cb = NULL; + + return 0; +} diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h new file mode 100644 index 0000000000..0bfbc6ba69 --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.h @@ -0,0 +1,90 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#ifndef _RTE_POWER_PMD_MGMT_H +#define _RTE_POWER_PMD_MGMT_H + +/** + * @file + * RTE PMD Power Management + */ +#include +#include + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * PMD Power Management Type + */ +enum rte_power_pmd_mgmt_type { + /** Use power-optimized monitoring to wait for incoming traffic */ + RTE_POWER_MGMT_TYPE_MONITOR = 1, + /** Use power-optimized sleep to avoid busy polling */ + RTE_POWER_MGMT_TYPE_PAUSE, + /** Use frequency scaling when traffic is low */ + RTE_POWER_MGMT_TYPE_SCALE, +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Enable power management on a specified RX queue and lcore. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @param mode + * The power management callback function type. + + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Disable power management on a specified RX queue and lcore. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. 
+ * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id); +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map index 69ca9af616..61996b4d11 100644 --- a/lib/librte_power/version.map +++ b/lib/librte_power/version.map @@ -34,4 +34,9 @@ EXPERIMENTAL { rte_power_guest_channel_receive_msg; rte_power_poll_stat_fetch; rte_power_poll_stat_update; + + # added in 21.02 + rte_power_pmd_mgmt_queue_enable; + rte_power_pmd_mgmt_queue_disable; + }; From patchwork Thu Jan 14 14:46:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86632 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C5DBBA0A02; Thu, 14 Jan 2021 15:47:40 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id EF04914132D; Thu, 14 Jan 2021 15:46:40 +0100 (CET) Received: from mga11.intel.com (mga11.intel.com [192.55.52.93]) by mails.dpdk.org (Postfix) with ESMTP id 4F9CD14131D for ; Thu, 14 Jan 2021 15:46:39 +0100 (CET) IronPort-SDR: s0cAfnyIYoSB+aLCOQBuQMTe4H20rBCt0o8FcBOzgqLVShzLO+BDXbMgtzq3sAq7ecIxQBqffT V5uULHMDjmLg== X-IronPort-AV: E=McAfee;i="6000,8403,9863"; a="174870311" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="174870311" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:38 -0800 IronPort-SDR: 9FyGux3C3z3lOKDKMviGg7OESrX2H9hIO6EFqEB4O4dT7fVxQZw6u22K6RW5Myds86mdR6rcoD v8Vq0pYhkANw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271374" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:36 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Jeff Guo , Haiyue Wang , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:10 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 08/11] net/ixgbe: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. 
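For context, a simplified restatement of how rte_power_monitor() interprets the condition the driver fills in below; this is a sketch only, not the library code (the real implementation arms UMONITOR on pmc->addr first and only issues UMWAIT when the check does not match), and the helper name is hypothetical.

    #include <stdint.h>
    #include <rte_power_intrinsics.h>

    /* Sketch: returns non-zero if the watched Rx descriptor was already
     * written back, i.e. the DD bit is set and there is no need to sleep.
     * ixgbe reports a 32-bit status word (pmc->data_sz == sizeof(uint32_t)). */
    static inline int
    rx_desc_already_done(const struct rte_power_monitor_cond *pmc)
    {
            const uint32_t status = *(const volatile uint32_t *)pmc->addr;

            return (status & pmc->mask) == pmc->val;
    }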
Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: Konstantin Ananyev --- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/ixgbe/ixgbe_rxtx.h | 1 + 3 files changed, 27 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index d7a1806ab8..97acf35d24 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -560,6 +560,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, .tx_done_cleanup = ixgbe_dev_tx_done_cleanup, + .get_monitor_addr = ixgbe_get_monitor_addr, }; /* diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 7bb8460359..cc8f70e6dd 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1369,6 +1369,31 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +int +ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + volatile union ixgbe_adv_rx_desc *rxdp; + struct ixgbe_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.upper.status_error; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + + /* the registers are 32-bit */ + pmc->data_sz = sizeof(uint32_t); + + return 0; +} + /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. */ static inline uint32_t ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 6d2f7c9da3..8a25e98df6 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.h +++ b/drivers/net/ixgbe/ixgbe_rxtx.h @@ -299,5 +299,6 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev); +int ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); #endif /* _IXGBE_RXTX_H_ */ From patchwork Thu Jan 14 14:46:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86633 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 74785A0A02; Thu, 14 Jan 2021 15:47:52 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8F79914133B; Thu, 14 Jan 2021 15:46:46 +0100 (CET) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 67787141339 for ; Thu, 14 Jan 2021 15:46:45 +0100 (CET) IronPort-SDR: HLDGgtHx1L3SNfvn3UipGrRTs3Y0lKv7uMfk0vlus7afJ+hRV80zdZug5onfjys52tF1q4pczu TDERv89XdE0g== X-IronPort-AV: E=McAfee;i="6000,8403,9864"; a="157556238" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="157556238" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:44 -0800 IronPort-SDR: 
9atDrlBcwE5psd8LrGdpJi6TYruPtP2Ej/zXZvqe4RIz5lOyNT6p7TAGuZ+R4v+whuoMfjYXvy ySNnVbxpXdMA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271397" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:39 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Beilei Xing , Jeff Guo , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:11 +0000 Message-Id: <786eaba8b038429313a956bb4c78f54e8b417991.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 09/11] net/i40e: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Acked-by: Jeff Guo --- drivers/net/i40e/i40e_ethdev.c | 1 + drivers/net/i40e/i40e_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/i40e/i40e_rxtx.h | 1 + 3 files changed, 27 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 14622484a0..ba1abc584f 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -510,6 +510,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, .tx_done_cleanup = i40e_tx_done_cleanup, + .get_monitor_addr = i40e_get_monitor_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 5df9a9df56..0b4220fc9c 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -72,6 +72,31 @@ #define I40E_TX_OFFLOAD_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK) +int +i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + struct i40e_rx_queue *rxq = rx_queue; + volatile union i40e_rx_desc *rxdp; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.qword1.status_error_len; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. 
+ */ + pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + + /* registers are 64-bit */ + pmc->data_sz = sizeof(uint64_t); + + return 0; +} + static inline void i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) { diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 57d7b4160b..e1494525ce 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -248,6 +248,7 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); /* For each value it means, datasheet of hardware can tell more details * From patchwork Thu Jan 14 14:46:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86634 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id AC0FCA0A02; Thu, 14 Jan 2021 15:48:02 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id BEF71141332; Thu, 14 Jan 2021 15:46:48 +0100 (CET) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 88FE8141332 for ; Thu, 14 Jan 2021 15:46:47 +0100 (CET) IronPort-SDR: uCUTz4Ho/R6heKFjt5ofPa1dq82VGwBxfG8ynC7iIQyIWtCXBKfyZ0NL8YNcJv2YckiL/JmXxv wiCVWM+4l8+g== X-IronPort-AV: E=McAfee;i="6000,8403,9864"; a="157556242" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="157556242" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:47 -0800 IronPort-SDR: jZEeBGwo0hM0TT2lI7TPUvXqtNI+avVCYW6eM/Qwz8VY+WtJ0Xx/iL4n+h25oXFgh0dqVdpib7 7vrf7rfsoS+w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271416" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:44 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Qiming Yang , Qi Zhang , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:12 +0000 Message-Id: <3f58120d458462fe37385563ba5c7a157328c68e.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 10/11] net/ice: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. 
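Together with the librte_power helper added earlier in the series, a driver implementation like the ice one below can be exercised per queue. An illustrative sketch follows; the lcore/port/queue numbers and the enable_monitor_mgmt wrapper are placeholders, not part of the patch.

    #include <rte_power_pmd_mgmt.h>

    /* Hypothetical setup step: lcore 2 polls port 0, queue 0, and should use
     * monitor-based power management when the queue goes idle. */
    static int
    enable_monitor_mgmt(void)
    {
            int ret = rte_power_pmd_mgmt_queue_enable(2, 0, 0,
                            RTE_POWER_MGMT_TYPE_MONITOR);

            /* negative errno, e.g. -ENOTSUP if UMWAIT or the PMD callback
             * is unavailable */
            return ret;
    }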
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/ice/ice_rxtx.h | 1 + 3 files changed, 28 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 587f485ee3..38c6263946 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -216,6 +216,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, .tx_done_cleanup = ice_tx_done_cleanup, + .get_monitor_addr = ice_get_monitor_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index d052bd0f1b..066651dc48 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -26,6 +26,32 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +int +ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + volatile union ice_rx_flex_desc *rxdp; + struct ice_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.status_error0; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + + /* register is 16-bit */ + pmc->data_sz = sizeof(uint16_t); + + return 0; +} + + static inline uint8_t ice_proto_xtr_type_to_rxdid(uint8_t xtr_type) { diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h index 6b16716063..906fbefdc4 100644 --- a/drivers/net/ice/ice_rxtx.h +++ b/drivers/net/ice/ice_rxtx.h @@ -263,6 +263,7 @@ uint16_t ice_xmit_pkts_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc); int ice_tx_done_cleanup(void *txq, uint32_t free_cnt); +int ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); #define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \ int i; \ From patchwork Thu Jan 14 14:46:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Burakov, Anatoly" X-Patchwork-Id: 86635 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id CE6BCA0A02; Thu, 14 Jan 2021 15:48:11 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id ED383141345; Thu, 14 Jan 2021 15:46:51 +0100 (CET) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by mails.dpdk.org (Postfix) with ESMTP id 06050141343 for ; Thu, 14 Jan 2021 15:46:49 +0100 (CET) IronPort-SDR: P8jJ2lkSlZb1vpYgxjkjnFMmv1D+tAJ0Woy/2CvhZw9E7u2Mch/asuB7eFHavhI9DRXa+1xpMD qER6q+88Gb1Q== X-IronPort-AV: E=McAfee;i="6000,8403,9864"; a="157556250" X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="157556250" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jan 2021 06:46:49 -0800 
IronPort-SDR: SGFmHj9zUqgso42pEGvFBY+MFvgs215ezBOH0U2gmezTBLavF84q7+qWoN1NxDhMcrB0p1CSoa MDsFvNxrS57w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,347,1602572400"; d="scan'208";a="465271428" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by fmsmga001.fm.intel.com with ESMTP; 14 Jan 2021 06:46:47 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Thu, 14 Jan 2021 14:46:13 +0000 Message-Id: <258d6e9ae723318bdca531c1a5b51dcebec48435.1610635488.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v17 11/11] examples/l3fwd-power: enable PMD power mgmt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add PMD power management feature support to l3fwd-power sample app. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v12: - Allow selecting PMD power management scheme from command-line - Enforce 1 core 1 queue rule .../sample_app_ug/l3_forward_power_man.rst | 35 ++++++++ examples/l3fwd-power/main.c | 89 ++++++++++++++++++- 2 files changed, 122 insertions(+), 2 deletions(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 85a78a5c1e..aaa9367fae 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -109,6 +109,8 @@ where, * --telemetry: Telemetry mode. +* --pmd-mgmt: PMD power management mode. + See :doc:`l3_forward` for details. The L3fwd-power example reuses the L3fwd command line options. @@ -456,3 +458,36 @@ reference cycles and accordingly busy rate is set to either 0% or The new stats ``empty_poll`` , ``full_poll`` and ``busy_percent`` can be viewed by running the script ``/usertools/dpdk-telemetry-client.py`` and selecting the menu option ``Send for global Metrics``. + +PMD power management Mode +------------------------- + +The PMD power management mode support for ``l3fwd-power`` is a standalone mode, in this mode +``l3fwd-power`` does simple l3fwding along with enable the power saving scheme on specific +port/queue/lcore. Main purpose for this mode is to demonstrate how to use the PMD power management API. + +.. code-block:: console + + ./build/examples/dpdk-l3fwd-power -l 1-3 -- --pmd-mgmt -p 0x0f --config="(0,0,2),(0,1,3)" + +PMD Power Management Mode +------------------------- +There is also a traffic-aware operating mode that, instead of using explicit +power management, will use automatic PMD power management. This mode is limited +to one queue per core, and has three available power management schemes: + +* ``monitor`` - this will use ``rte_power_monitor()`` function to enter a + power-optimized state (subject to platform support). + +* ``pause`` - this will use ``rte_power_pause()`` or ``rte_pause()`` to avoid + busy looping when there is no traffic. + +* ``scale`` - this will use frequency scaling routines available in the + ``librte_power`` library. + +See :doc:`Power Management<../prog_guide/power_man>` chapter in the DPDK +Programmer's Guide for more details on PMD power management. + +.. 
code-block:: console + + .//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" --pmd-mgmt=scale diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 995a3b6ad7..e312b6f355 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -199,11 +200,14 @@ enum appmode { APP_MODE_LEGACY, APP_MODE_EMPTY_POLL, APP_MODE_TELEMETRY, - APP_MODE_INTERRUPT + APP_MODE_INTERRUPT, + APP_MODE_PMD_MGMT }; enum appmode app_mode; +static enum rte_power_pmd_mgmt_type pmgmt_type; + enum freq_scale_hint_t { FREQ_LOWER = -1, @@ -1611,7 +1615,9 @@ print_usage(const char *prgname) " follow (training_flag, high_threshold, med_threshold)\n" " --telemetry: enable telemetry mode, to update" " empty polls, full polls, and core busyness to telemetry\n" - " --interrupt-only: enable interrupt-only mode\n", + " --interrupt-only: enable interrupt-only mode\n" + " --pmd-mgmt MODE: enable PMD power management mode. " + "Currently supported modes: monitor, pause, scale\n", prgname); } @@ -1701,6 +1707,32 @@ parse_config(const char *q_arg) return 0; } + +static int +parse_pmd_mgmt_config(const char *name) +{ +#define PMD_MGMT_MONITOR "monitor" +#define PMD_MGMT_PAUSE "pause" +#define PMD_MGMT_SCALE "scale" + + if (strncmp(PMD_MGMT_MONITOR, name, sizeof(PMD_MGMT_MONITOR)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_MONITOR; + return 0; + } + + if (strncmp(PMD_MGMT_PAUSE, name, sizeof(PMD_MGMT_PAUSE)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_PAUSE; + return 0; + } + + if (strncmp(PMD_MGMT_SCALE, name, sizeof(PMD_MGMT_SCALE)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_SCALE; + return 0; + } + /* unknown PMD power management mode */ + return -1; +} + static int parse_ep_config(const char *q_arg) { @@ -1755,6 +1787,7 @@ parse_ep_config(const char *q_arg) #define CMD_LINE_OPT_EMPTY_POLL "empty-poll" #define CMD_LINE_OPT_INTERRUPT_ONLY "interrupt-only" #define CMD_LINE_OPT_TELEMETRY "telemetry" +#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt" /* Parse the argument given in the command line of the application */ static int @@ -1776,6 +1809,7 @@ parse_args(int argc, char **argv) {CMD_LINE_OPT_LEGACY, 0, 0, 0}, {CMD_LINE_OPT_TELEMETRY, 0, 0, 0}, {CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0}, + {CMD_LINE_OPT_PMD_MGMT, 1, 0, 0}, {NULL, 0, 0, 0} }; @@ -1886,6 +1920,21 @@ parse_args(int argc, char **argv) printf("telemetry mode is enabled\n"); } + if (!strncmp(lgopts[option_index].name, + CMD_LINE_OPT_PMD_MGMT, + sizeof(CMD_LINE_OPT_PMD_MGMT))) { + if (app_mode != APP_MODE_DEFAULT) { + printf(" power mgmt mode is mutually exclusive with other modes\n"); + return -1; + } + if (parse_pmd_mgmt_config(optarg) < 0) { + printf(" Invalid PMD power management mode: %s\n", + optarg); + return -1; + } + app_mode = APP_MODE_PMD_MGMT; + printf("PMD power mgmt mode is enabled\n"); + } if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_INTERRUPT_ONLY, sizeof(CMD_LINE_OPT_INTERRUPT_ONLY))) { @@ -2442,6 +2491,8 @@ mode_to_str(enum appmode mode) return "telemetry"; case APP_MODE_INTERRUPT: return "interrupt-only"; + case APP_MODE_PMD_MGMT: + return "pmd mgmt"; default: return "invalid"; } @@ -2671,6 +2722,13 @@ main(int argc, char **argv) qconf = &lcore_conf[lcore_id]; printf("\nInitializing rx queues on lcore %u ... 
", lcore_id ); fflush(stdout); + + /* PMD power management mode can only do 1 queue per core */ + if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) { + rte_exit(EXIT_FAILURE, + "In PMD power management mode, only one queue per lcore is allowed\n"); + } + /* init RX queues */ for(queue = 0; queue < qconf->n_rx_queue; ++queue) { struct rte_eth_rxconf rxq_conf; @@ -2708,6 +2766,16 @@ main(int argc, char **argv) rte_exit(EXIT_FAILURE, "Fail to add ptype cb\n"); } + + if (app_mode == APP_MODE_PMD_MGMT) { + ret = rte_power_pmd_mgmt_queue_enable( + lcore_id, portid, queueid, + pmgmt_type); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "rte_power_pmd_mgmt_queue_enable: err=%d, port=%d\n", + ret, portid); + } } } @@ -2798,6 +2866,9 @@ main(int argc, char **argv) SKIP_MAIN); } else if (app_mode == APP_MODE_INTERRUPT) { rte_eal_mp_remote_launch(main_intr_loop, NULL, CALL_MAIN); + } else if (app_mode == APP_MODE_PMD_MGMT) { + /* reuse telemetry loop for PMD power management mode */ + rte_eal_mp_remote_launch(main_telemetry_loop, NULL, CALL_MAIN); } if (app_mode == APP_MODE_EMPTY_POLL || app_mode == APP_MODE_TELEMETRY) @@ -2824,6 +2895,20 @@ main(int argc, char **argv) if (app_mode == APP_MODE_EMPTY_POLL) rte_power_empty_poll_stat_free(); + if (app_mode == APP_MODE_PMD_MGMT) { + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_is_enabled(lcore_id) == 0) + continue; + qconf = &lcore_conf[lcore_id]; + for (queue = 0; queue < qconf->n_rx_queue; ++queue) { + portid = qconf->rx_queue_list[queue].port_id; + queueid = qconf->rx_queue_list[queue].queue_id; + rte_power_pmd_mgmt_queue_disable(lcore_id, + portid, queueid); + } + } + } + if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) && deinit_power_library()) rte_exit(EXIT_FAILURE, "deinit_power_library failed\n");