From patchwork Thu Jan 14 14:46:03 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Burakov, Anatoly"
X-Patchwork-Id: 86625
X-Patchwork-Delegate: thomas@monjalon.net
From: Anatoly Burakov
To: dev@dpdk.org
Cc: Jerin Jacob, Ruifeng Wang, Jan Viktorin, David Christensen, Ray Kinsella, Neil Horman, Bruce Richardson, Konstantin Ananyev, thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com
Date: Thu, 14 Jan 2021 14:46:03 +0000
Message-Id: <7fd091ecf4480a6fdf84bc34a8e1700eaf793e13.1610635488.git.anatoly.burakov@intel.com>
X-Mailer: git-send-email 2.25.1
Subject: [dpdk-dev] [PATCH v17 01/11] eal: uninline power intrinsics

Currently, power intrinsics are inline functions. Make them part of the
ABI so that we can have various internal data associated with them
without exposing said data to the outside world.

Signed-off-by: Anatoly Burakov
Acked-by: Konstantin Ananyev
---

Notes:
    v14:
    - Fix compile issues on ARM and PPC64 by moving implementations to .c files

 .../arm/include/rte_power_intrinsics.h     |  40 ------
 lib/librte_eal/arm/meson.build             |   1 +
 lib/librte_eal/arm/rte_power_intrinsics.c  |  45 +++++++
 .../include/generic/rte_power_intrinsics.h |   6 +-
 .../ppc/include/rte_power_intrinsics.h     |  40 ------
 lib/librte_eal/ppc/meson.build             |   1 +
 lib/librte_eal/ppc/rte_power_intrinsics.c  |  45 +++++++
 lib/librte_eal/version.map                 |   3 +
 .../x86/include/rte_power_intrinsics.h     | 115 -----------------
 lib/librte_eal/x86/meson.build             |   1 +
 lib/librte_eal/x86/rte_power_intrinsics.c  | 120 ++++++++++++++++++
 11 files changed, 219 insertions(+), 198 deletions(-)
 create mode 100644 lib/librte_eal/arm/rte_power_intrinsics.c
 create mode 100644 lib/librte_eal/ppc/rte_power_intrinsics.c
 create mode 100644 lib/librte_eal/x86/rte_power_intrinsics.c

diff --git a/lib/librte_eal/arm/include/rte_power_intrinsics.h b/lib/librte_eal/arm/include/rte_power_intrinsics.h
index a4a1bc1159..9e498e9ebf 100644
--- a/lib/librte_eal/arm/include/rte_power_intrinsics.h
+++ b/lib/librte_eal/arm/include/rte_power_intrinsics.h
@@ -13,46 +13,6 @@ extern "C" {
 
 #include "generic/rte_power_intrinsics.h"
 
-/**
- * This function is not supported on ARM.
- */
-static inline void
-rte_power_monitor(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz)
-{
-	RTE_SET_USED(p);
-	RTE_SET_USED(expected_value);
-	RTE_SET_USED(value_mask);
-	RTE_SET_USED(tsc_timestamp);
-	RTE_SET_USED(data_sz);
-}
-
-/**
- * This function is not supported on ARM.
- */
-static inline void
-rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz, rte_spinlock_t *lck)
-{
-	RTE_SET_USED(p);
-	RTE_SET_USED(expected_value);
-	RTE_SET_USED(value_mask);
-	RTE_SET_USED(tsc_timestamp);
-	RTE_SET_USED(lck);
-	RTE_SET_USED(data_sz);
-}
-
-/**
- * This function is not supported on ARM.
- */
-static inline void
-rte_power_pause(const uint64_t tsc_timestamp)
-{
-	RTE_SET_USED(tsc_timestamp);
-}
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/meson.build b/lib/librte_eal/arm/meson.build
index d62875ebae..6ec53ea03a 100644
--- a/lib/librte_eal/arm/meson.build
+++ b/lib/librte_eal/arm/meson.build
@@ -7,4 +7,5 @@ sources += files(
 	'rte_cpuflags.c',
 	'rte_cycles.c',
 	'rte_hypervisor.c',
+	'rte_power_intrinsics.c',
 )
diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c
new file mode 100644
index 0000000000..ab1f44f611
--- /dev/null
+++ b/lib/librte_eal/arm/rte_power_intrinsics.c
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#include "rte_power_intrinsics.h"
+
+/**
+ * This function is not supported on ARM.
+ */
+void
+rte_power_monitor(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz)
+{
+	RTE_SET_USED(p);
+	RTE_SET_USED(expected_value);
+	RTE_SET_USED(value_mask);
+	RTE_SET_USED(tsc_timestamp);
+	RTE_SET_USED(data_sz);
+}
+
+/**
+ * This function is not supported on ARM.
+ */
+void
+rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz, rte_spinlock_t *lck)
+{
+	RTE_SET_USED(p);
+	RTE_SET_USED(expected_value);
+	RTE_SET_USED(value_mask);
+	RTE_SET_USED(tsc_timestamp);
+	RTE_SET_USED(lck);
+	RTE_SET_USED(data_sz);
+}
+
+/**
+ * This function is not supported on ARM.
+ */
+void
+rte_power_pause(const uint64_t tsc_timestamp)
+{
+	RTE_SET_USED(tsc_timestamp);
+}
diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h
index dd520d90fa..67977bd511 100644
--- a/lib/librte_eal/include/generic/rte_power_intrinsics.h
+++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h
@@ -52,7 +52,7 @@
  * to undefined result.
  */
 __rte_experimental
-static inline void rte_power_monitor(const volatile void *p,
+void rte_power_monitor(const volatile void *p,
 		const uint64_t expected_value, const uint64_t value_mask,
 		const uint64_t tsc_timestamp, const uint8_t data_sz);
 
@@ -97,7 +97,7 @@ static inline void rte_power_monitor(const volatile void *p,
  * wakes up.
  */
 __rte_experimental
-static inline void rte_power_monitor_sync(const volatile void *p,
+void rte_power_monitor_sync(const volatile void *p,
 		const uint64_t expected_value, const uint64_t value_mask,
 		const uint64_t tsc_timestamp, const uint8_t data_sz,
 		rte_spinlock_t *lck);
@@ -118,6 +118,6 @@ static inline void rte_power_monitor_sync(const volatile void *p,
  * architecture-dependent.
  */
 __rte_experimental
-static inline void rte_power_pause(const uint64_t tsc_timestamp);
+void rte_power_pause(const uint64_t tsc_timestamp);
 
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_power_intrinsics.h b/lib/librte_eal/ppc/include/rte_power_intrinsics.h
index 4ed03d521f..c0e9ac279f 100644
--- a/lib/librte_eal/ppc/include/rte_power_intrinsics.h
+++ b/lib/librte_eal/ppc/include/rte_power_intrinsics.h
@@ -13,46 +13,6 @@ extern "C" {
 
 #include "generic/rte_power_intrinsics.h"
 
-/**
- * This function is not supported on PPC64.
- */
-static inline void
-rte_power_monitor(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz)
-{
-	RTE_SET_USED(p);
-	RTE_SET_USED(expected_value);
-	RTE_SET_USED(value_mask);
-	RTE_SET_USED(tsc_timestamp);
-	RTE_SET_USED(data_sz);
-}
-
-/**
- * This function is not supported on PPC64.
- */
-static inline void
-rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz, rte_spinlock_t *lck)
-{
-	RTE_SET_USED(p);
-	RTE_SET_USED(expected_value);
-	RTE_SET_USED(value_mask);
-	RTE_SET_USED(tsc_timestamp);
-	RTE_SET_USED(lck);
-	RTE_SET_USED(data_sz);
-}
-
-/**
- * This function is not supported on PPC64.
- */
-static inline void
-rte_power_pause(const uint64_t tsc_timestamp)
-{
-	RTE_SET_USED(tsc_timestamp);
-}
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/ppc/meson.build b/lib/librte_eal/ppc/meson.build
index f4b6d95c42..43c46542fb 100644
--- a/lib/librte_eal/ppc/meson.build
+++ b/lib/librte_eal/ppc/meson.build
@@ -7,4 +7,5 @@ sources += files(
 	'rte_cpuflags.c',
 	'rte_cycles.c',
 	'rte_hypervisor.c',
+	'rte_power_intrinsics.c',
 )
diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c
new file mode 100644
index 0000000000..84340ca2a4
--- /dev/null
+++ b/lib/librte_eal/ppc/rte_power_intrinsics.c
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 Intel Corporation
+ */
+
+#include "rte_power_intrinsics.h"
+
+/**
+ * This function is not supported on PPC64.
+ */
+void
+rte_power_monitor(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz)
+{
+	RTE_SET_USED(p);
+	RTE_SET_USED(expected_value);
+	RTE_SET_USED(value_mask);
+	RTE_SET_USED(tsc_timestamp);
+	RTE_SET_USED(data_sz);
+}
+
+/**
+ * This function is not supported on PPC64.
+ */
+void
+rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz, rte_spinlock_t *lck)
+{
+	RTE_SET_USED(p);
+	RTE_SET_USED(expected_value);
+	RTE_SET_USED(value_mask);
+	RTE_SET_USED(tsc_timestamp);
+	RTE_SET_USED(lck);
+	RTE_SET_USED(data_sz);
+}
+
+/**
+ * This function is not supported on PPC64.
+ */
+void
+rte_power_pause(const uint64_t tsc_timestamp)
+{
+	RTE_SET_USED(tsc_timestamp);
+}
diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map
index b1db7ec795..32eceb8869 100644
--- a/lib/librte_eal/version.map
+++ b/lib/librte_eal/version.map
@@ -405,6 +405,9 @@ EXPERIMENTAL {
 	rte_vect_set_max_simd_bitwidth;
 
 	# added in 21.02
+	rte_power_monitor;
+	rte_power_monitor_sync;
+	rte_power_pause;
 	rte_thread_tls_key_create;
 	rte_thread_tls_key_delete;
 	rte_thread_tls_value_get;
diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h
index c7d790c854..e4c2b87f73 100644
--- a/lib/librte_eal/x86/include/rte_power_intrinsics.h
+++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h
@@ -13,121 +13,6 @@ extern "C" {
 
 #include "generic/rte_power_intrinsics.h"
 
-static inline uint64_t
-__rte_power_get_umwait_val(const volatile void *p, const uint8_t sz)
-{
-	switch (sz) {
-	case sizeof(uint8_t):
-		return *(const volatile uint8_t *)p;
-	case sizeof(uint16_t):
-		return *(const volatile uint16_t *)p;
-	case sizeof(uint32_t):
-		return *(const volatile uint32_t *)p;
-	case sizeof(uint64_t):
-		return *(const volatile uint64_t *)p;
-	default:
-		/* this is an intrinsic, so we can't have any error handling */
-		RTE_ASSERT(0);
-		return 0;
-	}
-}
-
-/**
- * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
- * For more information about usage of these instructions, please refer to
- * Intel(R) 64 and IA-32 Architectures Software Developer's Manual.
- */
-static inline void
-rte_power_monitor(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz)
-{
-	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
-	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
-	/*
-	 * we're using raw byte codes for now as only the newest compiler
-	 * versions support this instruction natively.
-	 */
-
-	/* set address for UMONITOR */
-	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
-			:
-			: "D"(p));
-
-	if (value_mask) {
-		const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz);
-		const uint64_t masked = cur_value & value_mask;
-
-		/* if the masked value is already matching, abort */
-		if (masked == expected_value)
-			return;
-	}
-	/* execute UMWAIT */
-	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-			: /* ignore rflags */
-			: "D"(0), /* enter C0.2 */
-			  "a"(tsc_l), "d"(tsc_h));
-}
-
-/**
- * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
- * For more information about usage of these instructions, please refer to
- * Intel(R) 64 and IA-32 Architectures Software Developer's Manual.
- */
-static inline void
-rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
-		const uint64_t value_mask, const uint64_t tsc_timestamp,
-		const uint8_t data_sz, rte_spinlock_t *lck)
-{
-	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
-	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
-	/*
-	 * we're using raw byte codes for now as only the newest compiler
-	 * versions support this instruction natively.
-	 */
-
-	/* set address for UMONITOR */
-	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
-			:
-			: "D"(p));
-
-	if (value_mask) {
-		const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz);
-		const uint64_t masked = cur_value & value_mask;
-
-		/* if the masked value is already matching, abort */
-		if (masked == expected_value)
-			return;
-	}
-	rte_spinlock_unlock(lck);
-
-	/* execute UMWAIT */
-	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-			: /* ignore rflags */
-			: "D"(0), /* enter C0.2 */
-			  "a"(tsc_l), "d"(tsc_h));
-
-	rte_spinlock_lock(lck);
-}
-
-/**
- * This function uses TPAUSE instruction and will enter C0.2 state. For more
- * information about usage of this instruction, please refer to Intel(R) 64 and
- * IA-32 Architectures Software Developer's Manual.
- */
-static inline void
-rte_power_pause(const uint64_t tsc_timestamp)
-{
-	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
-	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
-
-	/* execute TPAUSE */
-	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;"
-			: /* ignore rflags */
-			: "D"(0), /* enter C0.2 */
-			  "a"(tsc_l), "d"(tsc_h));
-}
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/meson.build b/lib/librte_eal/x86/meson.build
index e78f29002e..dfd42dee0c 100644
--- a/lib/librte_eal/x86/meson.build
+++ b/lib/librte_eal/x86/meson.build
@@ -8,4 +8,5 @@ sources += files(
 	'rte_cycles.c',
 	'rte_hypervisor.c',
 	'rte_spinlock.c',
+	'rte_power_intrinsics.c',
 )
diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c
new file mode 100644
index 0000000000..34c5fd9c3e
--- /dev/null
+++ b/lib/librte_eal/x86/rte_power_intrinsics.c
@@ -0,0 +1,120 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include "rte_power_intrinsics.h"
+
+static inline uint64_t
+__get_umwait_val(const volatile void *p, const uint8_t sz)
+{
+	switch (sz) {
+	case sizeof(uint8_t):
+		return *(const volatile uint8_t *)p;
+	case sizeof(uint16_t):
+		return *(const volatile uint16_t *)p;
+	case sizeof(uint32_t):
+		return *(const volatile uint32_t *)p;
+	case sizeof(uint64_t):
+		return *(const volatile uint64_t *)p;
+	default:
+		/* this is an intrinsic, so we can't have any error handling */
+		RTE_ASSERT(0);
+		return 0;
+	}
+}
+
+/**
+ * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
+ * For more information about usage of these instructions, please refer to
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual.
+ */
+void
+rte_power_monitor(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz)
+{
+	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
+	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
+	/*
+	 * we're using raw byte codes for now as only the newest compiler
+	 * versions support this instruction natively.
+	 */
+
+	/* set address for UMONITOR */
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
+			:
+			: "D"(p));
+
+	if (value_mask) {
+		const uint64_t cur_value = __get_umwait_val(p, data_sz);
+		const uint64_t masked = cur_value & value_mask;
+
+		/* if the masked value is already matching, abort */
+		if (masked == expected_value)
+			return;
+	}
+	/* execute UMWAIT */
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
+			: /* ignore rflags */
+			: "D"(0), /* enter C0.2 */
+			  "a"(tsc_l), "d"(tsc_h));
+}
+
+/**
+ * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state.
+ * For more information about usage of these instructions, please refer to
+ * Intel(R) 64 and IA-32 Architectures Software Developer's Manual.
+ */
+void
+rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value,
+		const uint64_t value_mask, const uint64_t tsc_timestamp,
+		const uint8_t data_sz, rte_spinlock_t *lck)
+{
+	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
+	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
+	/*
+	 * we're using raw byte codes for now as only the newest compiler
+	 * versions support this instruction natively.
+	 */
+
+	/* set address for UMONITOR */
+	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
+			:
+			: "D"(p));
+
+	if (value_mask) {
+		const uint64_t cur_value = __get_umwait_val(p, data_sz);
+		const uint64_t masked = cur_value & value_mask;
+
+		/* if the masked value is already matching, abort */
+		if (masked == expected_value)
+			return;
+	}
+	rte_spinlock_unlock(lck);
+
+	/* execute UMWAIT */
+	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
+			: /* ignore rflags */
+			: "D"(0), /* enter C0.2 */
+			  "a"(tsc_l), "d"(tsc_h));
+
+	rte_spinlock_lock(lck);
+}
+
+/**
+ * This function uses TPAUSE instruction and will enter C0.2 state. For more
+ * information about usage of this instruction, please refer to Intel(R) 64 and
+ * IA-32 Architectures Software Developer's Manual.
+ */
+void
+rte_power_pause(const uint64_t tsc_timestamp)
+{
+	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
+	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
+
+	/* execute TPAUSE */
+	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;"
+			: /* ignore rflags */
+			: "D"(0), /* enter C0.2 */
+			  "a"(tsc_l), "d"(tsc_h));
+}
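
Note on the x86 implementation: before executing UMWAIT, rte_power_monitor() reads the monitored location at the width given by data_sz, applies value_mask, and returns early if the result already equals expected_value. The 64-bit tsc_timestamp is split into two 32-bit halves, which the inline asm passes through the "a" and "d" register constraints. The sketch below restates only that logic in portable C so it can be exercised on any machine. It is an illustration under stated assumptions and not part of the patch: read_sized(), should_wait() and split_tsc() are hypothetical names, the UMONITOR/UMWAIT instructions are left out entirely, and assert() stands in for RTE_ASSERT().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the size switch in the patch: read the
 * monitored location with the access width the caller asked for.
 */
static uint64_t
read_sized(const volatile void *p, uint8_t sz)
{
	switch (sz) {
	case sizeof(uint8_t):
		return *(const volatile uint8_t *)p;
	case sizeof(uint16_t):
		return *(const volatile uint16_t *)p;
	case sizeof(uint32_t):
		return *(const volatile uint32_t *)p;
	case sizeof(uint64_t):
		return *(const volatile uint64_t *)p;
	default:
		/* the patch uses RTE_ASSERT(0) here; assert() is a substitute */
		assert(0 && "unsupported data size");
		return 0;
	}
}

/*
 * Hypothetical helper: returns 0 when the masked value already matches
 * (the point where the patch returns before UMWAIT), 1 otherwise.
 * A value_mask of 0 skips the comparison, as in the patch.
 */
static int
should_wait(const volatile void *p, uint64_t expected_value,
		uint64_t value_mask, uint8_t data_sz)
{
	if (value_mask) {
		const uint64_t masked = read_sized(p, data_sz) & value_mask;

		if (masked == expected_value)
			return 0;
	}
	return 1;
}

/* Hypothetical helper: split a 64-bit deadline into low and high halves. */
static void
split_tsc(uint64_t tsc_timestamp, uint32_t *tsc_l, uint32_t *tsc_h)
{
	*tsc_l = (uint32_t)tsc_timestamp;
	*tsc_h = (uint32_t)(tsc_timestamp >> 32);
}
```

For example, with a 16-bit location holding 0x0100, should_wait(&v, 0x0100, 0xff00, sizeof(v)) returns 0 because the masked value already matches, while passing a value_mask of 0 skips the check and returns 1.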