From patchwork Mon Jan 11 14:58:46 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86315 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id D73B8A09FF; Mon, 11 Jan 2021 15:59:07 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 7EF58140E93; Mon, 11 Jan 2021 15:59:04 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 07D4C140E7C for ; Mon, 11 Jan 2021 15:59:01 +0100 (CET) IronPort-SDR: nHHRwqGz8JF69C34xIyoiDYm/Agp+/1Y/2zTOayzPYANfXWncz8jzWkzdUGuvQdLNEIkCAO8n4 T9XMMnxZ1hXA== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157652977" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157652977" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:01 -0800 IronPort-SDR: EsTo2VvVxq2ErApUYzaWZ92H1dIZgmJf9q/x1iZJLm8lklatF8vcFQGfT5nEdJkC4u/pnaATzs A8U+zCMx4RDQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816502" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:58:58 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jan Viktorin , Ruifeng Wang , Jerin Jacob , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:46 +0000 Message-Id: <18244c0453adf9a216f88e8edc14f1e68b53053b.1610377084.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 01/11] eal: uninline power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, power intrinsics are inline functions. Make them part of the ABI so that we can have various internal data associated with them without exposing said data to the outside world. 
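To illustrate the pattern this enables, here is a simplified sketch (not part of the patch itself): the generic header only declares the prototype, while the architecture-specific .c file is free to keep state that callers never see, such as the `wait_supported` flag introduced later in this series (patch 02).

    /* generic/rte_power_intrinsics.h -- declaration only, exported via the ABI */
    __rte_experimental
    void rte_power_pause(const uint64_t tsc_timestamp);

    /* x86/rte_power_intrinsics.c -- internal state, invisible to callers */
    #include <stdbool.h>

    static bool wait_supported;

    void
    rte_power_pause(const uint64_t tsc_timestamp)
    {
            if (!wait_supported)
                    return; /* TPAUSE not available at runtime */
            /* ... execute TPAUSE until tsc_timestamp ... */
    }
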
Signed-off-by: Anatoly Burakov --- Notes: v14: - Fix compile issues on ARM and PPC64 by moving implementations to .c files .../arm/include/rte_power_intrinsics.h | 40 ------ lib/librte_eal/arm/meson.build | 1 + lib/librte_eal/arm/rte_power_intrinsics.c | 42 ++++++ .../include/generic/rte_power_intrinsics.h | 6 +- .../ppc/include/rte_power_intrinsics.h | 40 ------ lib/librte_eal/ppc/meson.build | 1 + lib/librte_eal/ppc/rte_power_intrinsics.c | 42 ++++++ lib/librte_eal/version.map | 5 + .../x86/include/rte_power_intrinsics.h | 115 ----------------- lib/librte_eal/x86/meson.build | 1 + lib/librte_eal/x86/rte_power_intrinsics.c | 120 ++++++++++++++++++ 11 files changed, 215 insertions(+), 198 deletions(-) create mode 100644 lib/librte_eal/arm/rte_power_intrinsics.c create mode 100644 lib/librte_eal/ppc/rte_power_intrinsics.c create mode 100644 lib/librte_eal/x86/rte_power_intrinsics.c diff --git a/lib/librte_eal/arm/include/rte_power_intrinsics.h b/lib/librte_eal/arm/include/rte_power_intrinsics.h index a4a1bc1159..9e498e9ebf 100644 --- a/lib/librte_eal/arm/include/rte_power_intrinsics.h +++ b/lib/librte_eal/arm/include/rte_power_intrinsics.h @@ -13,46 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on ARM. - */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - RTE_SET_USED(tsc_timestamp); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/arm/meson.build b/lib/librte_eal/arm/meson.build index d62875ebae..6ec53ea03a 100644 --- a/lib/librte_eal/arm/meson.build +++ b/lib/librte_eal/arm/meson.build @@ -7,4 +7,5 @@ sources += files( 'rte_cpuflags.c', 'rte_cycles.c', 'rte_hypervisor.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c new file mode 100644 index 0000000000..e5a49facb4 --- /dev/null +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +/** + * This function is not supported on ARM. + */ +void rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. 
+ */ +void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. + */ +void rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index dd520d90fa..67977bd511 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -52,7 +52,7 @@ * to undefined result. */ __rte_experimental -static inline void rte_power_monitor(const volatile void *p, +void rte_power_monitor(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz); @@ -97,7 +97,7 @@ static inline void rte_power_monitor(const volatile void *p, * wakes up. */ __rte_experimental -static inline void rte_power_monitor_sync(const volatile void *p, +void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, const uint64_t value_mask, const uint64_t tsc_timestamp, const uint8_t data_sz, rte_spinlock_t *lck); @@ -118,6 +118,6 @@ static inline void rte_power_monitor_sync(const volatile void *p, * architecture-dependent. */ __rte_experimental -static inline void rte_power_pause(const uint64_t tsc_timestamp); +void rte_power_pause(const uint64_t tsc_timestamp); #endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/ppc/include/rte_power_intrinsics.h b/lib/librte_eal/ppc/include/rte_power_intrinsics.h index 4ed03d521f..c0e9ac279f 100644 --- a/lib/librte_eal/ppc/include/rte_power_intrinsics.h +++ b/lib/librte_eal/ppc/include/rte_power_intrinsics.h @@ -13,46 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -/** - * This function is not supported on PPC64. - */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on PPC64. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); - RTE_SET_USED(data_sz); -} - -/** - * This function is not supported on PPC64. 
- */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - RTE_SET_USED(tsc_timestamp); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/ppc/meson.build b/lib/librte_eal/ppc/meson.build index f4b6d95c42..43c46542fb 100644 --- a/lib/librte_eal/ppc/meson.build +++ b/lib/librte_eal/ppc/meson.build @@ -7,4 +7,5 @@ sources += files( 'rte_cpuflags.c', 'rte_cycles.c', 'rte_hypervisor.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c new file mode 100644 index 0000000000..785effabe6 --- /dev/null +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2021 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +/** + * This function is not supported on PPC64. + */ +void rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +void rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index 354c068f31..31bf76ae81 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -403,6 +403,11 @@ EXPERIMENTAL { rte_service_lcore_may_be_active; rte_vect_get_max_simd_bitwidth; rte_vect_set_max_simd_bitwidth; + + # added in 21.02 + rte_power_monitor; + rte_power_monitor_sync; + rte_power_pause; }; INTERNAL { diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h index c7d790c854..e4c2b87f73 100644 --- a/lib/librte_eal/x86/include/rte_power_intrinsics.h +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h @@ -13,121 +13,6 @@ extern "C" { #include "generic/rte_power_intrinsics.h" -static inline uint64_t -__rte_power_get_umwait_val(const volatile void *p, const uint8_t sz) -{ - switch (sz) { - case sizeof(uint8_t): - return *(const volatile uint8_t *)p; - case sizeof(uint16_t): - return *(const volatile uint16_t *)p; - case sizeof(uint32_t): - return *(const volatile uint32_t *)p; - case sizeof(uint64_t): - return *(const volatile uint64_t *)p; - default: - /* this is an intrinsic, so we can't have any error handling */ - RTE_ASSERT(0); - return 0; - } -} - -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. 
- */ -static inline void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. - */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(p)); - - if (value_mask) { - const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; - - /* if the masked value is already matching, abort */ - if (masked == expected_value) - return; - } - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); -} - -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. - */ -static inline void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. - */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(p)); - - if (value_mask) { - const uint64_t cur_value = __rte_power_get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; - - /* if the masked value is already matching, abort */ - if (masked == expected_value) - return; - } - rte_spinlock_unlock(lck); - - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); - - rte_spinlock_lock(lck); -} - -/** - * This function uses TPAUSE instruction and will enter C0.2 state. For more - * information about usage of this instruction, please refer to Intel(R) 64 and - * IA-32 Architectures Software Developer's Manual. 
- */ -static inline void -rte_power_pause(const uint64_t tsc_timestamp) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - - /* execute TPAUSE */ - asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); -} - #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/x86/meson.build b/lib/librte_eal/x86/meson.build index e78f29002e..dfd42dee0c 100644 --- a/lib/librte_eal/x86/meson.build +++ b/lib/librte_eal/x86/meson.build @@ -8,4 +8,5 @@ sources += files( 'rte_cycles.c', 'rte_hypervisor.c', 'rte_spinlock.c', + 'rte_power_intrinsics.c', ) diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c new file mode 100644 index 0000000000..34c5fd9c3e --- /dev/null +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -0,0 +1,120 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#include "rte_power_intrinsics.h" + +static inline uint64_t +__get_umwait_val(const volatile void *p, const uint8_t sz) +{ + switch (sz) { + case sizeof(uint8_t): + return *(const volatile uint8_t *)p; + case sizeof(uint16_t): + return *(const volatile uint16_t *)p; + case sizeof(uint32_t): + return *(const volatile uint32_t *)p; + case sizeof(uint64_t): + return *(const volatile uint64_t *)p; + default: + /* this is an intrinsic, so we can't have any error handling */ + RTE_ASSERT(0); + return 0; + } +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. 
+ */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + rte_spinlock_unlock(lck); + + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); + + rte_spinlock_lock(lck); +} + +/** + * This function uses TPAUSE instruction and will enter C0.2 state. For more + * information about usage of this instruction, please refer to Intel(R) 64 and + * IA-32 Architectures Software Developer's Manual. + */ +void +rte_power_pause(const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* execute TPAUSE */ + asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} From patchwork Mon Jan 11 14:58:47 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86316 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2B5AEA09FF; Mon, 11 Jan 2021 15:59:18 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 4137F140EA1; Mon, 11 Jan 2021 15:59:06 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id BFFDB140E92 for ; Mon, 11 Jan 2021 15:59:03 +0100 (CET) IronPort-SDR: etv5PWl9oPYuvhLYz+/i0WZb4/LY2UPZAL+My9XEGhstUPIx3SxyO0Z7pJiCnxOE5eQISliq/s JUJHkf7E0q/Q== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157652991" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157652991" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:03 -0800 IronPort-SDR: P2FS/RA47mrdoPI537Qb1dbyh1cSuVpWzDhSAr/sSxm9HPQSTzRujxqTNfls7IYVa44uW2VgC+ 636qbKqT2Jng== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816537" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:01 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:47 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 02/11] eal: avoid invalid API usage in power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, the API documentation mandates that if the user wants to use the power management intrinsics, they need to call the `rte_cpu_get_intrinsics_support` API and check support for specific intrinsics. 
However, if the user does not do that, it is possible to get illegal instruction error because we're using raw instruction opcodes, which may or may not be supported at runtime. Now that we have everything in a C file, we can check for support at startup and prevent the user from possibly encountering illegal instruction errors. Signed-off-by: Anatoly Burakov --- Notes: v15: - Remove accidental whitespace changes v14: - Replace uint8_t with bool v14: - Replace uint8_t with bool .../include/generic/rte_power_intrinsics.h | 3 --- lib/librte_eal/x86/rte_power_intrinsics.c | 25 +++++++++++++++++++ 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 67977bd511..ffa72f7578 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -34,7 +34,6 @@ * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * Failing to do so may result in an illegal CPU instruction error. * * @param p * Address to monitor for changes. @@ -75,7 +74,6 @@ void rte_power_monitor(const volatile void *p, * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * Failing to do so may result in an illegal CPU instruction error. * * @param p * Address to monitor for changes. @@ -111,7 +109,6 @@ void rte_power_monitor_sync(const volatile void *p, * * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * Failing to do so may result in an illegal CPU instruction error. * * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 34c5fd9c3e..a164ad55fc 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -4,6 +4,8 @@ #include "rte_power_intrinsics.h" +static bool wait_supported; + static inline uint64_t __get_umwait_val(const volatile void *p, const uint8_t sz) { @@ -35,6 +37,11 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return; + /* * we're using raw byte codes for now as only the newest compiler * versions support this instruction natively. @@ -72,6 +79,11 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return; + /* * we're using raw byte codes for now as only the newest compiler * versions support this instruction natively. 
@@ -112,9 +124,22 @@ rte_power_pause(const uint64_t tsc_timestamp) const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return; + /* execute TPAUSE */ asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" : /* ignore rflags */ : "D"(0), /* enter C0.2 */ "a"(tsc_l), "d"(tsc_h)); } + +RTE_INIT(rte_power_intrinsics_init) { + struct rte_cpu_intrinsics i; + + rte_cpu_get_intrinsics_support(&i); + + if (i.power_monitor && i.power_pause) + wait_supported = 1; +} From patchwork Mon Jan 11 14:58:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86317 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E137CA09FF; Mon, 11 Jan 2021 15:59:27 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 7E2F7140EA8; Mon, 11 Jan 2021 15:59:08 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 9B5E0140EA4 for ; Mon, 11 Jan 2021 15:59:06 +0100 (CET) IronPort-SDR: hjxR6QfbOjxOOdpDH3yQ1uOG4vgK/bJBuAsSHClvKarrTsyFweFsQFt1tf9r8Wz7ZFHYUB1npG +XClFwCc9UgA== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157652997" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157652997" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:06 -0800 IronPort-SDR: 32ucdH0/haZKJdhlpu7ODWIdp0m/7XgEokF+pKZ6ODQcRokD7yqu3VC1IgenkJRR3n3LboTEbu PHLOYW5N49rw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816546" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:03 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Timothy McDaniel , Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, david.hunt@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:48 +0000 Message-Id: <54a11365bf7911b3e11471a89b515a497514fc72.1610377084.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 03/11] eal: change API of power intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Instead of passing around pointers and integers, collect everything into struct. This makes API design around these intrinsics much easier. 
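In sketch form, the new calling convention (mirroring the driver changes below) becomes the following; `monitor_addr`, `expected_value` and `timeout_cycles` are placeholders for whatever the caller wants to monitor, and the usual EAL headers (`rte_power_intrinsics.h`, `rte_cycles.h`) are assumed:

    struct rte_power_monitor_cond pmc = {
            .addr = monitor_addr,
            .val = expected_value,
            .mask = UINT64_MAX,
            .data_sz = sizeof(uint64_t),
    };

    /* sleep until *addr changes, or until the TSC deadline is reached */
    rte_power_monitor(&pmc, rte_rdtsc() + timeout_cycles);
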
Signed-off-by: Anatoly Burakov --- drivers/event/dlb/dlb.c | 10 ++-- drivers/event/dlb2/dlb2.c | 10 ++-- lib/librte_eal/arm/rte_power_intrinsics.c | 25 ++++------ .../include/generic/rte_power_intrinsics.h | 49 ++++++++----------- lib/librte_eal/ppc/rte_power_intrinsics.c | 25 ++++------ lib/librte_eal/x86/rte_power_intrinsics.c | 32 ++++++------ 6 files changed, 70 insertions(+), 81 deletions(-) diff --git a/drivers/event/dlb/dlb.c b/drivers/event/dlb/dlb.c index 0c95c4793d..d2f2026291 100644 --- a/drivers/event/dlb/dlb.c +++ b/drivers/event/dlb/dlb.c @@ -3161,6 +3161,7 @@ dlb_dequeue_wait(struct dlb_eventdev *dlb, /* Interrupts not supported by PF PMD */ return 1; } else if (dlb->umwait_allowed) { + struct rte_power_monitor_cond pmc; volatile struct dlb_dequeue_qe *cq_base; union { uint64_t raw_qe[2]; @@ -3181,9 +3182,12 @@ dlb_dequeue_wait(struct dlb_eventdev *dlb, else expected_value = 0; - rte_power_monitor(monitor_addr, expected_value, - qe_mask.raw_qe[1], timeout + start_ticks, - sizeof(uint64_t)); + pmc.addr = monitor_addr; + pmc.val = expected_value; + pmc.mask = qe_mask.raw_qe[1]; + pmc.data_sz = sizeof(uint64_t); + + rte_power_monitor(&pmc, timeout + start_ticks); DLB_INC_STAT(ev_port->stats.traffic.rx_umonitor_umwait, 1); } else { diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index 86724863f2..c9a8a02278 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -2870,6 +2870,7 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, if (elapsed_ticks >= timeout) { return 1; } else if (dlb2->umwait_allowed) { + struct rte_power_monitor_cond pmc; volatile struct dlb2_dequeue_qe *cq_base; union { uint64_t raw_qe[2]; @@ -2890,9 +2891,12 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2, else expected_value = 0; - rte_power_monitor(monitor_addr, expected_value, - qe_mask.raw_qe[1], timeout + start_ticks, - sizeof(uint64_t)); + pmc.addr = monitor_addr; + pmc.val = expected_value; + pmc.mask = qe_mask.raw_qe[1]; + pmc.data_sz = sizeof(uint64_t); + + rte_power_monitor(&pmc, timeout + start_ticks); DLB2_INC_STAT(ev_port->stats.traffic.rx_umonitor_umwait, 1); } else { diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index e5a49facb4..f2c3506b90 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -7,36 +7,31 @@ /** * This function is not supported on ARM. */ -void rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +void +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); } /** * This function is not supported on ARM. */ -void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +void +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); - RTE_SET_USED(data_sz); } /** * This function is not supported on ARM. 
*/ -void rte_power_pause(const uint64_t tsc_timestamp) +void +rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); } diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index ffa72f7578..00c670cb50 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -18,6 +18,18 @@ * which are architecture-dependent. */ +struct rte_power_monitor_cond { + volatile void *addr; /**< Address to monitor for changes */ + uint64_t val; /**< Before attempting the monitoring, the address + * may be read and compared against this value. + **/ + uint64_t mask; /**< 64-bit mask to extract current value from addr */ + uint8_t data_sz; /**< Data size (in bytes) that will be used to compare + * expected value with the memory address. Can be 1, + * 2, 4, or 8. Supplying any other value will lead to + * undefined result. */ +}; + /** * @warning * @b EXPERIMENTAL: this API may change without prior notice @@ -35,25 +47,15 @@ * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. * - * @param p - * Address to monitor for changes. - * @param expected_value - * Before attempting the monitoring, the `p` address may be read and compared - * against this value. If `value_mask` is zero, this step will be skipped. - * @param value_mask - * The 64-bit mask to use to extract current value from `p`. + * @param pmc + * The monitoring condition structure. * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. - * @param data_sz - * Data size (in bytes) that will be used to compare expected value with the - * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead - * to undefined result. */ __rte_experimental -void rte_power_monitor(const volatile void *p, - const uint64_t expected_value, const uint64_t value_mask, - const uint64_t tsc_timestamp, const uint8_t data_sz); +void rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp); /** * @warning @@ -75,30 +77,19 @@ void rte_power_monitor(const volatile void *p, * @warning It is responsibility of the user to check if this function is * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. * - * @param p - * Address to monitor for changes. - * @param expected_value - * Before attempting the monitoring, the `p` address may be read and compared - * against this value. If `value_mask` is zero, this step will be skipped. - * @param value_mask - * The 64-bit mask to use to extract current value from `p`. + * @param pmc + * The monitoring condition structure. * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. - * @param data_sz - * Data size (in bytes) that will be used to compare expected value with the - * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead - * to undefined result. * @param lck * A spinlock that must be locked before entering the function, will be * unlocked while the CPU is sleeping, and will be locked again once the CPU * wakes up. 
*/ __rte_experimental -void rte_power_monitor_sync(const volatile void *p, - const uint64_t expected_value, const uint64_t value_mask, - const uint64_t tsc_timestamp, const uint8_t data_sz, - rte_spinlock_t *lck); +void rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck); /** * @warning diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 785effabe6..3897d2024d 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -7,36 +7,31 @@ /** * This function is not supported on PPC64. */ -void rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +void +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(data_sz); } /** * This function is not supported on PPC64. */ -void rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +void +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { - RTE_SET_USED(p); - RTE_SET_USED(expected_value); - RTE_SET_USED(value_mask); + RTE_SET_USED(pmc); RTE_SET_USED(tsc_timestamp); RTE_SET_USED(lck); - RTE_SET_USED(data_sz); } /** * This function is not supported on PPC64. */ -void rte_power_pause(const uint64_t tsc_timestamp) +void +rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); } diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index a164ad55fc..9b0638148d 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -31,9 +31,8 @@ __get_umwait_val(const volatile void *p, const uint8_t sz) * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. */ void -rte_power_monitor(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz) +rte_power_monitor(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); @@ -50,14 +49,15 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, /* set address for UMONITOR */ asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" : - : "D"(p)); + : "D"(pmc->addr)); - if (value_mask) { - const uint64_t cur_value = __get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; + if (pmc->mask) { + const uint64_t cur_value = __get_umwait_val( + pmc->addr, pmc->data_sz); + const uint64_t masked = cur_value & pmc->mask; /* if the masked value is already matching, abort */ - if (masked == expected_value) + if (masked == pmc->val) return; } /* execute UMWAIT */ @@ -73,9 +73,8 @@ rte_power_monitor(const volatile void *p, const uint64_t expected_value, * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. 
*/ void -rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, - const uint64_t value_mask, const uint64_t tsc_timestamp, - const uint8_t data_sz, rte_spinlock_t *lck) +rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, + const uint64_t tsc_timestamp, rte_spinlock_t *lck) { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); @@ -92,14 +91,15 @@ rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, /* set address for UMONITOR */ asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" : - : "D"(p)); + : "D"(pmc->addr)); - if (value_mask) { - const uint64_t cur_value = __get_umwait_val(p, data_sz); - const uint64_t masked = cur_value & value_mask; + if (pmc->mask) { + const uint64_t cur_value = __get_umwait_val( + pmc->addr, pmc->data_sz); + const uint64_t masked = cur_value & pmc->mask; /* if the masked value is already matching, abort */ - if (masked == expected_value) + if (masked == pmc->val) return; } rte_spinlock_unlock(lck); From patchwork Mon Jan 11 14:58:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86318 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 615EEA09FF; Mon, 11 Jan 2021 15:59:37 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id BA446140EAB; Mon, 11 Jan 2021 15:59:10 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id A9800140EAE for ; Mon, 11 Jan 2021 15:59:09 +0100 (CET) IronPort-SDR: AZWObMAod8L/Qb8ByT6fV5xPNRSf6pp55R4Osk12PqSMKTDJx0EHF5a7GqgCDmdHwUb2Vhnss3 IwIqzmFJ5uDw== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653007" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653007" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:09 -0800 IronPort-SDR: EFcNwrbh8a6lDMCIRn17UFFsefqHQOFg8y1fK/eXkUnXAt9Jyqi4n8vip7hMqlRj0oNolauRHK zVVIOJHhWoOw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816572" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:06 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:49 +0000 Message-Id: <0fd864903f721114425d90352dc43431ae1b3cb9.1610377084.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 04/11] eal: remove sync version of power monitor X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, the "sync" version of power monitor intrinsic is supposed to be used for purposes of waking up a sleeping core. 
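For context, the intended usage of the sync variant was roughly the following sketch (placeholder names, not taken from an in-tree user): one lcore sleeps while a spinlock protects the monitored location, and another lcore takes that lock to modify the location and wake the sleeper.

    /* sleeping lcore: lck guards the location described by pmc */
    rte_spinlock_lock(&lck);
    rte_power_monitor_sync(&pmc, tsc_deadline, &lck);
    rte_spinlock_unlock(&lck);

    /* waking lcore: a write to the monitored location ends the sleep */
    rte_spinlock_lock(&lck);
    *(volatile uint64_t *)pmc.addr = new_value;
    rte_spinlock_unlock(&lck);
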
However, there are better ways to achieve the same result, so remove the unneeded function. Signed-off-by: Anatoly Burakov --- lib/librte_eal/arm/rte_power_intrinsics.c | 12 ----- .../include/generic/rte_power_intrinsics.h | 34 -------------- lib/librte_eal/ppc/rte_power_intrinsics.c | 12 ----- lib/librte_eal/version.map | 1 - lib/librte_eal/x86/rte_power_intrinsics.c | 46 ------------------- 5 files changed, 105 deletions(-) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index f2c3506b90..6b8219b919 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -15,18 +15,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, RTE_SET_USED(tsc_timestamp); } -/** - * This function is not supported on ARM. - */ -void -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - RTE_SET_USED(pmc); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); -} - /** * This function is not supported on ARM. */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index 00c670cb50..a6f1955996 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -57,40 +57,6 @@ __rte_experimental void rte_power_monitor(const struct rte_power_monitor_cond *pmc, const uint64_t tsc_timestamp); -/** - * @warning - * @b EXPERIMENTAL: this API may change without prior notice - * - * Monitor specific address for changes. This will cause the CPU to enter an - * architecture-defined optimized power state until either the specified - * memory address is written to, a certain TSC timestamp is reached, or other - * reasons cause the CPU to wake up. - * - * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If - * mask is non-zero, the current value pointed to by the `p` pointer will be - * checked against the expected value, and if they match, the entering of - * optimized power state may be aborted. - * - * This call will also lock a spinlock on entering sleep, and release it on - * waking up the CPU. - * - * @warning It is responsibility of the user to check if this function is - * supported at runtime using `rte_cpu_get_intrinsics_support()` API call. - * - * @param pmc - * The monitoring condition structure. - * @param tsc_timestamp - * Maximum TSC timestamp to wait for. Note that the wait behavior is - * architecture-dependent. - * @param lck - * A spinlock that must be locked before entering the function, will be - * unlocked while the CPU is sleeping, and will be locked again once the CPU - * wakes up. - */ -__rte_experimental -void rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck); - /** * @warning * @b EXPERIMENTAL: this API may change without prior notice diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 3897d2024d..9a40c4d5d6 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -15,18 +15,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, RTE_SET_USED(tsc_timestamp); } -/** - * This function is not supported on PPC64. 
- */ -void -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - RTE_SET_USED(pmc); - RTE_SET_USED(tsc_timestamp); - RTE_SET_USED(lck); -} - /** * This function is not supported on PPC64. */ diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index 31bf76ae81..20945b1efa 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -406,7 +406,6 @@ EXPERIMENTAL { # added in 21.02 rte_power_monitor; - rte_power_monitor_sync; rte_power_pause; }; diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 9b0638148d..487a783a2c 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -67,52 +67,6 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, "a"(tsc_l), "d"(tsc_h)); } -/** - * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. - * For more information about usage of these instructions, please refer to - * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. - */ -void -rte_power_monitor_sync(const struct rte_power_monitor_cond *pmc, - const uint64_t tsc_timestamp, rte_spinlock_t *lck) -{ - const uint32_t tsc_l = (uint32_t)tsc_timestamp; - const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); - - /* prevent user from running this instruction if it's not supported */ - if (!wait_supported) - return; - - /* - * we're using raw byte codes for now as only the newest compiler - * versions support this instruction natively. - */ - - /* set address for UMONITOR */ - asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" - : - : "D"(pmc->addr)); - - if (pmc->mask) { - const uint64_t cur_value = __get_umwait_val( - pmc->addr, pmc->data_sz); - const uint64_t masked = cur_value & pmc->mask; - - /* if the masked value is already matching, abort */ - if (masked == pmc->val) - return; - } - rte_spinlock_unlock(lck); - - /* execute UMWAIT */ - asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" - : /* ignore rflags */ - : "D"(0), /* enter C0.2 */ - "a"(tsc_l), "d"(tsc_h)); - - rte_spinlock_lock(lck); -} - /** * This function uses TPAUSE instruction and will enter C0.2 state. 
For more * information about usage of this instruction, please refer to Intel(R) 64 and From patchwork Mon Jan 11 14:58:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86319 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4F0E9A09FF; Mon, 11 Jan 2021 15:59:47 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0709D140EB1; Mon, 11 Jan 2021 15:59:14 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 76030140EB0 for ; Mon, 11 Jan 2021 15:59:12 +0100 (CET) IronPort-SDR: DN7rFVSSzGR71LvuVUru/bfyncO8Bv7PNi+f01dfX6EdIBhOyorzw7SIPOiTZRQgWg+3PLeqs2 YLfXcT+cXYaQ== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653016" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653016" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:12 -0800 IronPort-SDR: Jn/XhajzMjUsVY31h8QB8D5d+p57uxeWz5v86TRZjeVAbLambfB2OUMUjOrOb8d/cseOSPKi0r mflpi4vcpvUQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816583" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:09 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Ray Kinsella , Neil Horman , Bruce Richardson , Konstantin Ananyev , thomas@monjalon.net, timothy.mcdaniel@intel.com, david.hunt@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:50 +0000 Message-Id: <31c29fca19a26d59e0496376c5ef6edbd0713144.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 05/11] eal: add monitor wakeup function X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Now that we have everything in a C file, we can store the information about our sleep, and have a native mechanism to wake up the sleeping core. This mechanism would however only wake up a core that's sleeping while monitoring - waking up from `rte_power_pause` won't work. Signed-off-by: Anatoly Burakov --- Notes: v13: - Add comments around wakeup code to explain what it does - Add lcore_id parameter checking to prevent buffer overrun lib/librte_eal/arm/rte_power_intrinsics.c | 9 ++ .../include/generic/rte_power_intrinsics.h | 16 ++++ lib/librte_eal/ppc/rte_power_intrinsics.c | 9 ++ lib/librte_eal/version.map | 1 + lib/librte_eal/x86/rte_power_intrinsics.c | 85 +++++++++++++++++++ 5 files changed, 120 insertions(+) diff --git a/lib/librte_eal/arm/rte_power_intrinsics.c b/lib/librte_eal/arm/rte_power_intrinsics.c index 6b8219b919..14081a2c5b 100644 --- a/lib/librte_eal/arm/rte_power_intrinsics.c +++ b/lib/librte_eal/arm/rte_power_intrinsics.c @@ -23,3 +23,12 @@ rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); } + +/** + * This function is not supported on ARM. 
+ */ +void +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + RTE_SET_USED(lcore_id); +} diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index a6f1955996..e311d6f8ea 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -57,6 +57,22 @@ __rte_experimental void rte_power_monitor(const struct rte_power_monitor_cond *pmc, const uint64_t tsc_timestamp); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Wake up a specific lcore that is in a power optimized state and is monitoring + * an address. + * + * @note This function will *not* wake up a core that is in a power optimized + * state due to calling `rte_power_pause`. + * + * @param lcore_id + * Lcore ID of a sleeping thread. + */ +__rte_experimental +void rte_power_monitor_wakeup(const unsigned int lcore_id); + /** * @warning * @b EXPERIMENTAL: this API may change without prior notice diff --git a/lib/librte_eal/ppc/rte_power_intrinsics.c b/lib/librte_eal/ppc/rte_power_intrinsics.c index 9a40c4d5d6..a7db61a7c3 100644 --- a/lib/librte_eal/ppc/rte_power_intrinsics.c +++ b/lib/librte_eal/ppc/rte_power_intrinsics.c @@ -23,3 +23,12 @@ rte_power_pause(const uint64_t tsc_timestamp) { RTE_SET_USED(tsc_timestamp); } + +/** + * This function is not supported on PPC64. + */ +void +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + RTE_SET_USED(lcore_id); +} diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index 20945b1efa..ac026e289d 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -406,6 +406,7 @@ EXPERIMENTAL { # added in 21.02 rte_power_monitor; + rte_power_monitor_wakeup; rte_power_pause; }; diff --git a/lib/librte_eal/x86/rte_power_intrinsics.c b/lib/librte_eal/x86/rte_power_intrinsics.c index 487a783a2c..941da138ce 100644 --- a/lib/librte_eal/x86/rte_power_intrinsics.c +++ b/lib/librte_eal/x86/rte_power_intrinsics.c @@ -2,8 +2,31 @@ * Copyright(c) 2020 Intel Corporation */ +#include +#include +#include + #include "rte_power_intrinsics.h" +/* + * Per-lcore structure holding current status of C0.2 sleeps. 
+ */ +static struct power_wait_status { + rte_spinlock_t lock; + volatile void *monitor_addr; /**< NULL if not currently sleeping */ +} __rte_cache_aligned wait_status[RTE_MAX_LCORE]; + +static inline void +__umwait_wakeup(volatile void *addr) +{ + uint64_t val; + + /* trigger a write but don't change the value */ + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED); + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED); +} + static bool wait_supported; static inline uint64_t @@ -36,6 +59,12 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, { const uint32_t tsc_l = (uint32_t)tsc_timestamp; const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + const unsigned int lcore_id = rte_lcore_id(); + struct power_wait_status *s; + + /* prevent non-EAL thread from using this API */ + if (lcore_id >= RTE_MAX_LCORE) + return; /* prevent user from running this instruction if it's not supported */ if (!wait_supported) @@ -60,11 +89,24 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, if (masked == pmc->val) return; } + + s = &wait_status[lcore_id]; + + /* update sleep address */ + rte_spinlock_lock(&s->lock); + s->monitor_addr = pmc->addr; + rte_spinlock_unlock(&s->lock); + /* execute UMWAIT */ asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" : /* ignore rflags */ : "D"(0), /* enter C0.2 */ "a"(tsc_l), "d"(tsc_h)); + + /* erase sleep address */ + rte_spinlock_lock(&s->lock); + s->monitor_addr = NULL; + rte_spinlock_unlock(&s->lock); } /** @@ -97,3 +139,46 @@ RTE_INIT(rte_power_intrinsics_init) { if (i.power_monitor && i.power_pause) wait_supported = 1; } + +void +rte_power_monitor_wakeup(const unsigned int lcore_id) +{ + struct power_wait_status *s; + + /* prevent buffer overrun */ + if (lcore_id >= RTE_MAX_LCORE) + return; + + /* prevent user from running this instruction if it's not supported */ + if (!wait_supported) + return; + + s = &wait_status[lcore_id]; + + /* + * There is a race condition between sleep, wakeup and locking, but we + * don't need to handle it. + * + * Possible situations: + * + * 1. T1 locks, sets address, unlocks + * 2. T2 locks, triggers wakeup, unlocks + * 3. T1 sleeps + * + * In this case, because T1 has already set the address for monitoring, + * we will wake up immediately even if T2 triggers wakeup before T1 + * goes to sleep. + * + * 1. T1 locks, sets address, unlocks, goes to sleep, and wakes up + * 2. T2 locks, triggers wakeup, and unlocks + * 3. T1 locks, erases address, and unlocks + * + * In this case, since we've already woken up, the "wakeup" was + * unneeded, and since T1 is still waiting on T2 releasing the lock, the + * wakeup address is still valid so it's perfectly safe to write it. 
+ */ + rte_spinlock_lock(&s->lock); + if (s->monitor_addr != NULL) + __umwait_wakeup(s->monitor_addr); + rte_spinlock_unlock(&s->lock); +} From patchwork Mon Jan 11 14:58:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86320 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EA390A09FF; Mon, 11 Jan 2021 15:59:58 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id B3A12140EBF; Mon, 11 Jan 2021 15:59:17 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 1188C140EB8 for ; Mon, 11 Jan 2021 15:59:15 +0100 (CET) IronPort-SDR: JMWXqCDVNlnnjmJ6Un7kZZo3P7ioKC3UvC77TQbifNF1ZGunuNHPBUGdVtZELCUoVcmx8XpCj/ efZnlD6vRejA== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653024" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653024" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:15 -0800 IronPort-SDR: XQd+PBQRcq/l9pZ/cGNO135dtsDKI1oJ7r+eMcFqGFGS0DouIqQsX44knaBkTbu+CHB3L+Fejx dSiSVy06V/sA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816594" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:12 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Thomas Monjalon , Ferruh Yigit , Andrew Rybchenko , Ray Kinsella , Neil Horman , konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:51 +0000 Message-Id: <99a2546b1d4945984d51b1043c924f94e7e0ac8c.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 06/11] ethdev: add simple power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple API to allow getting the monitor conditions for power-optimized monitoring of the Rx queues from the PMD, as well as release notes information. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Andrew Rybchenko --- Notes: v13: - Fix typos and issues raised by Andrew doc/guides/rel_notes/release_21_02.rst | 5 +++++ lib/librte_ethdev/rte_ethdev.c | 28 ++++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 25 +++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_driver.h | 22 ++++++++++++++++++++ lib/librte_ethdev/version.map | 3 +++ 5 files changed, 83 insertions(+) diff --git a/doc/guides/rel_notes/release_21_02.rst b/doc/guides/rel_notes/release_21_02.rst index 638f98168b..6de0cb568e 100644 --- a/doc/guides/rel_notes/release_21_02.rst +++ b/doc/guides/rel_notes/release_21_02.rst @@ -55,6 +55,11 @@ New Features Also, make sure to start the actual text at the margin. 
======================================================= +* **ethdev: added new API for PMD power management** + + * ``rte_eth_get_monitor_addr()``, to be used in conjunction with + ``rte_power_monitor()`` to enable automatic power management for PMD's. + Removed Items ------------- diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index 17ddacc78d..e19dbd838b 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -5115,6 +5115,34 @@ rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode)); } +int +rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id, + struct rte_power_monitor_cond *pmc) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_monitor_addr, -ENOTSUP); + + if (queue_id >= dev->data->nb_rx_queues) { + RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u\n", queue_id); + return -EINVAL; + } + + if (pmc == NULL) { + RTE_ETHDEV_LOG(ERR, "Invalid power monitor condition=%p\n", + pmc); + return -EINVAL; + } + + return eth_err(port_id, + dev->dev_ops->get_monitor_addr(dev->data->rx_queues[queue_id], + pmc)); +} + int rte_eth_dev_set_mc_addr_list(uint16_t port_id, struct rte_ether_addr *mc_addr_set, diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index f5f8919186..ca0f91312e 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -157,6 +157,7 @@ extern "C" { #include #include #include +#include #include "rte_ethdev_trace_fp.h" #include "rte_dev_info.h" @@ -4334,6 +4335,30 @@ __rte_experimental int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, struct rte_eth_burst_mode *mode); +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Retrieve the monitor condition for a given receive queue. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The Rx queue on the Ethernet device for which information + * will be retrieved. + * @param pmc + * The pointer point to power-optimized monitoring condition structure. + * + * @return + * - 0: Success. + * -ENOTSUP: Operation not supported. + * -EINVAL: Invalid parameters. + * -ENODEV: Invalid port ID. + */ +__rte_experimental +int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id, + struct rte_power_monitor_cond *pmc); + /** * Retrieve device registers and register attributes (number of registers and * register size) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index 0eacfd8425..3b3b0ec1a0 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -763,6 +763,26 @@ typedef int (*eth_hairpin_queue_peer_unbind_t) (struct rte_eth_dev *dev, uint16_t cur_queue, uint32_t direction); /**< @internal Unbind peer queue from the current queue. */ +/** + * @internal + * Get address of memory location whose contents will change whenever there is + * new data to be received on an Rx queue. + * + * @param rxq + * Ethdev queue pointer. + * @param pmc + * The pointer to power-optimized monitoring condition structure. + * @return + * Negative errno value on error, 0 on success. 
+ * + * @retval 0 + * Success + * @retval -EINVAL + * Invalid parameters + */ +typedef int (*eth_get_monitor_addr_t)(void *rxq, + struct rte_power_monitor_cond *pmc); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -917,6 +937,8 @@ struct eth_dev_ops { /**< Set up the connection between the pair of hairpin queues. */ eth_hairpin_queue_peer_unbind_t hairpin_queue_peer_unbind; /**< Disconnect the hairpin queues of a pair from each other. */ + eth_get_monitor_addr_t get_monitor_addr; + /**< Get power monitoring condition for Rx queue. */ }; /** diff --git a/lib/librte_ethdev/version.map b/lib/librte_ethdev/version.map index d3f5410806..a124e1e370 100644 --- a/lib/librte_ethdev/version.map +++ b/lib/librte_ethdev/version.map @@ -240,6 +240,9 @@ EXPERIMENTAL { rte_flow_get_restore_info; rte_flow_tunnel_action_decap_release; rte_flow_tunnel_item_release; + + # added in 21.02 + rte_eth_get_monitor_addr; }; INTERNAL { From patchwork Mon Jan 11 14:58:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86321 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 641DFA09FF; Mon, 11 Jan 2021 16:00:09 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 2B8B8140EC5; Mon, 11 Jan 2021 15:59:20 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 7DB50140EC3 for ; Mon, 11 Jan 2021 15:59:18 +0100 (CET) IronPort-SDR: mdcwL/jiIWfBXF35GEwBHCXvmRF0m9vNOVZRF7u8NObTEKwKCLM8yrXwwmDFbsC/2uGTepYBr9 qUWAlMoS6pRw== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653033" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653033" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:18 -0800 IronPort-SDR: T66HqE5EyihR8CfpT+haVoKnFPZCIBdB2tv57NyzwM04XAOUmDJDPHfk2BhOtFiOkPBq2BuOUt QXxJahi51Z6A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816600" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:15 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , Ray Kinsella , Neil Horman , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:52 +0000 Message-Id: <12e1a8da587630c31584e874b270b0187337cc87.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 07/11] power: add PMD power management API and callback X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add a simple on/off switch that will enable saving power when no packets are arriving. 
It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that will either wait until a TSC timestamp expires or until packets arrive. This API mandates a core-to-single-queue mapping (that is, multiple queues per device are supported, but they have to be polled on different cores). This design uses PMD RX callbacks. 1. UMWAIT/UMONITOR: When a certain threshold of empty polls is reached, the core will go into a power-optimized sleep while waiting on the address of the next RX descriptor to be written to. 2. TPAUSE/Pause instruction This method uses the pause (or TPAUSE, if available) instruction to avoid busy polling. 3. Frequency scaling Reuse the existing DPDK power library to scale the core frequency up/down depending on traffic volume.
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov --- Notes: v15: - Fix check in UMWAIT callback v13: - Rework the synchronization mechanism to not require locking - Add more parameter checking - Rework n_rx_queues access to not go through internal PMD structures and use public API instead doc/guides/prog_guide/power_man.rst | 44 +++ doc/guides/rel_notes/release_21_02.rst | 10 + lib/librte_power/meson.build | 5 +- lib/librte_power/rte_power_pmd_mgmt.c | 359 +++++++++++++++++++++++++ lib/librte_power/rte_power_pmd_mgmt.h | 90 +++++++ lib/librte_power/version.map | 5 + 6 files changed, 511 insertions(+), 2 deletions(-) create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0a3755a901..02280dd689 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,47 @@ User Cases ---------- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +PMD Power Management API +------------------------ + +Abstract +~~~~~~~~ +Existing power management mechanisms require developers to change application +design or change code to make use of them. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever the empty poll count reaches a certain number. + + * Monitor + + This power saving scheme will put the CPU into an optimized power state and use + the ``rte_power_monitor()`` function to monitor the Ethernet PMD RX + descriptor address, and wake the CPU up whenever there's new traffic. + + * Pause + + This power saving scheme will avoid busy polling by either entering a + power-optimized sleep state with the ``rte_power_pause()`` function, or, if it's + not available, using ``rte_pause()``. + + * Frequency scaling + + This power saving scheme will use the existing ``librte_power`` library + functionality to scale the core frequency up/down depending on traffic + volume. + + +.. note:: + + Currently, this power management API is limited to a mandatory mapping of 1 + queue to 1 core (multiple queues are supported, but they must be polled from + different cores).
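To make the call flow concrete, here is a minimal editorial sketch (not part of the patch) of how an application could hand one of its Rx queues over to the queue enable/disable calls added by this patch. It assumes the port and queue are already configured, that the given lcore is the only one polling that queue, and it picks the monitor scheme first purely as an example; the helper name is hypothetical.

#include <errno.h>
#include <stdint.h>

#include <rte_power_pmd_mgmt.h>

/* Hypothetical helper: enable PMD power management for one Rx queue.
 * lcore_id must be the lcore that will poll (port_id, queue_id), and only
 * one Rx queue per lcore may be managed this way.
 */
static int
setup_queue_power_mgmt(unsigned int lcore_id, uint16_t port_id,
		uint16_t queue_id)
{
	int ret;

	ret = rte_power_pmd_mgmt_queue_enable(lcore_id, port_id, queue_id,
			RTE_POWER_MGMT_TYPE_MONITOR);
	if (ret == -ENOTSUP) {
		/* no monitor support (no UMWAIT, or the PMD does not provide
		 * get_monitor_addr) - fall back to the pause scheme
		 */
		ret = rte_power_pmd_mgmt_queue_enable(lcore_id, port_id,
				queue_id, RTE_POWER_MGMT_TYPE_PAUSE);
	}
	return ret;
}

Teardown is symmetrical: ``rte_power_pmd_mgmt_queue_disable()`` takes the same (lcore, port, queue) triplet.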
+ +API Overview for PMD Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* **Queue Enable**: Enable specific power scheme for certain queue/port/core + +* **Queue Disable**: Disable power scheme for certain queue/port/core + References ---------- @@ -200,3 +241,6 @@ References * The :doc:`../sample_app_ug/vm_power_management` chapter in the :doc:`../sample_app_ug/index` section. + +* The :doc:`../sample_app_ug/rxtx_callbacks` + chapter in the :doc:`../sample_app_ug/index` section. diff --git a/doc/guides/rel_notes/release_21_02.rst b/doc/guides/rel_notes/release_21_02.rst index 6de0cb568e..b34828cad6 100644 --- a/doc/guides/rel_notes/release_21_02.rst +++ b/doc/guides/rel_notes/release_21_02.rst @@ -60,6 +60,16 @@ New Features * ``rte_eth_get_monitor_addr()``, to be used in conjunction with ``rte_power_monitor()`` to enable automatic power management for PMD's. +* **Add PMD power management helper API** + + A new helper API has been added to make using Ethernet PMD power management + easier for the user: ``rte_power_pmd_mgmt_queue_enable()``. Three power + management schemes are supported initially: + + * Power saving based on UMWAIT instruction (x86 only) + * Power saving based on ``rte_pause()`` (generic) or TPAUSE instruction (x86 only) + * Power saving based on frequency scaling through the ``librte_power`` library + Removed Items ------------- diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build index 4b4cf1b90b..51a471b669 100644 --- a/lib/librte_power/meson.build +++ b/lib/librte_power/meson.build @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c', 'power_kvm_vm.c', 'guest_channel.c', 'rte_power_empty_poll.c', 'power_pstate_cpufreq.c', + 'rte_power_pmd_mgmt.c', 'power_common.c') -headers = files('rte_power.h','rte_power_empty_poll.h') -deps += ['timer'] +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h') +deps += ['timer' ,'ethdev'] diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c new file mode 100644 index 0000000000..470c3a912b --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.c @@ -0,0 +1,359 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include "rte_power_pmd_mgmt.h" + +#define EMPTYPOLL_MAX 512 + +static struct pmd_conf_data { + struct rte_cpu_intrinsics intrinsics_support; + /**< what do we support? */ + uint64_t tsc_per_us; + /**< pre-calculated tsc diff for 1us */ + uint64_t pause_per_us; + /**< how many rte_pause can we fit in a microisecond? */ +} global_data; + +/** + * Possible power management states of an ethdev port. + */ +enum pmd_mgmt_state { + /** Device power management is disabled. */ + PMD_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + PMD_MGMT_ENABLED, + /** Device powermanagement status is about to change. */ + PMD_MGMT_BUSY +}; + +struct pmd_queue_cfg { + volatile enum pmd_mgmt_state pwr_mgmt_state; + /**< State of power management for this queue */ + enum rte_power_pmd_mgmt_type cb_mode; + /**< Callback mode for this queue */ + const struct rte_eth_rxtx_callback *cur_cb; + /**< Callback instance */ + volatile bool umwait_in_progress; + /**< are we currently sleeping? 
*/ + uint64_t empty_poll_stats; + /**< Number of empty polls */ +} __rte_cache_aligned; + +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; + +static void +calc_tsc(void) +{ + const uint64_t hz = rte_get_timer_hz(); + const uint64_t tsc_per_us = hz / US_PER_S; /* 1us */ + + global_data.tsc_per_us = tsc_per_us; + + /* only do this if we don't have tpause */ + if (!global_data.intrinsics_support.power_pause) { + const uint64_t start = rte_rdtsc_precise(); + const uint32_t n_pauses = 10000; + double us, us_per_pause; + uint64_t end; + unsigned int i; + + /* estimate number of rte_pause() calls per us*/ + for (i = 0; i < n_pauses; i++) + rte_pause(); + + end = rte_rdtsc_precise(); + us = (end - start) / (double)tsc_per_us; + us_per_pause = us / n_pauses; + + global_data.pause_per_us = (uint64_t)(1.0 / us_per_pause); + } +} + +static uint16_t +clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, + uint16_t nb_rx, uint16_t max_pkts __rte_unused, + void *addr __rte_unused) +{ + + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) { + struct rte_power_monitor_cond pmc; + uint16_t ret; + + /* + * we might get a cancellation request while being + * inside the callback, in which case the wakeup + * wouldn't work because it would've arrived too early. + * + * to get around this, we notify the other thread that + * we're sleeping, so that it can spin until we're done. + * unsolicited wakeups are perfectly safe. + */ + q_conf->umwait_in_progress = true; + + /* check if we need to cancel sleep */ + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { + /* use monitoring condition to sleep */ + ret = rte_eth_get_monitor_addr(port_id, qidx, + &pmc); + if (ret == 0) + rte_power_monitor(&pmc, -1ULL); + } + q_conf->umwait_in_progress = false; + } + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused, + uint16_t nb_rx, uint16_t max_pkts __rte_unused, + void *addr __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + /* sleep for 1 microsecond */ + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) { + /* use tpause if we have it */ + if (global_data.intrinsics_support.power_pause) { + const uint64_t cur = rte_rdtsc(); + const uint64_t wait_tsc = + cur + global_data.tsc_per_us; + rte_power_pause(wait_tsc); + } else { + uint64_t i; + for (i = 0; i < global_data.pause_per_us; i++) + rte_pause(); + } + } + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_scale_freq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + } else { + q_conf->empty_poll_stats = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + + return nb_rx; +} + +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, uint16_t port_id, + uint16_t queue_id, enum rte_power_pmd_mgmt_type mode) +{ + struct pmd_queue_cfg *queue_cfg; + struct rte_eth_dev_info 
info; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + + if (queue_id >= RTE_MAX_QUEUES_PER_PORT || lcore_id >= RTE_MAX_LCORE) { + ret = -EINVAL; + goto end; + } + + if (rte_eth_dev_info_get(port_id, &info) < 0) { + ret = -EINVAL; + goto end; + } + + /* check if queue id is valid */ + if (queue_id >= info.nb_rx_queues) { + ret = -EINVAL; + goto end; + } + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) { + ret = -EINVAL; + goto end; + } + + /* we're about to change our state */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY; + + /* we need this in various places */ + rte_cpu_get_intrinsics_support(&global_data.intrinsics_support); + + switch (mode) { + case RTE_POWER_MGMT_TYPE_MONITOR: + { + struct rte_power_monitor_cond dummy; + + /* check if rte_power_monitor is supported */ + if (!global_data.intrinsics_support.power_monitor) { + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); + ret = -ENOTSUP; + goto rollback; + } + + /* check if the device supports the necessary PMD API */ + if (rte_eth_get_monitor_addr(port_id, queue_id, + &dummy) == -ENOTSUP) { + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n"); + ret = -ENOTSUP; + goto rollback; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->umwait_in_progress = false; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_umwait, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_SCALE: + { + enum power_management_env env; + /* only PSTATE and ACPI modes are supported */ + if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) && + !rte_power_check_env_supported( + PM_ENV_PSTATE_CPUFREQ)) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n"); + ret = -ENOTSUP; + goto rollback; + } + /* ensure we could initialize the power library */ + if (rte_power_init(lcore_id)) { + ret = -EINVAL; + goto rollback; + } + /* ensure we initialized the correct env */ + env = rte_power_get_env(); + if (env != PM_ENV_ACPI_CPUFREQ && + env != PM_ENV_PSTATE_CPUFREQ) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n"); + ret = -ENOTSUP; + goto rollback; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, + queue_id, clb_scale_freq, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_PAUSE: + /* figure out various time-to-tsc conversions */ + if (global_data.tsc_per_us == 0) + calc_tsc(); + + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_pause, NULL); + break; + } + ret = 0; + + return ret; + +rollback: + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; +end: + return ret; +} + +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id) +{ + struct pmd_queue_cfg *queue_cfg; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + + if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT) + return -EINVAL; + + /* no need to check queue id as wrong queue id would not be enabled */ + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state != 
PMD_MGMT_ENABLED) + return -EINVAL; + + /* let the callback know we're shutting down */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY; + + switch (queue_cfg->cb_mode) { + case RTE_POWER_MGMT_TYPE_MONITOR: + { + bool exit = false; + do { + /* + * we may request cancellation while the other thread + * has just entered the callback but hasn't started + * sleeping yet, so keep waking it up until we know it's + * done sleeping. + */ + if (queue_cfg->umwait_in_progress) + rte_power_monitor_wakeup(lcore_id); + else + exit = true; + } while (!exit); + } + /* fall-through */ + case RTE_POWER_MGMT_TYPE_PAUSE: + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + break; + case RTE_POWER_MGMT_TYPE_SCALE: + rte_power_freq_max(lcore_id); + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + rte_power_exit(lcore_id); + break; + } + /* + * we don't free the RX callback here because it is unsafe to do so + * unless we know for a fact that all data plane threads have stopped. + */ + queue_cfg->cur_cb = NULL; + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + + return 0; +} diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h new file mode 100644 index 0000000000..0bfbc6ba69 --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.h @@ -0,0 +1,90 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#ifndef _RTE_POWER_PMD_MGMT_H +#define _RTE_POWER_PMD_MGMT_H + +/** + * @file + * RTE PMD Power Management + */ +#include +#include + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * PMD Power Management Type + */ +enum rte_power_pmd_mgmt_type { + /** Use power-optimized monitoring to wait for incoming traffic */ + RTE_POWER_MGMT_TYPE_MONITOR = 1, + /** Use power-optimized sleep to avoid busy polling */ + RTE_POWER_MGMT_TYPE_PAUSE, + /** Use frequency scaling when traffic is low */ + RTE_POWER_MGMT_TYPE_SCALE, +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Enable power management on a specified RX queue and lcore. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @param mode + * The power management callback function type. + + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Disable power management on a specified RX queue and lcore. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. 
+ * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id); +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map index 69ca9af616..61996b4d11 100644 --- a/lib/librte_power/version.map +++ b/lib/librte_power/version.map @@ -34,4 +34,9 @@ EXPERIMENTAL { rte_power_guest_channel_receive_msg; rte_power_poll_stat_fetch; rte_power_poll_stat_update; + + # added in 21.02 + rte_power_pmd_mgmt_queue_enable; + rte_power_pmd_mgmt_queue_disable; + }; From patchwork Mon Jan 11 14:58:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86322 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E0497A09FF; Mon, 11 Jan 2021 16:00:20 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5FC4E140ECA; Mon, 11 Jan 2021 15:59:23 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 00551140E9A for ; Mon, 11 Jan 2021 15:59:21 +0100 (CET) IronPort-SDR: YgHZ+5UBDDtQ5Gy+hAc7HSnCIeuCQAUaTErPVw0VV0HRfpAEYtt+dG4xZfuEgUGKR9nvOimrlJ bOpbVJ1M1sOg== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653041" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653041" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:21 -0800 IronPort-SDR: vhxlGW9RJlmV+GM9ygmHuv7ZziYihPp7RBsOfmOJDRSi5rwjSAxZXnTQT8IFPM4Tcf/UEEHFi5 Jb9FL4U01m2Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816611" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:18 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Jeff Guo , Haiyue Wang , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:53 +0000 Message-Id: <29d8b742fa668b9e161baffb45a1d9b2c5614fe1.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 08/11] net/ixgbe: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. 
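For context, the condition filled in by such a callback is what ultimately gets handed to rte_power_monitor(). Below is a condensed editorial sketch of that pairing (not part of the patch); it loosely mirrors what the clb_umwait() callback from the previous patch does, assumes the platform supports the monitor intrinsic, and treats the burst size and the empty-poll threshold (512, matching EMPTYPOLL_MAX) as arbitrary choices.

#include <stdint.h>

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_power_intrinsics.h>

#define EMPTY_POLLS_BEFORE_SLEEP 512 /* mirrors EMPTYPOLL_MAX */

/* Editorial sketch: poll one queue and, after a run of empty polls, sleep
 * on the Rx descriptor status word until traffic arrives or another lcore
 * calls rte_power_monitor_wakeup().
 */
static void
rx_loop_with_monitor(uint16_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	unsigned int empty_polls = 0;

	for (;;) {
		uint16_t i, nb;

		nb = rte_eth_rx_burst(port_id, queue_id, pkts, RTE_DIM(pkts));
		if (nb != 0) {
			empty_polls = 0;
			for (i = 0; i < nb; i++)
				rte_pktmbuf_free(pkts[i]); /* or process them */
			continue;
		}
		if (++empty_polls > EMPTY_POLLS_BEFORE_SLEEP) {
			struct rte_power_monitor_cond pmc;

			/* ask the PMD which status word to watch */
			if (rte_eth_get_monitor_addr(port_id, queue_id,
					&pmc) == 0)
				/* -1ULL: no TSC deadline, as in clb_umwait() */
				rte_power_monitor(&pmc, -1ULL);
			empty_polls = 0;
		}
	}
}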
Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: Konstantin Ananyev --- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/ixgbe/ixgbe_rxtx.h | 1 + 3 files changed, 27 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 9a47a8b262..4b7a5ca60b 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -560,6 +560,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, .tx_done_cleanup = ixgbe_dev_tx_done_cleanup, + .get_monitor_addr = ixgbe_get_monitor_addr, }; /* diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 6cfbb582e2..7e046a1819 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1369,6 +1369,31 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +int +ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + volatile union ixgbe_adv_rx_desc *rxdp; + struct ixgbe_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.upper.status_error; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + + /* the registers are 32-bit */ + pmc->data_sz = sizeof(uint32_t); + + return 0; +} + /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. */ static inline uint32_t ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 6d2f7c9da3..8a25e98df6 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.h +++ b/drivers/net/ixgbe/ixgbe_rxtx.h @@ -299,5 +299,6 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev); +int ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); #endif /* _IXGBE_RXTX_H_ */ From patchwork Mon Jan 11 14:58:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86323 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 34866A09FF; Mon, 11 Jan 2021 16:00:34 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 14E27140ED3; Mon, 11 Jan 2021 15:59:25 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id F2E78140EB6 for ; Mon, 11 Jan 2021 15:59:22 +0100 (CET) IronPort-SDR: Bdnk/8SCz4tNybSaTaQdXnolxmo/Om1l9VusAJMNV6tz44IGSQcvVQp2WcPPXk5iqTHnIKaJmm iCW2Tk/C7otw== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653049" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653049" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:22 -0800 IronPort-SDR: 
2M6udHGYBf3jCDvA9QMDEmAtTc0CeRvf1bETeM+rSrg/uBUE+KOTtwqNMznMz+EZzMtnGEjNb3 70fbCb8Ort4w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816663" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:20 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Beilei Xing , Jeff Guo , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:54 +0000 Message-Id: <0b7e2a3a73d80a56ab9fd0946d8bba12d7400cef.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 09/11] net/i40e: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Acked-by: Jeff Guo --- drivers/net/i40e/i40e_ethdev.c | 1 + drivers/net/i40e/i40e_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/i40e/i40e_rxtx.h | 1 + 3 files changed, 27 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index f54769c29d..af2577a140 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -510,6 +510,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, .tx_done_cleanup = i40e_tx_done_cleanup, + .get_monitor_addr = i40e_get_monitor_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 5df9a9df56..0b4220fc9c 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -72,6 +72,31 @@ #define I40E_TX_OFFLOAD_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK) +int +i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + struct i40e_rx_queue *rxq = rx_queue; + volatile union i40e_rx_desc *rxdp; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.qword1.status_error_len; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. 
+ */ + pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + + /* registers are 64-bit */ + pmc->data_sz = sizeof(uint64_t); + + return 0; +} + static inline void i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) { diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 57d7b4160b..e1494525ce 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -248,6 +248,7 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); /* For each value it means, datasheet of hardware can tell more details * From patchwork Mon Jan 11 14:58:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86324 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E0120A09FF; Mon, 11 Jan 2021 16:00:45 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 53440140ED9; Mon, 11 Jan 2021 15:59:26 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id 5318E140ED7 for ; Mon, 11 Jan 2021 15:59:25 +0100 (CET) IronPort-SDR: vbfBwRpp9NGfYmkUKcA/pbCdHZAn1r12CtIX8FJ/s5yw9JYgSV1muvdg9s24xTWQwn2MWUofJl qTdiIvV0PZNA== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653056" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653056" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:24 -0800 IronPort-SDR: jgdDxVDjEyheeUJ7puc6cHhsafq3fQqwQ8232DzTvYFlwnXBlLw2aOou3YxI5bwg0oIQO0mQiL fB8+LQOSAC1w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816680" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:22 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , Qiming Yang , Qi Zhang , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, david.hunt@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:55 +0000 Message-Id: <82955d7279da4c609bcbe2305734731c23f814cc.1610377085.git.anatoly.burakov@intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 10/11] net/ice: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Implement support for the power management API by implementing a `get_monitor_addr` function that will return an address of an RX ring's status bit. 
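As with ixgbe and i40e, the callback programs val/mask so that the condition reads as "the DD bit of the descriptor at rx_tail is set". The x86 rte_power_monitor() implementation from the EAL patches earlier in this series evaluates exactly that masked comparison before arming UMWAIT and skips the sleep if it already holds. A small hypothetical helper (editorial, not part of any patch) restating the check:

#include <stdbool.h>
#include <stdint.h>

#include <rte_power_intrinsics.h>

/* Hypothetical helper: true when the monitored status word already matches
 * the condition, i.e. a descriptor has been written back and there is no
 * point in sleeping. Mirrors the masked comparison done by the x86
 * rte_power_monitor() in this series.
 */
static inline bool
monitor_cond_already_met(const struct rte_power_monitor_cond *pmc)
{
	uint64_t cur;

	switch (pmc->data_sz) {
	case sizeof(uint16_t): /* e.g. ice: 16-bit status_error0 */
		cur = *(const volatile uint16_t *)pmc->addr;
		break;
	case sizeof(uint32_t): /* e.g. ixgbe: 32-bit status_error */
		cur = *(const volatile uint32_t *)pmc->addr;
		break;
	default: /* e.g. i40e: 64-bit qword1 holding the DD bit */
		cur = *(const volatile uint64_t *)pmc->addr;
		break;
	}

	return (cur & pmc->mask) == pmc->val;
}

In the EAL implementation this check is what makes rte_power_monitor() return immediately when traffic arrived between the empty poll and the sleep.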
Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/ice/ice_rxtx.h | 1 + 3 files changed, 28 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 9a5d6a559f..c21682c120 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -216,6 +216,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, .tx_done_cleanup = ice_tx_done_cleanup, + .get_monitor_addr = ice_get_monitor_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index 5fbd68eafc..fa9e9a235b 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -26,6 +26,32 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +int +ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc) +{ + volatile union ice_rx_flex_desc *rxdp; + struct ice_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + pmc->addr = &rxdp->wb.status_error0; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + + /* register is 16-bit */ + pmc->data_sz = sizeof(uint16_t); + + return 0; +} + + static inline uint8_t ice_proto_xtr_type_to_rxdid(uint8_t xtr_type) { diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h index 6b16716063..906fbefdc4 100644 --- a/drivers/net/ice/ice_rxtx.h +++ b/drivers/net/ice/ice_rxtx.h @@ -263,6 +263,7 @@ uint16_t ice_xmit_pkts_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc); int ice_tx_done_cleanup(void *txq, uint32_t free_cnt); +int ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc); #define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \ int i; \ From patchwork Mon Jan 11 14:58:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anatoly Burakov X-Patchwork-Id: 86325 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 5BF25A09FF; Mon, 11 Jan 2021 16:00:56 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8AE68140ED6; Mon, 11 Jan 2021 15:59:29 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by mails.dpdk.org (Postfix) with ESMTP id B628E140ED6 for ; Mon, 11 Jan 2021 15:59:27 +0100 (CET) IronPort-SDR: HkmaHjPAf7cGeHR8lgKbX0x30I1VcAVDs44cCNSWL2tqg8sTHfCqm+Damvxw2UA6qJ3TtBo1hZ akPWt0Q8Hhvw== X-IronPort-AV: E=McAfee;i="6000,8403,9860"; a="157653069" X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="157653069" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 11 Jan 2021 06:59:27 -0800 
IronPort-SDR: gCQe9YSB6nqRc/1I+yc4YtDZ2tXd+nfibeZDAzM/NvTMemY1Tv75xOFyvEzL0LTuEFkgCUZHic Z0t8dec8UO4w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.79,338,1602572400"; d="scan'208";a="423816686" Received: from silpixa00399498.ir.intel.com (HELO silpixa00399498.ger.corp.intel.com) ([10.237.222.179]) by orsmga001.jf.intel.com with ESMTP; 11 Jan 2021 06:59:24 -0800 From: Anatoly Burakov To: dev@dpdk.org Cc: Liang Ma , David Hunt , thomas@monjalon.net, konstantin.ananyev@intel.com, timothy.mcdaniel@intel.com, bruce.richardson@intel.com, chris.macnamara@intel.com Date: Mon, 11 Jan 2021 14:58:56 +0000 Message-Id: X-Mailer: git-send-email 2.25.1 In-Reply-To: References: MIME-Version: 1.0 Subject: [dpdk-dev] [PATCH v15 11/11] examples/l3fwd-power: enable PMD power mgmt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" From: Liang Ma Add PMD power management feature support to l3fwd-power sample app. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov --- Notes: v12: - Allow selecting PMD power management scheme from command-line - Enforce 1 core 1 queue rule .../sample_app_ug/l3_forward_power_man.rst | 35 ++++++++ examples/l3fwd-power/main.c | 89 ++++++++++++++++++- 2 files changed, 122 insertions(+), 2 deletions(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index 85a78a5c1e..aaa9367fae 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -109,6 +109,8 @@ where, * --telemetry: Telemetry mode. +* --pmd-mgmt: PMD power management mode. + See :doc:`l3_forward` for details. The L3fwd-power example reuses the L3fwd command line options. @@ -456,3 +458,36 @@ reference cycles and accordingly busy rate is set to either 0% or The new stats ``empty_poll`` , ``full_poll`` and ``busy_percent`` can be viewed by running the script ``/usertools/dpdk-telemetry-client.py`` and selecting the menu option ``Send for global Metrics``. + +PMD power management Mode +------------------------- + +The PMD power management mode support for ``l3fwd-power`` is a standalone mode, in this mode +``l3fwd-power`` does simple l3fwding along with enable the power saving scheme on specific +port/queue/lcore. Main purpose for this mode is to demonstrate how to use the PMD power management API. + +.. code-block:: console + + ./build/examples/dpdk-l3fwd-power -l 1-3 -- --pmd-mgmt -p 0x0f --config="(0,0,2),(0,1,3)" + +PMD Power Management Mode +------------------------- +There is also a traffic-aware operating mode that, instead of using explicit +power management, will use automatic PMD power management. This mode is limited +to one queue per core, and has three available power management schemes: + +* ``monitor`` - this will use ``rte_power_monitor()`` function to enter a + power-optimized state (subject to platform support). + +* ``pause`` - this will use ``rte_power_pause()`` or ``rte_pause()`` to avoid + busy looping when there is no traffic. + +* ``scale`` - this will use frequency scaling routines available in the + ``librte_power`` library. + +See :doc:`Power Management<../prog_guide/power_man>` chapter in the DPDK +Programmer's Guide for more details on PMD power management. + +.. 
code-block:: console + + .//examples/dpdk-l3fwd-power -l 1-3 -- -p 0x0f --config="(0,0,2),(0,1,3)" --pmd-mgmt=scale diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index 995a3b6ad7..e312b6f355 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -199,11 +200,14 @@ enum appmode { APP_MODE_LEGACY, APP_MODE_EMPTY_POLL, APP_MODE_TELEMETRY, - APP_MODE_INTERRUPT + APP_MODE_INTERRUPT, + APP_MODE_PMD_MGMT }; enum appmode app_mode; +static enum rte_power_pmd_mgmt_type pmgmt_type; + enum freq_scale_hint_t { FREQ_LOWER = -1, @@ -1611,7 +1615,9 @@ print_usage(const char *prgname) " follow (training_flag, high_threshold, med_threshold)\n" " --telemetry: enable telemetry mode, to update" " empty polls, full polls, and core busyness to telemetry\n" - " --interrupt-only: enable interrupt-only mode\n", + " --interrupt-only: enable interrupt-only mode\n" + " --pmd-mgmt MODE: enable PMD power management mode. " + "Currently supported modes: monitor, pause, scale\n", prgname); } @@ -1701,6 +1707,32 @@ parse_config(const char *q_arg) return 0; } + +static int +parse_pmd_mgmt_config(const char *name) +{ +#define PMD_MGMT_MONITOR "monitor" +#define PMD_MGMT_PAUSE "pause" +#define PMD_MGMT_SCALE "scale" + + if (strncmp(PMD_MGMT_MONITOR, name, sizeof(PMD_MGMT_MONITOR)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_MONITOR; + return 0; + } + + if (strncmp(PMD_MGMT_PAUSE, name, sizeof(PMD_MGMT_PAUSE)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_PAUSE; + return 0; + } + + if (strncmp(PMD_MGMT_SCALE, name, sizeof(PMD_MGMT_SCALE)) == 0) { + pmgmt_type = RTE_POWER_MGMT_TYPE_SCALE; + return 0; + } + /* unknown PMD power management mode */ + return -1; +} + static int parse_ep_config(const char *q_arg) { @@ -1755,6 +1787,7 @@ parse_ep_config(const char *q_arg) #define CMD_LINE_OPT_EMPTY_POLL "empty-poll" #define CMD_LINE_OPT_INTERRUPT_ONLY "interrupt-only" #define CMD_LINE_OPT_TELEMETRY "telemetry" +#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt" /* Parse the argument given in the command line of the application */ static int @@ -1776,6 +1809,7 @@ parse_args(int argc, char **argv) {CMD_LINE_OPT_LEGACY, 0, 0, 0}, {CMD_LINE_OPT_TELEMETRY, 0, 0, 0}, {CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0}, + {CMD_LINE_OPT_PMD_MGMT, 1, 0, 0}, {NULL, 0, 0, 0} }; @@ -1886,6 +1920,21 @@ parse_args(int argc, char **argv) printf("telemetry mode is enabled\n"); } + if (!strncmp(lgopts[option_index].name, + CMD_LINE_OPT_PMD_MGMT, + sizeof(CMD_LINE_OPT_PMD_MGMT))) { + if (app_mode != APP_MODE_DEFAULT) { + printf(" power mgmt mode is mutually exclusive with other modes\n"); + return -1; + } + if (parse_pmd_mgmt_config(optarg) < 0) { + printf(" Invalid PMD power management mode: %s\n", + optarg); + return -1; + } + app_mode = APP_MODE_PMD_MGMT; + printf("PMD power mgmt mode is enabled\n"); + } if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_INTERRUPT_ONLY, sizeof(CMD_LINE_OPT_INTERRUPT_ONLY))) { @@ -2442,6 +2491,8 @@ mode_to_str(enum appmode mode) return "telemetry"; case APP_MODE_INTERRUPT: return "interrupt-only"; + case APP_MODE_PMD_MGMT: + return "pmd mgmt"; default: return "invalid"; } @@ -2671,6 +2722,13 @@ main(int argc, char **argv) qconf = &lcore_conf[lcore_id]; printf("\nInitializing rx queues on lcore %u ... 
", lcore_id ); fflush(stdout); + + /* PMD power management mode can only do 1 queue per core */ + if (app_mode == APP_MODE_PMD_MGMT && qconf->n_rx_queue > 1) { + rte_exit(EXIT_FAILURE, + "In PMD power management mode, only one queue per lcore is allowed\n"); + } + /* init RX queues */ for(queue = 0; queue < qconf->n_rx_queue; ++queue) { struct rte_eth_rxconf rxq_conf; @@ -2708,6 +2766,16 @@ main(int argc, char **argv) rte_exit(EXIT_FAILURE, "Fail to add ptype cb\n"); } + + if (app_mode == APP_MODE_PMD_MGMT) { + ret = rte_power_pmd_mgmt_queue_enable( + lcore_id, portid, queueid, + pmgmt_type); + if (ret < 0) + rte_exit(EXIT_FAILURE, + "rte_power_pmd_mgmt_queue_enable: err=%d, port=%d\n", + ret, portid); + } } } @@ -2798,6 +2866,9 @@ main(int argc, char **argv) SKIP_MAIN); } else if (app_mode == APP_MODE_INTERRUPT) { rte_eal_mp_remote_launch(main_intr_loop, NULL, CALL_MAIN); + } else if (app_mode == APP_MODE_PMD_MGMT) { + /* reuse telemetry loop for PMD power management mode */ + rte_eal_mp_remote_launch(main_telemetry_loop, NULL, CALL_MAIN); } if (app_mode == APP_MODE_EMPTY_POLL || app_mode == APP_MODE_TELEMETRY) @@ -2824,6 +2895,20 @@ main(int argc, char **argv) if (app_mode == APP_MODE_EMPTY_POLL) rte_power_empty_poll_stat_free(); + if (app_mode == APP_MODE_PMD_MGMT) { + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_is_enabled(lcore_id) == 0) + continue; + qconf = &lcore_conf[lcore_id]; + for (queue = 0; queue < qconf->n_rx_queue; ++queue) { + portid = qconf->rx_queue_list[queue].port_id; + queueid = qconf->rx_queue_list[queue].queue_id; + rte_power_pmd_mgmt_queue_disable(lcore_id, + portid, queueid); + } + } + } + if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) && deinit_power_library()) rte_exit(EXIT_FAILURE, "deinit_power_library failed\n");