From patchwork Tue Oct 27 14:59:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82333 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 989D7A04B5; Tue, 27 Oct 2020 15:59:36 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 902F858CD; Tue, 27 Oct 2020 15:59:29 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 89B194C6C for ; Tue, 27 Oct 2020 15:59:28 +0100 (CET) IronPort-SDR: LrwvJLCeLUAHh6cazQPw/Dlsb0eALTd2muwH5Y3df0HuFsMmkLNdq6mtePmM2colJ3Uz2+s+Oc 3JHBhtLwkI6A== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="167314023" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="167314023" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 07:59:26 -0700 IronPort-SDR: qWZNHLYxr2o9lsSV7lZck0TUVJTsu5laT0TjKbXyV/Z2qubiMy8QqpJ4Er27qE90znODX1FELc fS3c9UxaoOiQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="303902984" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 27 Oct 2020 07:59:23 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExMaw015662; Tue, 27 Oct 2020 14:59:22 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExMCM022511; Tue, 27 Oct 2020 14:59:22 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExLfm022507; Tue, 27 Oct 2020 14:59:21 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov Date: Tue, 27 Oct 2020 14:59:01 +0000 Message-Id: <1603810749-22285-2-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 1/9] eal: add new x86 cpuid support for WAITPKG X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a new CPUID flag indicating processor support for UMONITOR/UMWAIT and TPAUSE instructions instruction. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- lib/librte_eal/x86/include/rte_cpuflags.h | 1 + lib/librte_eal/x86/rte_cpuflags.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/lib/librte_eal/x86/include/rte_cpuflags.h b/lib/librte_eal/x86/include/rte_cpuflags.h index c1d20364d1..848ba9cbfb 100644 --- a/lib/librte_eal/x86/include/rte_cpuflags.h +++ b/lib/librte_eal/x86/include/rte_cpuflags.h @@ -132,6 +132,7 @@ enum rte_cpu_flag_t { RTE_CPUFLAG_MOVDIR64B, /**< Direct Store Instructions 64B */ RTE_CPUFLAG_AVX512VP2INTERSECT, /**< AVX512 Two Register Intersection */ + RTE_CPUFLAG_WAITPKG, /**< UMONITOR/UMWAIT/TPAUSE */ /* The last item */ RTE_CPUFLAG_NUMFLAGS, /**< This should always be the last! */ }; diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c index 30439e7951..0325c4b93b 100644 --- a/lib/librte_eal/x86/rte_cpuflags.c +++ b/lib/librte_eal/x86/rte_cpuflags.c @@ -110,6 +110,8 @@ const struct feature_entry rte_cpu_feature_table[] = { FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16) FEAT_DEF(RDSEED, 0x00000007, 0, RTE_REG_EBX, 18) + FEAT_DEF(WAITPKG, 0x00000007, 0, RTE_REG_ECX, 5) + FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX, 0) FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX, 4) From patchwork Tue Oct 27 14:59:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82334 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2D206A04B5; Tue, 27 Oct 2020 15:59:58 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 222B45AB2; Tue, 27 Oct 2020 15:59:35 +0100 (CET) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 9B85D5A8C for ; Tue, 27 Oct 2020 15:59:32 +0100 (CET) IronPort-SDR: dqkAOzCYIX/bxj0+IgWvZEnA5yQTCiJTUoklxdoyrDnY0WHJXPK414RP/9KtPU2xExO+fzl5fG UHrwwHsk0mxA== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="185839887" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="185839887" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 07:59:30 -0700 IronPort-SDR: NIAyb2cZeKAiRMr1CJzY900aRLU/ZrZV1jqn+awwQuqEoQxRq4q7GITWfM0h77u4/PGaiDdcdg wq+lGbyNuTxw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="394507260" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 27 Oct 2020 07:59:26 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExPc1015665; Tue, 27 Oct 2020 14:59:25 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExP3r022565; Tue, 27 Oct 2020 14:59:25 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExPYi022557; Tue, 27 Oct 2020 14:59:25 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Jerin Jacob , Jan Viktorin , David Christensen Date: Tue, 27 Oct 2020 14:59:02 +0000 Message-Id: <1603810749-22285-3-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 2/9] eal: add power management intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add two new power management intrinsics, and provide an implementation in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions are implemented as raw byte opcodes because there is not yet widespread compiler support for these instructions. The power management instructions provide an architecture-specific function to either wait until a specified TSC timestamp is reached, or optionally wait until either a TSC timestamp is reached or a memory location is written to. The monitor function also provides an optional comparison, to avoid sleeping when the expected write has already happened, and no more writes are expected. For more details, please refer to Intel(R) 64 and IA-32 Architectures Software Developer's Manual, Volume 2. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Christensen Acked-by: Jerin Jacob Acked-by: Konstantin Ananyev Acked-by: Ruifeng Wang --- lib/librte_eal/arm/include/meson.build | 1 + .../arm/include/rte_power_intrinsics.h | 60 ++++++++ .../include/generic/rte_power_intrinsics.h | 111 ++++++++++++++ lib/librte_eal/include/meson.build | 1 + lib/librte_eal/ppc/include/meson.build | 1 + .../ppc/include/rte_power_intrinsics.h | 60 ++++++++ lib/librte_eal/x86/include/meson.build | 1 + .../x86/include/rte_power_intrinsics.h | 135 ++++++++++++++++++ 8 files changed, 370 insertions(+) create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h diff --git a/lib/librte_eal/arm/include/meson.build b/lib/librte_eal/arm/include/meson.build index 73b750a18f..c6a9f70d73 100644 --- a/lib/librte_eal/arm/include/meson.build +++ b/lib/librte_eal/arm/include/meson.build @@ -20,6 +20,7 @@ arch_headers = files( 'rte_pause_32.h', 'rte_pause_64.h', 'rte_pause.h', + 'rte_power_intrinsics.h', 'rte_prefetch_32.h', 'rte_prefetch_64.h', 'rte_prefetch.h', diff --git a/lib/librte_eal/arm/include/rte_power_intrinsics.h b/lib/librte_eal/arm/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..a4a1bc1159 --- /dev/null +++ b/lib/librte_eal/arm/include/rte_power_intrinsics.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_ARM_H_ +#define _RTE_POWER_INTRINSIC_ARM_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +/** + * This function is not supported on ARM. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on ARM. + */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_ARM_H_ */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h new file mode 100644 index 0000000000..fb897d9060 --- /dev/null +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -0,0 +1,111 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_H_ +#define _RTE_POWER_INTRINSIC_H_ + +#include + +#include +#include + +/** + * @file + * Advanced power management operations. + * + * This file define APIs for advanced power management, + * which are architecture-dependent. + */ + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * @param p + * Address to monitor for changes. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * @param data_sz + * Data size (in bytes) that will be used to compare expected value with the + * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead + * to undefined result. + */ +__rte_experimental +static inline void rte_power_monitor(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint64_t tsc_timestamp, const uint8_t data_sz); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Monitor specific address for changes. This will cause the CPU to enter an + * architecture-defined optimized power state until either the specified + * memory address is written to, a certain TSC timestamp is reached, or other + * reasons cause the CPU to wake up. + * + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If + * mask is non-zero, the current value pointed to by the `p` pointer will be + * checked against the expected value, and if they match, the entering of + * optimized power state may be aborted. + * + * This call will also lock a spinlock on entering sleep, and release it on + * waking up the CPU. + * + * @param p + * Address to monitor for changes. + * @param expected_value + * Before attempting the monitoring, the `p` address may be read and compared + * against this value. If `value_mask` is zero, this step will be skipped. + * @param value_mask + * The 64-bit mask to use to extract current value from `p`. + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + * @param data_sz + * Data size (in bytes) that will be used to compare expected value with the + * memory address. Can be 1, 2, 4 or 8. Supplying any other value will lead + * to undefined result. + * @param lck + * A spinlock that must be locked before entering the function, will be + * unlocked while the CPU is sleeping, and will be locked again once the CPU + * wakes up. + */ +__rte_experimental +static inline void rte_power_monitor_sync(const volatile void *p, + const uint64_t expected_value, const uint64_t value_mask, + const uint64_t tsc_timestamp, const uint8_t data_sz, + rte_spinlock_t *lck); + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Enter an architecture-defined optimized power state until a certain TSC + * timestamp is reached. + * + * @param tsc_timestamp + * Maximum TSC timestamp to wait for. Note that the wait behavior is + * architecture-dependent. + */ +__rte_experimental +static inline void rte_power_pause(const uint64_t tsc_timestamp); + +#endif /* _RTE_POWER_INTRINSIC_H_ */ diff --git a/lib/librte_eal/include/meson.build b/lib/librte_eal/include/meson.build index cd09027958..3a12e87e19 100644 --- a/lib/librte_eal/include/meson.build +++ b/lib/librte_eal/include/meson.build @@ -60,6 +60,7 @@ generic_headers = files( 'generic/rte_memcpy.h', 'generic/rte_pause.h', 'generic/rte_prefetch.h', + 'generic/rte_power_intrinsics.h', 'generic/rte_rwlock.h', 'generic/rte_spinlock.h', 'generic/rte_ticketlock.h', diff --git a/lib/librte_eal/ppc/include/meson.build b/lib/librte_eal/ppc/include/meson.build index ab4bd28092..0873b2aecb 100644 --- a/lib/librte_eal/ppc/include/meson.build +++ b/lib/librte_eal/ppc/include/meson.build @@ -10,6 +10,7 @@ arch_headers = files( 'rte_io.h', 'rte_memcpy.h', 'rte_pause.h', + 'rte_power_intrinsics.h', 'rte_prefetch.h', 'rte_rwlock.h', 'rte_spinlock.h', diff --git a/lib/librte_eal/ppc/include/rte_power_intrinsics.h b/lib/librte_eal/ppc/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..4ed03d521f --- /dev/null +++ b/lib/librte_eal/ppc/include/rte_power_intrinsics.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_PPC_H_ +#define _RTE_POWER_INTRINSIC_PPC_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +/** + * This function is not supported on PPC64. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + RTE_SET_USED(p); + RTE_SET_USED(expected_value); + RTE_SET_USED(value_mask); + RTE_SET_USED(tsc_timestamp); + RTE_SET_USED(lck); + RTE_SET_USED(data_sz); +} + +/** + * This function is not supported on PPC64. + */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + RTE_SET_USED(tsc_timestamp); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_PPC_H_ */ diff --git a/lib/librte_eal/x86/include/meson.build b/lib/librte_eal/x86/include/meson.build index f0e998c2fe..494a8142a2 100644 --- a/lib/librte_eal/x86/include/meson.build +++ b/lib/librte_eal/x86/include/meson.build @@ -13,6 +13,7 @@ arch_headers = files( 'rte_io.h', 'rte_memcpy.h', 'rte_prefetch.h', + 'rte_power_intrinsics.h', 'rte_pause.h', 'rte_rtm.h', 'rte_rwlock.h', diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h new file mode 100644 index 0000000000..f9b761d796 --- /dev/null +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h @@ -0,0 +1,135 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _RTE_POWER_INTRINSIC_X86_H_ +#define _RTE_POWER_INTRINSIC_X86_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include + +#include "generic/rte_power_intrinsics.h" + +static inline uint64_t +__get_umwait_val(const volatile void *p, const uint8_t sz) +{ + switch (sz) { + case sizeof(uint8_t): + return *(const volatile uint8_t *)p; + case sizeof(uint16_t): + return *(const volatile uint16_t *)p; + case sizeof(uint32_t): + return *(const volatile uint32_t *)p; + case sizeof(uint64_t): + return *(const volatile uint64_t *)p; + default: + /* this is an intrinsic, so we can't have any error handling */ + RTE_ASSERT(0); + return 0; + } +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_monitor(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +/** + * This function uses UMONITOR/UMWAIT instructions and will enter C0.2 state. + * For more information about usage of these instructions, please refer to + * Intel(R) 64 and IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_monitor_sync(const volatile void *p, const uint64_t expected_value, + const uint64_t value_mask, const uint64_t tsc_timestamp, + const uint8_t data_sz, rte_spinlock_t *lck) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + /* + * we're using raw byte codes for now as only the newest compiler + * versions support this instruction natively. + */ + + /* set address for UMONITOR */ + asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;" + : + : "D"(p)); + + if (value_mask) { + const uint64_t cur_value = __get_umwait_val(p, data_sz); + const uint64_t masked = cur_value & value_mask; + + /* if the masked value is already matching, abort */ + if (masked == expected_value) + return; + } + rte_spinlock_unlock(lck); + + /* execute UMWAIT */ + asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); + + rte_spinlock_lock(lck); +} + +/** + * This function uses TPAUSE instruction and will enter C0.2 state. For more + * information about usage of this instruction, please refer to Intel(R) 64 and + * IA-32 Architectures Software Developer's Manual. + */ +static inline void +rte_power_pause(const uint64_t tsc_timestamp) +{ + const uint32_t tsc_l = (uint32_t)tsc_timestamp; + const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32); + + /* execute TPAUSE */ + asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;" + : /* ignore rflags */ + : "D"(0), /* enter C0.2 */ + "a"(tsc_l), "d"(tsc_h)); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_POWER_INTRINSIC_X86_H_ */ From patchwork Tue Oct 27 14:59:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82335 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 16764A04B5; Tue, 27 Oct 2020 16:00:29 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 498A06966; Tue, 27 Oct 2020 15:59:39 +0100 (CET) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 8B4945A8C for ; Tue, 27 Oct 2020 15:59:33 +0100 (CET) IronPort-SDR: +oESneGXnuRnX32/Spo4PxGlI/1BQzCW9FTmfXfJHu5EP/OZjDlDbt9aBvMVLAiKrutGsEDzAk UXJR1AlCLIsg== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="185839905" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="185839905" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 07:59:32 -0700 IronPort-SDR: Wyotn1pW2NprgklNpqfm7mQYygjZpfvRkQFqAGh6vriT6mGkZ5NW7wLVMbh5h09oazPUO9EH3g /a2edXI0iqnw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="394507268" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 27 Oct 2020 07:59:28 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExRok015670; Tue, 27 Oct 2020 14:59:27 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExRAw022603; Tue, 27 Oct 2020 14:59:27 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExRSA022599; Tue, 27 Oct 2020 14:59:27 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Jerin Jacob , Jan Viktorin , David Christensen , Ray Kinsella Date: Tue, 27 Oct 2020 14:59:03 +0000 Message-Id: <1603810749-22285-4-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 3/9] eal: add intrinsics support check infrastructure X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Currently, it is not possible to check support for intrinsics that are platform-specific, cannot be abstracted in a generic way, or do not have support on all architectures. The CPUID flags can be used to some extent, but they are only defined for their platform, while intrinsics will be available to all code as they are in generic headers. This patch introduces infrastructure to check support for certain platform-specific intrinsics, and adds support for checking support for IA power management-related intrinsics for UMWAIT/UMONITOR and TPAUSE. Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: David Christensen Acked-by: Jerin Jacob Acked-by: Ruifeng Wang Acked-by: Ray Kinsella --- Notes: v8: - Rename eal version.map v6: - Fix the comments --- lib/librte_eal/arm/rte_cpuflags.c | 6 +++++ lib/librte_eal/include/generic/rte_cpuflags.h | 26 +++++++++++++++++++ .../include/generic/rte_power_intrinsics.h | 12 +++++++++ lib/librte_eal/ppc/rte_cpuflags.c | 7 +++++ lib/librte_eal/version.map | 1 + lib/librte_eal/x86/rte_cpuflags.c | 12 +++++++++ 6 files changed, 64 insertions(+) diff --git a/lib/librte_eal/arm/rte_cpuflags.c b/lib/librte_eal/arm/rte_cpuflags.c index 7b257b7873..e3a53bcece 100644 --- a/lib/librte_eal/arm/rte_cpuflags.c +++ b/lib/librte_eal/arm/rte_cpuflags.c @@ -151,3 +151,9 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); +} diff --git a/lib/librte_eal/include/generic/rte_cpuflags.h b/lib/librte_eal/include/generic/rte_cpuflags.h index 872f0ebe3e..28a5aecde8 100644 --- a/lib/librte_eal/include/generic/rte_cpuflags.h +++ b/lib/librte_eal/include/generic/rte_cpuflags.h @@ -13,6 +13,32 @@ #include "rte_common.h" #include +#include + +/** + * Structure used to describe platform-specific intrinsics that may or may not + * be supported at runtime. + */ +struct rte_cpu_intrinsics { + uint32_t power_monitor : 1; + /**< indicates support for rte_power_monitor function */ + uint32_t power_pause : 1; + /**< indicates support for rte_power_pause function */ +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change without prior notice + * + * Check CPU support for various intrinsics at runtime. + * + * @param intrinsics + * Pointer to a structure to be filled. + */ +__rte_experimental +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics); + /** * Enumeration of all CPU features supported */ diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h index fb897d9060..03a326f076 100644 --- a/lib/librte_eal/include/generic/rte_power_intrinsics.h +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h @@ -32,6 +32,10 @@ * checked against the expected value, and if they match, the entering of * optimized power state may be aborted. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. Failing to do + * so may result in an illegal CPU instruction error. + * * @param p * Address to monitor for changes. * @param expected_value @@ -69,6 +73,10 @@ static inline void rte_power_monitor(const volatile void *p, * This call will also lock a spinlock on entering sleep, and release it on * waking up the CPU. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. Failing to do + * so may result in an illegal CPU instruction error. + * * @param p * Address to monitor for changes. * @param expected_value @@ -101,6 +109,10 @@ static inline void rte_power_monitor_sync(const volatile void *p, * Enter an architecture-defined optimized power state until a certain TSC * timestamp is reached. * + * @warning It is responsibility of the user to check if this function is + * supported at runtime using `rte_cpu_get_features()` API call. Failing to do + * so may result in an illegal CPU instruction error. + * * @param tsc_timestamp * Maximum TSC timestamp to wait for. Note that the wait behavior is * architecture-dependent. diff --git a/lib/librte_eal/ppc/rte_cpuflags.c b/lib/librte_eal/ppc/rte_cpuflags.c index 3bb7563ce9..61db5c216d 100644 --- a/lib/librte_eal/ppc/rte_cpuflags.c +++ b/lib/librte_eal/ppc/rte_cpuflags.c @@ -8,6 +8,7 @@ #include #include #include +#include #include /* Symbolic values for the entries in the auxiliary table */ @@ -108,3 +109,9 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); +} diff --git a/lib/librte_eal/version.map b/lib/librte_eal/version.map index c23ff57ce6..269cdccfd3 100644 --- a/lib/librte_eal/version.map +++ b/lib/librte_eal/version.map @@ -402,6 +402,7 @@ EXPERIMENTAL { rte_service_lcore_may_be_active; rte_vect_get_max_simd_bitwidth; rte_vect_set_max_simd_bitwidth; + rte_cpu_get_intrinsics_support; }; INTERNAL { diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c index 0325c4b93b..a96312ff7f 100644 --- a/lib/librte_eal/x86/rte_cpuflags.c +++ b/lib/librte_eal/x86/rte_cpuflags.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "rte_cpuid.h" @@ -179,3 +180,14 @@ rte_cpu_get_flag_name(enum rte_cpu_flag_t feature) return NULL; return rte_cpu_feature_table[feature].name; } + +void +rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics) +{ + memset(intrinsics, 0, sizeof(*intrinsics)); + + if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG)) { + intrinsics->power_monitor = 1; + intrinsics->power_pause = 1; + } +} From patchwork Tue Oct 27 14:59:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82336 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 09067A04B5; Tue, 27 Oct 2020 16:01:18 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id B23EC6C96; Tue, 27 Oct 2020 15:59:56 +0100 (CET) Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by dpdk.org (Postfix) with ESMTP id A86056A1C for ; Tue, 27 Oct 2020 15:59:53 +0100 (CET) IronPort-SDR: 4/Dr3k4NAuWJOFM48LROGtEzUaoDKKDtqe6dyuO2glEkznMgVOYnoC2NVwOzVCGN2YKIfydAod 43PboviV7DaQ== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="168193245" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="168193245" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 07:59:52 -0700 IronPort-SDR: xfZ8S7rOwHcvwR9GJ3bZvb+50gGQ3WKGpw9vqRwgvB/oO6UeKIqyPFgojZ/lYgqMM4k5l0JK0M 37YhYUI0p+EA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="303779189" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga008.fm.intel.com with ESMTP; 27 Oct 2020 07:59:47 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExjiH015721; Tue, 27 Oct 2020 14:59:45 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExjdS022718; Tue, 27 Oct 2020 14:59:45 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExjoW022706; Tue, 27 Oct 2020 14:59:45 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Ferruh Yigit , Andrew Rybchenko , Ray Kinsella Date: Tue, 27 Oct 2020 14:59:04 +0000 Message-Id: <1603810749-22285-5-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 4/9] ethdev: add simple power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a simple API to allow getting address of next RX descriptor from the PMD, as well as release notes information. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v10: - Address minor issue on comments and release notes v8: - Rename version map file name v7: - Fixed race condition (Konstantin) - Slight rework of the structure of monitor code - Added missing inline for wakeup v6: - Added wakeup mechanism for UMWAIT - Removed memory allocation (everything is now allocated statically) - Fixed various typos and comments - Check for invalid queue ID - Moved release notes to this patch v5: - Make error checking more robust - Prevent initializing scaling if ACPI or PSTATE env wasn't set - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr - Add some debug logging - Replace x86-specific code path to generic path using the intrinsic check --- doc/guides/rel_notes/release_20_11.rst | 4 ++++ lib/librte_ethdev/rte_ethdev.c | 23 ++++++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 33 ++++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_driver.h | 28 ++++++++++++++++++++++ lib/librte_ethdev/version.map | 1 + 5 files changed, 89 insertions(+) diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index dca8d41eb6..2bdc8f9948 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -139,6 +139,10 @@ New Features Hairpin Tx part flow rules can be inserted explicitly. New API is added to get the hairpin peer ports list. +* **ethdev: added 1 new API for PMD power management.** + + * ``rte_eth_get_wake_addr()`` is added to get the wake up address from device. + * **Updated Broadcom bnxt driver.** Updated the Broadcom bnxt driver with new features and improvements, including: diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index b12bb3854d..4f3115fe8e 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -5138,6 +5138,29 @@ rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode)); } +int +rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_wake_addr, -ENOTSUP); + + if (queue_id >= dev->data->nb_rx_queues) { + RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id); + return -EINVAL; + } + + return eth_err(port_id, + dev->dev_ops->get_wake_addr(dev->data->rx_queues[queue_id], + wake_addr, expected, mask, data_sz)); +} + int rte_eth_dev_set_mc_addr_list(uint16_t port_id, struct rte_ether_addr *mc_addr_set, diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index e341a08817..e522d5d433 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -4364,6 +4364,39 @@ __rte_experimental int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, struct rte_eth_burst_mode *mode); +/** + * Retrieve the wake up address for device queue or other data structure. + * Depend on the HW design, any write back mechanism can be used as a signal + * to wake up the processor. + * + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The Rx queue on the Ethernet device for which information will be + * retrieved. + * @param wake_addr + * The pointer to the address which will be monitored and filled by driver. + * @param expected + * The pointer to value to be expected when target address is set. + * filled by driver. + * @param mask + * The pointer to comparison bitmask for the expected value that is filled by + * driver. + * @param data_sz + * The pointer to data size for the expected value and comparison bitmask that + * is filled by driver. + * + * @return + * - 0: Success. + * -ENOTSUP: Operation not supported. + * -EINVAL: Invalid parameters. + * -ENODEV: Invalid port ID. + */ +__rte_experimental +int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz); + /** * Retrieve device registers and register attributes (number of registers and * register size) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index c63b9f7eb7..d7548dfe74 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -752,6 +752,32 @@ typedef int (*eth_hairpin_queue_peer_unbind_t) (struct rte_eth_dev *dev, uint16_t cur_queue, uint32_t direction); /**< @internal Unbind peer queue from the current queue. */ +/** + * @internal + * Get address of memory location whose contents will change whenever there is + * new data to be received on an RX queue. + * + * @param rxq + * Ethdev queue pointer. + * @param tail_desc_addr + * The pointer point to where the address will be stored. + * @param expected + * The pointer point to value to be expected when descriptor is set. + * @param mask + * The pointer point to comparison bitmask for the expected value. + * @param data_sz + * Data size for the expected value (can be 1, 2, 4, or 8 bytes) + * @return + * Negative errno value on error, 0 on success. + * + * @retval 0 + * Success + * @retval -EINVAL + * Invalid parameters + */ +typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -910,6 +936,8 @@ struct eth_dev_ops { /**< Set up the connection between the pair of hairpin queues. */ eth_hairpin_queue_peer_unbind_t hairpin_queue_peer_unbind; /**< Disconnect the hairpin queues of a pair from each other. */ + eth_get_wake_addr_t get_wake_addr; + /**< Get next RX queue ring entry address. */ }; /** diff --git a/lib/librte_ethdev/version.map b/lib/librte_ethdev/version.map index 8ddda2547f..f9ce4e3c8d 100644 --- a/lib/librte_ethdev/version.map +++ b/lib/librte_ethdev/version.map @@ -235,6 +235,7 @@ EXPERIMENTAL { rte_eth_fec_get_capability; rte_eth_fec_get; rte_eth_fec_set; + rte_eth_get_wake_addr; rte_flow_shared_action_create; rte_flow_shared_action_destroy; rte_flow_shared_action_query; From patchwork Tue Oct 27 14:59:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82337 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 09364A04B5; Tue, 27 Oct 2020 16:01:47 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 0AEA772E3; Tue, 27 Oct 2020 16:00:01 +0100 (CET) Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by dpdk.org (Postfix) with ESMTP id AC4186CAC for ; Tue, 27 Oct 2020 15:59:57 +0100 (CET) IronPort-SDR: JfkW4LKTHj6O8hlVdv94AMf0x6kVpEFLm3RmT/nsPjt0DOCcZxu2mW/wS2RNX9p1Y0CxJgvj31 pQBtF4aCDhRg== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="168220851" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="168220851" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 07:59:55 -0700 IronPort-SDR: mT82enQQj4sySerWXlPxdWI1v0kLP6n7MWOlKw1cMKhzUtUyvID8VrgTHzWiUoodjOSFOTRx30 HhnmNEL4kiSQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="394507327" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga001.jf.intel.com with ESMTP; 27 Oct 2020 07:59:51 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExp0T015764; Tue, 27 Oct 2020 14:59:51 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExpFX022802; Tue, 27 Oct 2020 14:59:51 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExpE0022798; Tue, 27 Oct 2020 14:59:51 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Ray Kinsella Date: Tue, 27 Oct 2020 14:59:05 +0000 Message-Id: <1603810749-22285-6-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 5/9] power: add PMD power management API and callback X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a simple on/off switch that will enable saving power when no packets are arriving. It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that will either wait until a TSC timestamp expires, or when packets arrive. This API mandates a core-to-single-queue mapping (that is, multiple queued per device are supported, but they have to be polled on different cores). This design is using PMD RX callbacks. 1. UMWAIT/UMONITOR: When a certain threshold of empty polls is reached, the core will go into a power optimized sleep while waiting on an address of next RX descriptor to be written to. 2. Pause instruction Instead of move the core into deeper C state, this method uses the pause instruction to avoid busy polling. 3. Frequency scaling Reuse existing DPDK power library to scale up/down core frequency depending on traffic volume. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt Acked-by: Konstantin Ananyev --- Notes: v10: - Updated power library document v8: - Rename version map file name v7: - Fixed race condition (Konstantin) - Slight rework of the structure of monitor code - Added missing inline for wakeup v6: - Added wakeup mechanism for UMWAIT - Removed memory allocation (everything is now allocated statically) - Fixed various typos and comments - Check for invalid queue ID - Moved release notes to this patch v5: - Make error checking more robust - Prevent initializing scaling if ACPI or PSTATE env wasn't set - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr - Add some debug logging - Replace x86-specific code path to generic path using the intrinsic check --- doc/guides/prog_guide/power_man.rst | 48 ++++ doc/guides/rel_notes/release_20_11.rst | 11 + lib/librte_power/meson.build | 5 +- lib/librte_power/rte_power_pmd_mgmt.c | 320 +++++++++++++++++++++++++ lib/librte_power/rte_power_pmd_mgmt.h | 92 +++++++ lib/librte_power/version.map | 4 + 6 files changed, 478 insertions(+), 2 deletions(-) create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0a3755a901..1b1064c749 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,51 @@ User Cases ---------- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +PMD Power Management API +------------------------ + +Abstract +~~~~~~~~ + +Existing power management mechanisms require developers to change application +design or change code to make use of it. The PMD power management API provides a +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering +power saving whenever empty poll count reaches a certain number. + +There are multiple power saving schemes available for developer to choose. +Although developer can configure each queue with different scheme, however, +It's strongly recommend to configure the queue within same port with same scheme. + + * UMWAIT/UMONITOR + + This power saving scheme will put the CPU into optimized power state and use + the UMWAIT/UMONITOR instructions to monitor the Ethernet PMD RX descriptor + address, and wake the CPU up whenever there's new traffic. + + * Pause + + This power saving scheme will use the `rte_pause` function to avoid busy + polling. + + * Frequency scaling + + This power saving scheme will use existing power library functionality to + scale the core frequency up/down depending on traffic volume. + + +.. note:: + + Currently, this power management API is limited to mandatory mapping of 1 + queue to 1 core (multiple queues are supported, but they must be polled from + different cores). + +API Overview for PMD Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* **Queue Enable**: Enable specific power scheme for certain queue/port/core + +* **Queue Disable**: Disable power scheme for certain queue/port/core + References ---------- @@ -200,3 +245,6 @@ References * The :doc:`../sample_app_ug/vm_power_management` chapter in the :doc:`../sample_app_ug/index` section. + +* The :doc:`../sample_app_ug/rxtx_callbacks` + chapter in the :doc:`../sample_app_ug/index` section. diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 2bdc8f9948..5fd8c16025 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -349,6 +349,17 @@ New Features * Replaced ``--scalar`` command-line option with ``--alg=``, to allow the user to select the desired classify method. +* **Add PMD power management mechanism** + + 3 new Ethernet PMD power management mechanism is added through existing + RX callback infrastructure. + + * Add power saving scheme based on UMWAIT instruction (x86 only) + * Add power saving scheme based on ``rte_pause()`` + * Add power saving scheme based on frequency scaling through the power library + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()`` + * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()`` + Removed Items ------------- diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build index 78c031c943..cc3c7a8646 100644 --- a/lib/librte_power/meson.build +++ b/lib/librte_power/meson.build @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c', 'power_kvm_vm.c', 'guest_channel.c', 'rte_power_empty_poll.c', 'power_pstate_cpufreq.c', + 'rte_power_pmd_mgmt.c', 'power_common.c') -headers = files('rte_power.h','rte_power_empty_poll.h') -deps += ['timer'] +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h') +deps += ['timer' ,'ethdev'] diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c new file mode 100644 index 0000000000..0dcaddc3bd --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.c @@ -0,0 +1,320 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include "rte_power_pmd_mgmt.h" + +#define EMPTYPOLL_MAX 512 + +/** + * Possible power management states of an ethdev port. + */ +enum pmd_mgmt_state { + /** Device power management is disabled. */ + PMD_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + PMD_MGMT_ENABLED, +}; + +struct pmd_queue_cfg { + enum pmd_mgmt_state pwr_mgmt_state; + /**< State of power management for this queue */ + enum rte_power_pmd_mgmt_type cb_mode; + /**< Callback mode for this queue */ + const struct rte_eth_rxtx_callback *cur_cb; + /**< Callback instance */ + rte_spinlock_t umwait_lock; + /**< Per-queue status lock - used only for UMWAIT mode */ + volatile void *wait_addr; + /**< UMWAIT wakeup address */ + uint64_t empty_poll_stats; + /**< Number of empty polls */ +} __rte_cache_aligned; + +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; + +/* trigger a write to the cache line we're waiting on */ +static inline void +umwait_wakeup(volatile void *addr) +{ + uint64_t val; + + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED); + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED); +} + +static inline void +umwait_sleep(struct pmd_queue_cfg *q_conf, uint16_t port_id, uint16_t qidx) +{ + volatile void *target_addr; + uint64_t expected, mask; + uint8_t data_sz; + uint16_t ret; + + /* + * get wake up address for this RX queue, as well as expected value, + * comparison mask, and data size. + */ + ret = rte_eth_get_wake_addr(port_id, qidx, &target_addr, + &expected, &mask, &data_sz); + + /* this should always succeed as all checks have been done already */ + if (unlikely(ret != 0)) + return; + + /* + * take out a spinlock to prevent control plane from concurrently + * modifying the wakeup data. + */ + rte_spinlock_lock(&q_conf->umwait_lock); + + /* have we been disabled by control plane? */ + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { + /* we're good to go */ + + /* + * store the wakeup address so that control plane can trigger a + * write to this address and wake us up. + */ + q_conf->wait_addr = target_addr; + /* -1ULL is maximum value for TSC */ + rte_power_monitor_sync(target_addr, expected, mask, -1ULL, + data_sz, &q_conf->umwait_lock); + /* erase the address */ + q_conf->wait_addr = NULL; + } + rte_spinlock_unlock(&q_conf->umwait_lock); +} + +static uint16_t +clb_umwait(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + umwait_sleep(q_conf, port_id, qidx); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_pause(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + /* sleep for 1 microsecond */ + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + rte_delay_us(1); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_scale_freq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + } else { + q_conf->empty_poll_stats = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + + return nb_rx; +} + +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode) +{ + struct rte_eth_dev *dev; + struct pmd_queue_cfg *queue_cfg; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + dev = &rte_eth_devices[port_id]; + + /* check if queue id is valid */ + if (queue_id >= dev->data->nb_rx_queues || + queue_id >= RTE_MAX_QUEUES_PER_PORT) { + return -EINVAL; + } + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) { + ret = -EINVAL; + goto end; + } + + switch (mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + { + /* check if rte_power_monitor is supported */ + uint64_t dummy_expected, dummy_mask; + struct rte_cpu_intrinsics i; + volatile void *dummy_addr; + uint8_t dummy_sz; + + rte_cpu_get_intrinsics_support(&i); + + if (!i.power_monitor) { + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); + ret = -ENOTSUP; + goto end; + } + + /* check if the device supports the necessary PMD API */ + if (rte_eth_get_wake_addr(port_id, queue_id, + &dummy_addr, &dummy_expected, + &dummy_mask, &dummy_sz) == -ENOTSUP) { + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize UMWAIT spinlock */ + rte_spinlock_init(&queue_cfg->umwait_lock); + + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_umwait, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_SCALE: + { + enum power_management_env env; + /* only PSTATE and ACPI modes are supported */ + if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) && + !rte_power_check_env_supported( + PM_ENV_PSTATE_CPUFREQ)) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n"); + ret = -ENOTSUP; + goto end; + } + /* ensure we could initialize the power library */ + if (rte_power_init(lcore_id)) { + ret = -EINVAL; + goto end; + } + /* ensure we initialized the correct env */ + env = rte_power_get_env(); + if (env != PM_ENV_ACPI_CPUFREQ && + env != PM_ENV_PSTATE_CPUFREQ) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, + queue_id, clb_scale_freq, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_PAUSE: + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_pause, NULL); + break; + } + ret = 0; + +end: + return ret; +} + +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id) +{ + struct pmd_queue_cfg *queue_cfg; + int ret; + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) { + ret = -EINVAL; + goto end; + } + + switch (queue_cfg->cb_mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + rte_spinlock_lock(&queue_cfg->umwait_lock); + + /* wake up the core from UMWAIT sleep, if any */ + if (queue_cfg->wait_addr != NULL) + umwait_wakeup(queue_cfg->wait_addr); + /* + * we need to disable early as there might be callback currently + * spinning on a lock. + */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + + rte_spinlock_unlock(&queue_cfg->umwait_lock); + /* fall-through */ + case RTE_POWER_MGMT_TYPE_PAUSE: + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + break; + case RTE_POWER_MGMT_TYPE_SCALE: + rte_power_freq_max(lcore_id); + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + rte_power_exit(lcore_id); + break; + } + /* + * we don't free the RX callback here because it is unsafe to do so + * unless we know for a fact that all data plane threads have stopped. + */ + queue_cfg->cur_cb = NULL; + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + ret = 0; +end: + return ret; +} diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h new file mode 100644 index 0000000000..a7a3f98268 --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.h @@ -0,0 +1,92 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#ifndef _RTE_POWER_PMD_MGMT_H +#define _RTE_POWER_PMD_MGMT_H + +/** + * @file + * RTE PMD Power Management + */ +#include +#include + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * PMD Power Management Type + */ +enum rte_power_pmd_mgmt_type { + /** WAIT callback mode. */ + RTE_POWER_MGMT_TYPE_WAIT = 1, + /** PAUSE callback mode. */ + RTE_POWER_MGMT_TYPE_PAUSE, + /** Freq Scaling callback mode. */ + RTE_POWER_MGMT_TYPE_SCALE, +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Setup per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @param mode + * The power management callback function type. + + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Remove per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id); +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map index 69ca9af616..3f2f6cd6f6 100644 --- a/lib/librte_power/version.map +++ b/lib/librte_power/version.map @@ -34,4 +34,8 @@ EXPERIMENTAL { rte_power_guest_channel_receive_msg; rte_power_poll_stat_fetch; rte_power_poll_stat_update; + # added in 20.11 + rte_power_pmd_mgmt_queue_enable; + rte_power_pmd_mgmt_queue_disable; + }; From patchwork Tue Oct 27 14:59:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82338 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 75174A04B5; Tue, 27 Oct 2020 16:02:12 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 010D272EF; Tue, 27 Oct 2020 16:00:05 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 073E672EC for ; Tue, 27 Oct 2020 16:00:01 +0100 (CET) IronPort-SDR: SmrqSJxxsuWBqJ/4atAYQyhOnFOgtRJ5Tncso5gH4z8yUDwP8qOpu5c0dSWLOZLhC6q4Epyns0 ouiNXXwM0k1A== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="167314115" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="167314115" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 08:00:00 -0700 IronPort-SDR: 00e16BHSm2ly1NYo6jhleKCEvRfiySRi6zkjP2vL6PUZqfXghX2Cozh3QQuO3wFFwhZW7f00SV zRx5/6gi6vmw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="524754983" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga006.fm.intel.com with ESMTP; 27 Oct 2020 07:59:56 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExt9v015772; Tue, 27 Oct 2020 14:59:55 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExtO6022859; Tue, 27 Oct 2020 14:59:55 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExtaB022855; Tue, 27 Oct 2020 14:59:55 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Jeff Guo Date: Tue, 27 Oct 2020 14:59:06 +0000 Message-Id: <1603810749-22285-7-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 6/9] net/ixgbe: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: Konstantin Ananyev --- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/ixgbe/ixgbe_rxtx.h | 2 ++ 3 files changed, 28 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 00101c2eec..fcc4026372 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -588,6 +588,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, .tx_done_cleanup = ixgbe_dev_tx_done_cleanup, + .get_wake_addr = ixgbe_get_wake_addr, }; /* diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 5f19972031..305822836d 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1367,6 +1367,31 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ixgbe_adv_rx_desc *rxdp; + struct ixgbe_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.upper.status_error; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + *mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + + /* the registers are 32-bit */ + *data_sz = 4; + + return 0; +} + /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. */ static inline uint32_t ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 6d2f7c9da3..1ef0b05e66 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.h +++ b/drivers/net/ixgbe/ixgbe_rxtx.h @@ -299,5 +299,7 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev); +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #endif /* _IXGBE_RXTX_H_ */ From patchwork Tue Oct 27 14:59:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82339 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1AE81A04B5; Tue, 27 Oct 2020 16:02:36 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id ACD85A54B; Tue, 27 Oct 2020 16:00:08 +0100 (CET) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 6011BA54B for ; Tue, 27 Oct 2020 16:00:06 +0100 (CET) IronPort-SDR: ckM+rTVk6c4JMHMANFDfOtynT7FSGWBp6Je6hP/c1VRoki7yuDexgItbvz4a2aTo/J9XSPnJ0+ MyRPUnNKeJ5g== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="155063108" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="155063108" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 08:00:04 -0700 IronPort-SDR: 60Y2zCmeAkbABp2y8OMqCvviOl9A3uhIy+8pMfGhVUPElBqXjMkflUPimo5I+ABFK0jw9lIJjW EStei4du+jAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="334415361" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga002.jf.intel.com with ESMTP; 27 Oct 2020 07:59:59 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RExxx3015784; Tue, 27 Oct 2020 14:59:59 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RExx8s022914; Tue, 27 Oct 2020 14:59:59 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RExxUo022910; Tue, 27 Oct 2020 14:59:59 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Beilei Xing , Jeff Guo Date: Tue, 27 Oct 2020 14:59:07 +0000 Message-Id: <1603810749-22285-8-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 7/9] net/i40e: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Acked-by: Jeff Guo --- drivers/net/i40e/i40e_ethdev.c | 1 + drivers/net/i40e/i40e_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/i40e/i40e_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 4778aaf299..358a38232b 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -513,6 +513,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, .tx_done_cleanup = i40e_tx_done_cleanup, + .get_wake_addr = i40e_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 5df9a9df56..78862fe3a2 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -72,6 +72,32 @@ #define I40E_TX_OFFLOAD_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK) +int +i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + struct i40e_rx_queue *rxq = rx_queue; + volatile union i40e_rx_desc *rxdp; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.qword1.status_error_len; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + *mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + + /* registers are 64-bit */ + *data_sz = 8; + + return 0; +} + static inline void i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) { diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 57d7b4160b..5826cf1099 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -248,6 +248,8 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *value, uint8_t *data_sz); /* For each value it means, datasheet of hardware can tell more details * From patchwork Tue Oct 27 14:59:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82340 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 6AB53A04B5; Tue, 27 Oct 2020 16:03:05 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id D3DE472EE; Tue, 27 Oct 2020 16:00:13 +0100 (CET) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 5478E72D9 for ; Tue, 27 Oct 2020 16:00:12 +0100 (CET) IronPort-SDR: kTecXbhL6OxAKHbPYuYnGtUbyB52YnvZzNo8O2iJLzfo93jZL/W/E5UIABD4vs/JuBcPXgAYLz u8mIrXIW2MYQ== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="167314183" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="167314183" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 08:00:08 -0700 IronPort-SDR: JdsE4gyJhjIsoW+/nklP/h4WGg2rj3rXmbZNLufjPblarTspBJCko/2RS6KarVImiYS3yjOThB kHX6D1V/uEAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="350303048" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 27 Oct 2020 08:00:01 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RF01dC015788; Tue, 27 Oct 2020 15:00:01 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RF01mH022957; Tue, 27 Oct 2020 15:00:01 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RF01ni022953; Tue, 27 Oct 2020 15:00:01 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Qiming Yang , Qi Zhang Date: Tue, 27 Oct 2020 14:59:08 +0000 Message-Id: <1603810749-22285-9-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 8/9] net/ice: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/ice/ice_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index c65125ff32..54f185ad4d 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -216,6 +216,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, .tx_done_cleanup = ice_tx_done_cleanup, + .get_wake_addr = ice_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index ee576c362a..fafd6ada62 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -26,6 +26,32 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ice_rx_flex_desc *rxdp; + struct ice_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.status_error0; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + *mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + + /* register is 16-bit */ + *data_sz = 2; + + return 0; +} + + static inline uint8_t ice_proto_xtr_type_to_rxdid(uint8_t xtr_type) { diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h index 1c23c7541e..7eeb8d467e 100644 --- a/drivers/net/ice/ice_rxtx.h +++ b/drivers/net/ice/ice_rxtx.h @@ -250,6 +250,8 @@ uint16_t ice_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc); int ice_tx_done_cleanup(void *txq, uint32_t free_cnt); +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \ int i; \ From patchwork Tue Oct 27 14:59:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 82341 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 0A254A04DD; Tue, 27 Oct 2020 16:03:28 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 4FCC8BBBA; Tue, 27 Oct 2020 16:00:15 +0100 (CET) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 24CEB6938 for ; Tue, 27 Oct 2020 16:00:10 +0100 (CET) IronPort-SDR: XV59X7St2dGi+LZazJsXAtxTJcjGegqZKqomLUxti/FM5dZieSXMfhJCSgtHngnHDXiX5+rt8I L3J0JGoB2GUw== X-IronPort-AV: E=McAfee;i="6000,8403,9786"; a="147378712" X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="147378712" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Oct 2020 08:00:09 -0700 IronPort-SDR: I4nz30kYh/ge8OLtFqk9JGM/zt4/U55vxox29BaNrQdqTT8AN7mNz15JLSJQV7skhHCCv7rrCX Ow0jdaIoAA6A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,424,1596524400"; d="scan'208";a="424430535" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga001.fm.intel.com with ESMTP; 27 Oct 2020 08:00:05 -0700 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 09RF04WD016046; Tue, 27 Oct 2020 15:00:04 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 09RF04rX023085; Tue, 27 Oct 2020 15:00:04 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 09RF04jV023081; Tue, 27 Oct 2020 15:00:04 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, xavier.huwei@huawei.com, xuanziyang2@huawei.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov Date: Tue, 27 Oct 2020 14:59:09 +0000 Message-Id: <1603810749-22285-10-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> References: <1603810749-22285-1-git-send-email-liang.j.ma@intel.com> In-Reply-To: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> References: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v10 9/9] examples/l3fwd-power: enable PMD power mgmt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add PMD power management feature support to l3fwd-power sample app. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v8: - Add return status check for queue enable v6: - Fixed typos in documentation --- .../sample_app_ug/l3_forward_power_man.rst | 13 ++++++ examples/l3fwd-power/main.c | 46 ++++++++++++++++++- 2 files changed, 58 insertions(+), 1 deletion(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index d7e1dc5813..b10ebe662e 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -109,6 +109,8 @@ where, * --telemetry: Telemetry mode. +* --pmd-mgmt: PMD power management mode. + See :doc:`l3_forward` for details. The L3fwd-power example reuses the L3fwd command line options. @@ -455,3 +457,14 @@ reference cycles and accordingly busy rate is set to either 0% or The new stats ``empty_poll`` , ``full_poll`` and ``busy_percent`` can be viewed by running the script ``/usertools/dpdk-telemetry-client.py`` and selecting the menu option ``Send for global Metrics``. + +PMD power management Mode +------------------------- + +The PMD power management mode support for ``l3fwd-power`` is a standalone mode, in this mode +``l3fwd-power`` does simple l3fwding along with enable the power saving scheme on specific +port/queue/lcore. Main purpose for this mode is to demonstrate how to use the PMD power management API. + +.. code-block:: console + + ./build/examples/dpdk-l3fwd-power -l 1-3 -- --pmd-mgmt -p 0x0f --config="(0,0,2),(0,1,3)" diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index a48d75f68f..aafa415f0b 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -199,7 +200,8 @@ enum appmode { APP_MODE_LEGACY, APP_MODE_EMPTY_POLL, APP_MODE_TELEMETRY, - APP_MODE_INTERRUPT + APP_MODE_INTERRUPT, + APP_MODE_PMD_MGMT }; enum appmode app_mode; @@ -1750,6 +1752,7 @@ parse_ep_config(const char *q_arg) #define CMD_LINE_OPT_EMPTY_POLL "empty-poll" #define CMD_LINE_OPT_INTERRUPT_ONLY "interrupt-only" #define CMD_LINE_OPT_TELEMETRY "telemetry" +#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt" /* Parse the argument given in the command line of the application */ static int @@ -1771,6 +1774,7 @@ parse_args(int argc, char **argv) {CMD_LINE_OPT_LEGACY, 0, 0, 0}, {CMD_LINE_OPT_TELEMETRY, 0, 0, 0}, {CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0}, + {CMD_LINE_OPT_PMD_MGMT, 0, 0, 0}, {NULL, 0, 0, 0} }; @@ -1881,6 +1885,16 @@ parse_args(int argc, char **argv) printf("telemetry mode is enabled\n"); } + if (!strncmp(lgopts[option_index].name, + CMD_LINE_OPT_PMD_MGMT, + sizeof(CMD_LINE_OPT_PMD_MGMT))) { + if (app_mode != APP_MODE_DEFAULT) { + printf(" power mgmt mode is mutually exclusive with other modes\n"); + return -1; + } + app_mode = APP_MODE_PMD_MGMT; + printf("PMD power mgmt mode is enabled\n"); + } if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_INTERRUPT_ONLY, sizeof(CMD_LINE_OPT_INTERRUPT_ONLY))) { @@ -2437,6 +2451,8 @@ mode_to_str(enum appmode mode) return "telemetry"; case APP_MODE_INTERRUPT: return "interrupt-only"; + case APP_MODE_PMD_MGMT: + return "pmd mgmt"; default: return "invalid"; } @@ -2705,6 +2721,17 @@ main(int argc, char **argv) } else if (!check_ptype(portid)) rte_exit(EXIT_FAILURE, "PMD can not provide needed ptypes\n"); + if (app_mode == APP_MODE_PMD_MGMT) { + ret = rte_power_pmd_mgmt_queue_enable(lcore_id, + portid, queueid, + RTE_POWER_MGMT_TYPE_SCALE); + + if (ret < 0) + rte_exit(EXIT_FAILURE, + "rte_power_pmd_mgmt enable: err=%d, " + "port=%d\n", ret, portid); + + } } } @@ -2790,6 +2817,9 @@ main(int argc, char **argv) SKIP_MAIN); } else if (app_mode == APP_MODE_INTERRUPT) { rte_eal_mp_remote_launch(main_intr_loop, NULL, CALL_MAIN); + } else if (app_mode == APP_MODE_PMD_MGMT) { + rte_eal_mp_remote_launch(main_telemetry_loop, NULL, + CALL_MAIN); } if (app_mode == APP_MODE_EMPTY_POLL || app_mode == APP_MODE_TELEMETRY) @@ -2816,6 +2846,20 @@ main(int argc, char **argv) if (app_mode == APP_MODE_EMPTY_POLL) rte_power_empty_poll_stat_free(); + if (app_mode == APP_MODE_PMD_MGMT) { + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_is_enabled(lcore_id) == 0) + continue; + qconf = &lcore_conf[lcore_id]; + for (queue = 0; queue < qconf->n_rx_queue; ++queue) { + portid = qconf->rx_queue_list[queue].port_id; + queueid = qconf->rx_queue_list[queue].queue_id; + rte_power_pmd_mgmt_queue_disable(lcore_id, + portid, queueid); + } + } + } + if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) && deinit_power_library()) rte_exit(EXIT_FAILURE, "deinit_power_library failed\n");