From patchwork Mon Nov 2 11:10:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83382 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8AD91A04E7; Mon, 2 Nov 2020 12:11:38 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6A68EC7F8; Mon, 2 Nov 2020 12:11:37 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 27DECC7F4 for ; Mon, 2 Nov 2020 12:11:33 +0100 (CET) IronPort-SDR: 6N/EFt9NIeOZ9qBWcEgXW8aBdWPqNvP8sfLFX5ruSAbX07Gy7o5Tf13I3Jm4xH4jSOjx/KE1Fs GprvqDGCpb8A== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="233037654" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="233037654" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:32 -0800 IronPort-SDR: xe47QPzEsS9KtLjFSwtmzu8ilXwybhZ77KqSaaMExpeJTrVcKzoFOrL7tMEqfPv2E0G+9QitwQ y8/6ZU9ijV1Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="305656345" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 02 Nov 2020 03:11:28 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBRJB011969; Mon, 2 Nov 2020 11:11:27 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBR8J028078; Mon, 2 Nov 2020 11:11:27 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBRZT028074; Mon, 2 Nov 2020 11:11:27 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Ferruh Yigit , Andrew Rybchenko , Ray Kinsella Date: Mon, 2 Nov 2020 11:10:00 +0000 Message-Id: <1604315406-27669-2-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> MIME-Version: 1.0 Subject: [dpdk-dev] =?utf-8?q?=5BPATCH_v11_1/6=5D_ethdev=3A_add_simple_power?= =?utf-8?q?_management_API?= X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a simple API to allow getting address of getting notification information from the PMD, as well as release notes information. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- Notes: v11: - Rework the API Doxygen documentation v10: - Address minor issue on comments and release notes v8: - Rename version map file name v7: - Fixed race condition (Konstantin) - Slight rework of the structure of monitor code - Added missing inline for wakeup v6: - Added wakeup mechanism for UMWAIT - Removed memory allocation (everything is now allocated statically) - Fixed various typos and comments - Check for invalid queue ID - Moved release notes to this patch v5: - Make error checking more robust - Prevent initializing scaling if ACPI or PSTATE env wasn't set - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr - Add some debug logging - Replace x86-specific code path to generic path using the intrinsic check --- doc/guides/rel_notes/release_20_11.rst | 4 +++ lib/librte_ethdev/rte_ethdev.c | 23 +++++++++++++++ lib/librte_ethdev/rte_ethdev.h | 41 ++++++++++++++++++++++++++ lib/librte_ethdev/rte_ethdev_driver.h | 28 ++++++++++++++++++ lib/librte_ethdev/version.map | 1 + 5 files changed, 97 insertions(+) diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 88b9086390..e95e6aa7a5 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -148,6 +148,10 @@ New Features Hairpin Tx part flow rules can be inserted explicitly. New API is added to get the hairpin peer ports list. +* **ethdev: added 1 new API for PMD power management.** + + * ``rte_eth_get_wake_addr()`` is added to get the wake up address from device. + * **Updated Broadcom bnxt driver.** Updated the Broadcom bnxt driver with new features and improvements, including: diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c index b12bb3854d..4f3115fe8e 100644 --- a/lib/librte_ethdev/rte_ethdev.c +++ b/lib/librte_ethdev/rte_ethdev.c @@ -5138,6 +5138,29 @@ rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, dev->dev_ops->tx_burst_mode_get(dev, queue_id, mode)); } +int +rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz) +{ + struct rte_eth_dev *dev; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + + dev = &rte_eth_devices[port_id]; + + RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->get_wake_addr, -ENOTSUP); + + if (queue_id >= dev->data->nb_rx_queues) { + RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id); + return -EINVAL; + } + + return eth_err(port_id, + dev->dev_ops->get_wake_addr(dev->data->rx_queues[queue_id], + wake_addr, expected, mask, data_sz)); +} + int rte_eth_dev_set_mc_addr_list(uint16_t port_id, struct rte_ether_addr *mc_addr_set, diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h index e341a08817..d208fe99ca 100644 --- a/lib/librte_ethdev/rte_ethdev.h +++ b/lib/librte_ethdev/rte_ethdev.h @@ -4364,6 +4364,47 @@ __rte_experimental int rte_eth_tx_burst_mode_get(uint16_t port_id, uint16_t queue_id, struct rte_eth_burst_mode *mode); +/** + * In order to make use of some PMD Power Management schemes, the user might + * want to wait (sleep/poll) until new packets arrive. This function + * retrieves the necessary information from the PMD to enter that wait/sleep + * state. The main parameter provided is the address to monitor while waiting + * to wake up. In addition to this wake address, the function also provides + * extra information including expected value, selection mask and data size to + * monitor. The user is expected to use this information to enter low power + * mode using the rte_power_monitor API, and the core will exit low power mode + * upon reaching the expected condition: + * (((uint64_t)read_mem(wake_addr, data_sz)) & mask) == expected). + * @note The low power mode can also exit in other cases, e.g. interrupt. + * + * @param[in] port_id + * The port identifier of the Ethernet device. + * @param[in] queue_id + * The Rx queue on the Ethernet device for which information will be + * retrieved. + * @param[out] wake_addr + * The pointer to the address which will be monitored. + * @param[out] expected + * The pointer to the expected value to allow wakeup condition. + * @param[out] mask + * The pointer to comparison bitmask for the expected value. + * @note a mask value of zero should be treated as: + * “no special wakeup values for provided address from the driver”. + * @param[out] data_sz + * The pointer to data size for the expected value (in bytes) + * @note valid values are 1,2,4,8 + * + * @return + * - 0: Success. + * -ENOTSUP: Operation not supported. + * -EINVAL: Invalid parameters. + * -ENODEV: Invalid port ID. + */ +__rte_experimental +int rte_eth_get_wake_addr(uint16_t port_id, uint16_t queue_id, + volatile void **wake_addr, uint64_t *expected, uint64_t *mask, + uint8_t *data_sz); + /** * Retrieve device registers and register attributes (number of registers and * register size) diff --git a/lib/librte_ethdev/rte_ethdev_driver.h b/lib/librte_ethdev/rte_ethdev_driver.h index c63b9f7eb7..e7ce1e261d 100644 --- a/lib/librte_ethdev/rte_ethdev_driver.h +++ b/lib/librte_ethdev/rte_ethdev_driver.h @@ -752,6 +752,32 @@ typedef int (*eth_hairpin_queue_peer_unbind_t) (struct rte_eth_dev *dev, uint16_t cur_queue, uint32_t direction); /**< @internal Unbind peer queue from the current queue. */ +/** + * @internal + * Get address of memory location whose contents will change whenever there is + * new data to be received. + * + * @param rxq + * Ethdev queue pointer. + * @param tail_desc_addr + * The pointer point to where the address will be stored. + * @param expected + * The pointer point to value to be expected when descriptor is set. + * @param mask + * The pointer point to comparison bitmask for the expected value. + * @param data_sz + * Data size for the expected value (can be 1, 2, 4, or 8 bytes) + * @return + * Negative errno value on error, 0 on success. + * + * @retval 0 + * Success + * @retval -EINVAL + * Invalid parameters + */ +typedef int (*eth_get_wake_addr_t)(void *rxq, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); + /** * @internal A structure containing the functions exported by an Ethernet driver. */ @@ -910,6 +936,8 @@ struct eth_dev_ops { /**< Set up the connection between the pair of hairpin queues. */ eth_hairpin_queue_peer_unbind_t hairpin_queue_peer_unbind; /**< Disconnect the hairpin queues of a pair from each other. */ + eth_get_wake_addr_t get_wake_addr; + /**< Get next RX queue ring entry address. */ }; /** diff --git a/lib/librte_ethdev/version.map b/lib/librte_ethdev/version.map index 8ddda2547f..f9ce4e3c8d 100644 --- a/lib/librte_ethdev/version.map +++ b/lib/librte_ethdev/version.map @@ -235,6 +235,7 @@ EXPERIMENTAL { rte_eth_fec_get_capability; rte_eth_fec_get; rte_eth_fec_set; + rte_eth_get_wake_addr; rte_flow_shared_action_create; rte_flow_shared_action_destroy; rte_flow_shared_action_query; From patchwork Mon Nov 2 11:10:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83384 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 49C96A04E7; Mon, 2 Nov 2020 12:12:19 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id DD439C820; Mon, 2 Nov 2020 12:11:40 +0100 (CET) Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id F138DC7F4 for ; Mon, 2 Nov 2020 12:11:36 +0100 (CET) IronPort-SDR: 9kUqwoPb4kfF/ITPxQdlpusROMDTJdDj2RVRmKMIIJYWAdkhf99nrgq7yJ9zLougDwyodcaJCV 8zRM0MwoIxRg== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="155855072" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="155855072" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:35 -0800 IronPort-SDR: vq4kauUhRrbbrVhcMp76KQyfzT7axusQmyzfEmIFjYtxG04fAd5ZKomCXarYnU08S1/Z/v2XAF Rzxp0KKukzcA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="351974950" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga008.jf.intel.com with ESMTP; 02 Nov 2020 03:11:31 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBUx9011972; Mon, 2 Nov 2020 11:11:30 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBUls028128; Mon, 2 Nov 2020 11:11:30 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBUj2028124; Mon, 2 Nov 2020 11:11:30 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Ray Kinsella Date: Mon, 2 Nov 2020 11:10:01 +0000 Message-Id: <1604315406-27669-3-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v11 2/6] power: add PMD power management API and callback X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add a simple on/off switch that will enable saving power when no packets are arriving. It is based on counting the number of empty polls and, when the number reaches a certain threshold, entering an architecture-defined optimized power state that will either wait until a TSC timestamp expires, or when packets arrive. This API mandates a core-to-single-queue mapping (that is, multiple queued per device are supported, but they have to be polled on different cores). This design is using PMD RX callbacks. The following are the available schemes: 1. UMWAIT/UMONITOR: When a certain threshold of empty polls is reached, the core will go into a power optimized sleep while waiting on an address of next RX descriptor to be written to. 2. Pause instruction Instead of move the core into deeper C state, this method uses the pause instruction to avoid busy polling. 3. Frequency scaling Reuse existing DPDK power library to scale up/down core frequency depending on traffic volume. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt Acked-by: Konstantin Ananyev --- Notes: v11: - Updated power library document - Updated release notes v10: - Updated power library document v8: - Rename version map file name v7: - Fixed race condition (Konstantin) - Slight rework of the structure of monitor code - Added missing inline for wakeup v6: - Added wakeup mechanism for UMWAIT - Removed memory allocation (everything is now allocated statically) - Fixed various typos and comments - Check for invalid queue ID - Moved release notes to this patch v5: - Make error checking more robust - Prevent initializing scaling if ACPI or PSTATE env wasn't set - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr - Add some debug logging - Replace x86-specific code path to generic path using the intrinsic check --- doc/guides/prog_guide/power_man.rst | 51 ++++ doc/guides/rel_notes/release_20_11.rst | 15 +- lib/librte_power/meson.build | 5 +- lib/librte_power/rte_power_pmd_mgmt.c | 320 +++++++++++++++++++++++++ lib/librte_power/rte_power_pmd_mgmt.h | 92 +++++++ lib/librte_power/version.map | 4 + 6 files changed, 484 insertions(+), 3 deletions(-) create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index 0a3755a901..380e7aace7 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -192,6 +192,54 @@ User Cases ---------- The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. +PMD Power Management API +------------------------ + +Abstract +~~~~~~~~ + +Existing Power Management mechanisms require developers to change the design of +an application or change code to make use of it. The PMD Power Management API +provides a convenient alternative by use Ethernet PMD RX callbacks, and +triggering power saving whenever the empty poll count reaches a certain number. + +There are multiple power saving schemes available for the developer to choose. +Although the developer can configure each queue with different scheme, It's +strongly recommended to configure the queue within the same port with the same +scheme. + +The following are the available schemes: + + * UMWAIT/UMONITOR + + This power saving scheme will put the core into an optimized power state and + use the UMWAIT/UMONITOR instructions to monitor the Ethernet PMD RX + descriptor address, and wake the CPU up whenever there's new traffic. + + * Pause + + This power saving scheme will use the "rte_pause" function to reduce the impact + of busy polling. + + * Frequency scaling + + This power saving scheme will use existing power library functionality to + scale the core frequency up/down depending on traffic volume. + + +.. note:: + + Currently, this Power Management API is limited to mapping of 1 queue to 1 + core (multiple queues are supported, but they must be polled from different + cores). + +API Overview for PMD Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* **Queue Enable**: Enable a specific power scheme for a certain queue/port/core + +* **Queue Disable**: Disable power scheme for certain queue/port/core + References ---------- @@ -200,3 +248,6 @@ References * The :doc:`../sample_app_ug/vm_power_management` chapter in the :doc:`../sample_app_ug/index` section. + +* The :doc:`../sample_app_ug/rxtx_callbacks` + chapter in the :doc:`../sample_app_ug/index` section. diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index e95e6aa7a5..0430fca9cc 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -150,7 +150,9 @@ New Features * **ethdev: added 1 new API for PMD power management.** - * ``rte_eth_get_wake_addr()`` is added to get the wake up address from device. + * ``rte_eth_get_wake_addr()`` function has been added to allow applications to + fetch the wake up information from the device. Processor need that information + to wake up from the low power state. * **Updated Broadcom bnxt driver.** @@ -362,6 +364,17 @@ New Features * Replaced ``--scalar`` command-line option with ``--alg=``, to allow the user to select the desired classify method. +* **Added PMD power management mechanism** + + The new Ethernet PMD Power Management mechanisms have been added through the + existing RX callback infrastructure. + + * Added power saving scheme based on UMWAIT instruction (x86 only) + * Added power saving scheme based on ``rte_pause()`` + * Added power saving scheme based on frequency scaling through the power library + * Added new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()`` + * Added new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()`` + Removed Items ------------- diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build index 78c031c943..cc3c7a8646 100644 --- a/lib/librte_power/meson.build +++ b/lib/librte_power/meson.build @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c', 'power_kvm_vm.c', 'guest_channel.c', 'rte_power_empty_poll.c', 'power_pstate_cpufreq.c', + 'rte_power_pmd_mgmt.c', 'power_common.c') -headers = files('rte_power.h','rte_power_empty_poll.h') -deps += ['timer'] +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h') +deps += ['timer' ,'ethdev'] diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c new file mode 100644 index 0000000000..0dcaddc3bd --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.c @@ -0,0 +1,320 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include "rte_power_pmd_mgmt.h" + +#define EMPTYPOLL_MAX 512 + +/** + * Possible power management states of an ethdev port. + */ +enum pmd_mgmt_state { + /** Device power management is disabled. */ + PMD_MGMT_DISABLED = 0, + /** Device power management is enabled. */ + PMD_MGMT_ENABLED, +}; + +struct pmd_queue_cfg { + enum pmd_mgmt_state pwr_mgmt_state; + /**< State of power management for this queue */ + enum rte_power_pmd_mgmt_type cb_mode; + /**< Callback mode for this queue */ + const struct rte_eth_rxtx_callback *cur_cb; + /**< Callback instance */ + rte_spinlock_t umwait_lock; + /**< Per-queue status lock - used only for UMWAIT mode */ + volatile void *wait_addr; + /**< UMWAIT wakeup address */ + uint64_t empty_poll_stats; + /**< Number of empty polls */ +} __rte_cache_aligned; + +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; + +/* trigger a write to the cache line we're waiting on */ +static inline void +umwait_wakeup(volatile void *addr) +{ + uint64_t val; + + val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED); + __atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED); +} + +static inline void +umwait_sleep(struct pmd_queue_cfg *q_conf, uint16_t port_id, uint16_t qidx) +{ + volatile void *target_addr; + uint64_t expected, mask; + uint8_t data_sz; + uint16_t ret; + + /* + * get wake up address for this RX queue, as well as expected value, + * comparison mask, and data size. + */ + ret = rte_eth_get_wake_addr(port_id, qidx, &target_addr, + &expected, &mask, &data_sz); + + /* this should always succeed as all checks have been done already */ + if (unlikely(ret != 0)) + return; + + /* + * take out a spinlock to prevent control plane from concurrently + * modifying the wakeup data. + */ + rte_spinlock_lock(&q_conf->umwait_lock); + + /* have we been disabled by control plane? */ + if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) { + /* we're good to go */ + + /* + * store the wakeup address so that control plane can trigger a + * write to this address and wake us up. + */ + q_conf->wait_addr = target_addr; + /* -1ULL is maximum value for TSC */ + rte_power_monitor_sync(target_addr, expected, mask, -1ULL, + data_sz, &q_conf->umwait_lock); + /* erase the address */ + q_conf->wait_addr = NULL; + } + rte_spinlock_unlock(&q_conf->umwait_lock); +} + +static uint16_t +clb_umwait(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + umwait_sleep(q_conf, port_id, qidx); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_pause(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *addr __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + /* sleep for 1 microsecond */ + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + rte_delay_us(1); + } else + q_conf->empty_poll_stats = 0; + + return nb_rx; +} + +static uint16_t +clb_scale_freq(uint16_t port_id, uint16_t qidx, + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx, + uint16_t max_pkts __rte_unused, void *_ __rte_unused) +{ + struct pmd_queue_cfg *q_conf; + + q_conf = &port_cfg[port_id][qidx]; + + if (unlikely(nb_rx == 0)) { + q_conf->empty_poll_stats++; + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) + /* scale down freq */ + rte_power_freq_min(rte_lcore_id()); + } else { + q_conf->empty_poll_stats = 0; + /* scale up freq */ + rte_power_freq_max(rte_lcore_id()); + } + + return nb_rx; +} + +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode) +{ + struct rte_eth_dev *dev; + struct pmd_queue_cfg *queue_cfg; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL); + dev = &rte_eth_devices[port_id]; + + /* check if queue id is valid */ + if (queue_id >= dev->data->nb_rx_queues || + queue_id >= RTE_MAX_QUEUES_PER_PORT) { + return -EINVAL; + } + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) { + ret = -EINVAL; + goto end; + } + + switch (mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + { + /* check if rte_power_monitor is supported */ + uint64_t dummy_expected, dummy_mask; + struct rte_cpu_intrinsics i; + volatile void *dummy_addr; + uint8_t dummy_sz; + + rte_cpu_get_intrinsics_support(&i); + + if (!i.power_monitor) { + RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n"); + ret = -ENOTSUP; + goto end; + } + + /* check if the device supports the necessary PMD API */ + if (rte_eth_get_wake_addr(port_id, queue_id, + &dummy_addr, &dummy_expected, + &dummy_mask, &dummy_sz) == -ENOTSUP) { + RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize UMWAIT spinlock */ + rte_spinlock_init(&queue_cfg->umwait_lock); + + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_umwait, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_SCALE: + { + enum power_management_env env; + /* only PSTATE and ACPI modes are supported */ + if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) && + !rte_power_check_env_supported( + PM_ENV_PSTATE_CPUFREQ)) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n"); + ret = -ENOTSUP; + goto end; + } + /* ensure we could initialize the power library */ + if (rte_power_init(lcore_id)) { + ret = -EINVAL; + goto end; + } + /* ensure we initialized the correct env */ + env = rte_power_get_env(); + if (env != PM_ENV_ACPI_CPUFREQ && + env != PM_ENV_PSTATE_CPUFREQ) { + RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n"); + ret = -ENOTSUP; + goto end; + } + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, + queue_id, clb_scale_freq, NULL); + break; + } + case RTE_POWER_MGMT_TYPE_PAUSE: + /* initialize data before enabling the callback */ + queue_cfg->empty_poll_stats = 0; + queue_cfg->cb_mode = mode; + queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED; + + queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id, + clb_pause, NULL); + break; + } + ret = 0; + +end: + return ret; +} + +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, uint16_t queue_id) +{ + struct pmd_queue_cfg *queue_cfg; + int ret; + + queue_cfg = &port_cfg[port_id][queue_id]; + + if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) { + ret = -EINVAL; + goto end; + } + + switch (queue_cfg->cb_mode) { + case RTE_POWER_MGMT_TYPE_WAIT: + rte_spinlock_lock(&queue_cfg->umwait_lock); + + /* wake up the core from UMWAIT sleep, if any */ + if (queue_cfg->wait_addr != NULL) + umwait_wakeup(queue_cfg->wait_addr); + /* + * we need to disable early as there might be callback currently + * spinning on a lock. + */ + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + + rte_spinlock_unlock(&queue_cfg->umwait_lock); + /* fall-through */ + case RTE_POWER_MGMT_TYPE_PAUSE: + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + break; + case RTE_POWER_MGMT_TYPE_SCALE: + rte_power_freq_max(lcore_id); + rte_eth_remove_rx_callback(port_id, queue_id, + queue_cfg->cur_cb); + rte_power_exit(lcore_id); + break; + } + /* + * we don't free the RX callback here because it is unsafe to do so + * unless we know for a fact that all data plane threads have stopped. + */ + queue_cfg->cur_cb = NULL; + queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED; + ret = 0; +end: + return ret; +} diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h new file mode 100644 index 0000000000..a7a3f98268 --- /dev/null +++ b/lib/librte_power/rte_power_pmd_mgmt.h @@ -0,0 +1,92 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2010-2020 Intel Corporation + */ + +#ifndef _RTE_POWER_PMD_MGMT_H +#define _RTE_POWER_PMD_MGMT_H + +/** + * @file + * RTE PMD Power Management + */ +#include +#include + +#include +#include +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * PMD Power Management Type + */ +enum rte_power_pmd_mgmt_type { + /** WAIT callback mode. */ + RTE_POWER_MGMT_TYPE_WAIT = 1, + /** PAUSE callback mode. */ + RTE_POWER_MGMT_TYPE_PAUSE, + /** Freq Scaling callback mode. */ + RTE_POWER_MGMT_TYPE_SCALE, +}; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Setup per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @param mode + * The power management callback function type. + + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id, + enum rte_power_pmd_mgmt_type mode); + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Remove per-queue power management callback. + * + * @note This function is not thread-safe. + * + * @param lcore_id + * lcore_id. + * @param port_id + * The port identifier of the Ethernet device. + * @param queue_id + * The queue identifier of the Ethernet device. + * @return + * 0 on success + * <0 on error + */ +__rte_experimental +int +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id, + uint16_t port_id, + uint16_t queue_id); +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/librte_power/version.map b/lib/librte_power/version.map index 69ca9af616..3f2f6cd6f6 100644 --- a/lib/librte_power/version.map +++ b/lib/librte_power/version.map @@ -34,4 +34,8 @@ EXPERIMENTAL { rte_power_guest_channel_receive_msg; rte_power_poll_stat_fetch; rte_power_poll_stat_update; + # added in 20.11 + rte_power_pmd_mgmt_queue_enable; + rte_power_pmd_mgmt_queue_disable; + }; From patchwork Mon Nov 2 11:10:02 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83383 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 93090A04E7; Mon, 2 Nov 2020 12:11:55 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 019D8C802; Mon, 2 Nov 2020 12:11:39 +0100 (CET) Received: from mga07.intel.com (mga07.intel.com [134.134.136.100]) by dpdk.org (Postfix) with ESMTP id 22266C7F4 for ; Mon, 2 Nov 2020 12:11:36 +0100 (CET) IronPort-SDR: 101dbR2BhPKjQ/i6+NbgIIBbglkY7Aj4KDdUVF/I23dOeAPS9uWSKrtSG+yq9H39kns/Yk5W/W f3WMe2pexmTQ== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="233037661" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="233037661" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:36 -0800 IronPort-SDR: V6sVN9V5tYfv3OLGkqb1M/UCrvvLVES7Frt49CzsOaClu3GsSyxssDYq36HM1njqdzLaIjWY4B tSpQ5xH7tRrQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="305656379" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by fmsmga007.fm.intel.com with ESMTP; 02 Nov 2020 03:11:32 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBW4j011977; Mon, 2 Nov 2020 11:11:32 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBWo5028174; Mon, 2 Nov 2020 11:11:32 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBWH9028167; Mon, 2 Nov 2020 11:11:32 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Jeff Guo Date: Mon, 2 Nov 2020 11:10:02 +0000 Message-Id: <1604315406-27669-4-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v11 3/6] net/ixgbe: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Anatoly Burakov Signed-off-by: Liang Ma Acked-by: Konstantin Ananyev --- drivers/net/ixgbe/ixgbe_ethdev.c | 1 + drivers/net/ixgbe/ixgbe_rxtx.c | 25 +++++++++++++++++++++++++ drivers/net/ixgbe/ixgbe_rxtx.h | 2 ++ 3 files changed, 28 insertions(+) diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c index 00101c2eec..fcc4026372 100644 --- a/drivers/net/ixgbe/ixgbe_ethdev.c +++ b/drivers/net/ixgbe/ixgbe_ethdev.c @@ -588,6 +588,7 @@ static const struct eth_dev_ops ixgbe_eth_dev_ops = { .udp_tunnel_port_del = ixgbe_dev_udp_tunnel_port_del, .tm_ops_get = ixgbe_tm_ops_get, .tx_done_cleanup = ixgbe_dev_tx_done_cleanup, + .get_wake_addr = ixgbe_get_wake_addr, }; /* diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c index 6cfbb582e2..db94b9d05d 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.c +++ b/drivers/net/ixgbe/ixgbe_rxtx.c @@ -1369,6 +1369,31 @@ const uint32_t RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP, }; +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ixgbe_adv_rx_desc *rxdp; + struct ixgbe_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.upper.status_error; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + *mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD); + + /* the registers are 32-bit */ + *data_sz = 4; + + return 0; +} + /* @note: fix ixgbe_dev_supported_ptypes_get() if any change here. */ static inline uint32_t ixgbe_rxd_pkt_info_to_pkt_type(uint32_t pkt_info, uint16_t ptype_mask) diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h index 6d2f7c9da3..1ef0b05e66 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx.h +++ b/drivers/net/ixgbe/ixgbe_rxtx.h @@ -299,5 +299,7 @@ uint64_t ixgbe_get_tx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_queue_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_rx_port_offloads(struct rte_eth_dev *dev); uint64_t ixgbe_get_tx_queue_offloads(struct rte_eth_dev *dev); +int ixgbe_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #endif /* _IXGBE_RXTX_H_ */ From patchwork Mon Nov 2 11:10:03 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83385 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1257FA04E7; Mon, 2 Nov 2020 12:12:39 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 75841C82A; Mon, 2 Nov 2020 12:11:42 +0100 (CET) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id A3569C81A for ; Mon, 2 Nov 2020 12:11:40 +0100 (CET) IronPort-SDR: +606nk9U3tBg4jTkyByEXwq8IS0fE0/4VtCoYujO12KJZczj2iVVDgy9kNci6U5jSfHVbqo3rb Z8IqCQhSjfaw== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="148722254" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="148722254" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:38 -0800 IronPort-SDR: imxC7WUeKr4eP3+tCBPk6osKsn/09Lf6XuPoc/Jd3sXMSPOM+oHUh/6uvVKvU+u13HrNswoWWe 8noWOEiJ4RkA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="470337524" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga004.jf.intel.com with ESMTP; 02 Nov 2020 03:11:34 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBYXs011980; Mon, 2 Nov 2020 11:11:34 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBXll028213; Mon, 2 Nov 2020 11:11:33 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBXjN028209; Mon, 2 Nov 2020 11:11:33 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Beilei Xing , Jeff Guo Date: Mon, 2 Nov 2020 11:10:03 +0000 Message-Id: <1604315406-27669-5-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v11 4/6] net/i40e: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev Acked-by: Jeff Guo --- drivers/net/i40e/i40e_ethdev.c | 1 + drivers/net/i40e/i40e_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/i40e/i40e_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 4778aaf299..358a38232b 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -513,6 +513,7 @@ static const struct eth_dev_ops i40e_eth_dev_ops = { .mtu_set = i40e_dev_mtu_set, .tm_ops_get = i40e_tm_ops_get, .tx_done_cleanup = i40e_tx_done_cleanup, + .get_wake_addr = i40e_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 5df9a9df56..78862fe3a2 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e/i40e_rxtx.c @@ -72,6 +72,32 @@ #define I40E_TX_OFFLOAD_NOTSUP_MASK \ (PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_MASK) +int +i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + struct i40e_rx_queue *rxq = rx_queue; + volatile union i40e_rx_desc *rxdp; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.qword1.status_error_len; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + *mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT); + + /* registers are 64-bit */ + *data_sz = 8; + + return 0; +} + static inline void i40e_rxd_to_vlan_tci(struct rte_mbuf *mb, volatile union i40e_rx_desc *rxdp) { diff --git a/drivers/net/i40e/i40e_rxtx.h b/drivers/net/i40e/i40e_rxtx.h index 57d7b4160b..5826cf1099 100644 --- a/drivers/net/i40e/i40e_rxtx.h +++ b/drivers/net/i40e/i40e_rxtx.h @@ -248,6 +248,8 @@ uint16_t i40e_recv_scattered_pkts_vec_avx2(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts); uint16_t i40e_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); +int i40e_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *value, uint8_t *data_sz); /* For each value it means, datasheet of hardware can tell more details * From patchwork Mon Nov 2 11:10:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83386 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id D9D5CA04E7; Mon, 2 Nov 2020 12:12:56 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 19F66C836; Mon, 2 Nov 2020 12:11:44 +0100 (CET) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id A50B2C81A for ; Mon, 2 Nov 2020 12:11:41 +0100 (CET) IronPort-SDR: 2vwHxLiA+45l7yzS0q0F/hJkhdca+o31b/vGCpA/Kf25r0xK1/Fe8rlUY+lokhoGxXWGlci+IN oOPUFsfbW9og== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="230499052" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="230499052" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:39 -0800 IronPort-SDR: rSN5Wm95S+PQWH7+vfmoC5SC5JhnOWG9iYG/0aAuzsEqRZwfqXe2A2Hhvv7n8p3SQujDgPhAIS HRyiTLqAkNdQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="320031708" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga003.jf.intel.com with ESMTP; 02 Nov 2020 03:11:36 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBZ2s011983; Mon, 2 Nov 2020 11:11:35 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBZ3R028263; Mon, 2 Nov 2020 11:11:35 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBZYg028259; Mon, 2 Nov 2020 11:11:35 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov , Qiming Yang , Qi Zhang Date: Mon, 2 Nov 2020 11:10:04 +0000 Message-Id: <1604315406-27669-6-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v11 5/6] net/ice: implement power management API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Implement support for the power management API by implementing a `get_wake_addr` function that will return an address of an RX ring's status bit. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: Konstantin Ananyev --- drivers/net/ice/ice_ethdev.c | 1 + drivers/net/ice/ice_rxtx.c | 26 ++++++++++++++++++++++++++ drivers/net/ice/ice_rxtx.h | 2 ++ 3 files changed, 29 insertions(+) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index c65125ff32..54f185ad4d 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -216,6 +216,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .udp_tunnel_port_add = ice_dev_udp_tunnel_port_add, .udp_tunnel_port_del = ice_dev_udp_tunnel_port_del, .tx_done_cleanup = ice_tx_done_cleanup, + .get_wake_addr = ice_get_wake_addr, }; /* store statistics names and its offset in stats structure */ diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c index ee576c362a..fafd6ada62 100644 --- a/drivers/net/ice/ice_rxtx.c +++ b/drivers/net/ice/ice_rxtx.c @@ -26,6 +26,32 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask; uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask; uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask; +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz) +{ + volatile union ice_rx_flex_desc *rxdp; + struct ice_rx_queue *rxq = rx_queue; + uint16_t desc; + + desc = rxq->rx_tail; + rxdp = &rxq->rx_ring[desc]; + /* watch for changes in status bit */ + *tail_desc_addr = &rxdp->wb.status_error0; + + /* + * we expect the DD bit to be set to 1 if this descriptor was already + * written to. + */ + *expected = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + *mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S); + + /* register is 16-bit */ + *data_sz = 2; + + return 0; +} + + static inline uint8_t ice_proto_xtr_type_to_rxdid(uint8_t xtr_type) { diff --git a/drivers/net/ice/ice_rxtx.h b/drivers/net/ice/ice_rxtx.h index 1c23c7541e..7eeb8d467e 100644 --- a/drivers/net/ice/ice_rxtx.h +++ b/drivers/net/ice/ice_rxtx.h @@ -250,6 +250,8 @@ uint16_t ice_xmit_pkts_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts); int ice_fdir_programming(struct ice_pf *pf, struct ice_fltr_desc *fdir_desc); int ice_tx_done_cleanup(void *txq, uint32_t free_cnt); +int ice_get_wake_addr(void *rx_queue, volatile void **tail_desc_addr, + uint64_t *expected, uint64_t *mask, uint8_t *data_sz); #define FDIR_PARSING_ENABLE_PER_QUEUE(ad, on) do { \ int i; \ From patchwork Mon Nov 2 11:10:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Liang, Ma" X-Patchwork-Id: 83387 X-Patchwork-Delegate: thomas@monjalon.net Return-Path: X-Original-To: patchwork@inbox.dpdk.org Delivered-To: patchwork@inbox.dpdk.org Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 94955A04E7; Mon, 2 Nov 2020 12:13:17 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 75C42C850; Mon, 2 Nov 2020 12:11:45 +0100 (CET) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 02BC2C830 for ; Mon, 2 Nov 2020 12:11:42 +0100 (CET) IronPort-SDR: qR2YEykuapaQssG8IUXhE3UfJsa8HS7hvB1Z1TfdeJAqyfmwVA9NuECDCU28JghMuTgNr8K7Py oxohoWtC5Zeg== X-IronPort-AV: E=McAfee;i="6000,8403,9792"; a="148150007" X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="148150007" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Nov 2020 03:11:40 -0800 IronPort-SDR: AcGZrFb5gH1a83FQK1eZJljbSyP/k7AmVP16VwNIYfo6ekUaVm6q35ld6eEwmaAvTHYrPdNDL8 TtfMzMdLpzjQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.77,444,1596524400"; d="scan'208";a="362578097" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by FMSMGA003.fm.intel.com with ESMTP; 02 Nov 2020 03:11:37 -0800 Received: from sivswdev09.ir.intel.com (sivswdev09.ir.intel.com [10.237.217.48]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id 0A2BBanV011986; Mon, 2 Nov 2020 11:11:36 GMT Received: from sivswdev09.ir.intel.com (localhost [127.0.0.1]) by sivswdev09.ir.intel.com with ESMTP id 0A2BBa03028305; Mon, 2 Nov 2020 11:11:36 GMT Received: (from lma25@localhost) by sivswdev09.ir.intel.com with LOCAL id 0A2BBaDD028301; Mon, 2 Nov 2020 11:11:36 GMT From: Liang Ma To: dev@dpdk.org Cc: ruifeng.wang@arm.com, haiyue.wang@intel.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com, thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com, mw@semihalf.com, gtzalik@amazon.com, ajit.khaparde@broadcom.com, hkalra@marvell.com, johndale@cisco.com, matan@nvidia.com, yongwang@vmware.com, Anatoly Burakov Date: Mon, 2 Nov 2020 11:10:05 +0000 Message-Id: <1604315406-27669-7-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> References: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com> <1604315406-27669-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v11 6/6] examples/l3fwd-power: enable PMD power mgmt X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Add PMD power management feature support to l3fwd-power sample app. Signed-off-by: Liang Ma Signed-off-by: Anatoly Burakov Acked-by: David Hunt --- Notes: v11: - Update l3fwd-power documentation v8: - Add return status check for queue enable v6: - Fixed typos in documentation --- .../sample_app_ug/l3_forward_power_man.rst | 14 ++++++ examples/l3fwd-power/main.c | 46 ++++++++++++++++++- 2 files changed, 59 insertions(+), 1 deletion(-) diff --git a/doc/guides/sample_app_ug/l3_forward_power_man.rst b/doc/guides/sample_app_ug/l3_forward_power_man.rst index d7e1dc5813..149342d112 100644 --- a/doc/guides/sample_app_ug/l3_forward_power_man.rst +++ b/doc/guides/sample_app_ug/l3_forward_power_man.rst @@ -109,6 +109,8 @@ where, * --telemetry: Telemetry mode. +* --pmd-mgmt: PMD power management mode. + See :doc:`l3_forward` for details. The L3fwd-power example reuses the L3fwd command line options. @@ -455,3 +457,15 @@ reference cycles and accordingly busy rate is set to either 0% or The new stats ``empty_poll`` , ``full_poll`` and ``busy_percent`` can be viewed by running the script ``/usertools/dpdk-telemetry-client.py`` and selecting the menu option ``Send for global Metrics``. + +PMD power management Mode +------------------------- + +The PMD Power Management mode support for ``l3fwd-power`` is a standalone mode, in this mode +``l3fwd-power`` does simple l3fwding along with enabling the power saving scheme on a specific +port/queue/lcore. The main purpose for this mode is to demonstrate how to use the PMD power +management API. + +.. code-block:: console + + ./build/examples/dpdk-l3fwd-power -l 1-3 -- --pmd-mgmt -p 0x0f --config="(0,0,2),(0,1,3)" diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c index a48d75f68f..aafa415f0b 100644 --- a/examples/l3fwd-power/main.c +++ b/examples/l3fwd-power/main.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "perf_core.h" #include "main.h" @@ -199,7 +200,8 @@ enum appmode { APP_MODE_LEGACY, APP_MODE_EMPTY_POLL, APP_MODE_TELEMETRY, - APP_MODE_INTERRUPT + APP_MODE_INTERRUPT, + APP_MODE_PMD_MGMT }; enum appmode app_mode; @@ -1750,6 +1752,7 @@ parse_ep_config(const char *q_arg) #define CMD_LINE_OPT_EMPTY_POLL "empty-poll" #define CMD_LINE_OPT_INTERRUPT_ONLY "interrupt-only" #define CMD_LINE_OPT_TELEMETRY "telemetry" +#define CMD_LINE_OPT_PMD_MGMT "pmd-mgmt" /* Parse the argument given in the command line of the application */ static int @@ -1771,6 +1774,7 @@ parse_args(int argc, char **argv) {CMD_LINE_OPT_LEGACY, 0, 0, 0}, {CMD_LINE_OPT_TELEMETRY, 0, 0, 0}, {CMD_LINE_OPT_INTERRUPT_ONLY, 0, 0, 0}, + {CMD_LINE_OPT_PMD_MGMT, 0, 0, 0}, {NULL, 0, 0, 0} }; @@ -1881,6 +1885,16 @@ parse_args(int argc, char **argv) printf("telemetry mode is enabled\n"); } + if (!strncmp(lgopts[option_index].name, + CMD_LINE_OPT_PMD_MGMT, + sizeof(CMD_LINE_OPT_PMD_MGMT))) { + if (app_mode != APP_MODE_DEFAULT) { + printf(" power mgmt mode is mutually exclusive with other modes\n"); + return -1; + } + app_mode = APP_MODE_PMD_MGMT; + printf("PMD power mgmt mode is enabled\n"); + } if (!strncmp(lgopts[option_index].name, CMD_LINE_OPT_INTERRUPT_ONLY, sizeof(CMD_LINE_OPT_INTERRUPT_ONLY))) { @@ -2437,6 +2451,8 @@ mode_to_str(enum appmode mode) return "telemetry"; case APP_MODE_INTERRUPT: return "interrupt-only"; + case APP_MODE_PMD_MGMT: + return "pmd mgmt"; default: return "invalid"; } @@ -2705,6 +2721,17 @@ main(int argc, char **argv) } else if (!check_ptype(portid)) rte_exit(EXIT_FAILURE, "PMD can not provide needed ptypes\n"); + if (app_mode == APP_MODE_PMD_MGMT) { + ret = rte_power_pmd_mgmt_queue_enable(lcore_id, + portid, queueid, + RTE_POWER_MGMT_TYPE_SCALE); + + if (ret < 0) + rte_exit(EXIT_FAILURE, + "rte_power_pmd_mgmt enable: err=%d, " + "port=%d\n", ret, portid); + + } } } @@ -2790,6 +2817,9 @@ main(int argc, char **argv) SKIP_MAIN); } else if (app_mode == APP_MODE_INTERRUPT) { rte_eal_mp_remote_launch(main_intr_loop, NULL, CALL_MAIN); + } else if (app_mode == APP_MODE_PMD_MGMT) { + rte_eal_mp_remote_launch(main_telemetry_loop, NULL, + CALL_MAIN); } if (app_mode == APP_MODE_EMPTY_POLL || app_mode == APP_MODE_TELEMETRY) @@ -2816,6 +2846,20 @@ main(int argc, char **argv) if (app_mode == APP_MODE_EMPTY_POLL) rte_power_empty_poll_stat_free(); + if (app_mode == APP_MODE_PMD_MGMT) { + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) { + if (rte_lcore_is_enabled(lcore_id) == 0) + continue; + qconf = &lcore_conf[lcore_id]; + for (queue = 0; queue < qconf->n_rx_queue; ++queue) { + portid = qconf->rx_queue_list[queue].port_id; + queueid = qconf->rx_queue_list[queue].queue_id; + rte_power_pmd_mgmt_queue_disable(lcore_id, + portid, queueid); + } + } + } + if ((app_mode == APP_MODE_LEGACY || app_mode == APP_MODE_EMPTY_POLL) && deinit_power_library()) rte_exit(EXIT_FAILURE, "deinit_power_library failed\n");