From patchwork Fri Oct 23 23:06:22 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Liang, Ma"
X-Patchwork-Id: 82015
From: Liang Ma
To: dev@dpdk.org
Cc: anatoly.burakov@intel.com, viktorin@rehivetech.com,
 qi.z.zhang@intel.com, ruifeng.wang@arm.com, beilei.xing@intel.com,
 jia.guo@intel.com, qiming.yang@intel.com, haiyue.wang@intel.com,
 bruce.richardson@intel.com, konstantin.ananyev@intel.com,
 david.hunt@intel.com, jerinjacobk@gmail.com, nhorman@tuxdriver.com,
 thomas@monjalon.net, timothy.mcdaniel@intel.com, gage.eads@intel.com,
 drc@linux.vnet.ibm.com, Liang Ma
Date: Sat, 24 Oct 2020 00:06:22 +0100
Message-Id: <1603494392-7181-1-git-send-email-liang.j.ma@intel.com>
X-Mailer: git-send-email 1.7.7.4
In-Reply-To: <1603473432-11153-1-git-send-email-liang.j.ma@intel.com>
References: <1603473432-11153-1-git-send-email-liang.j.ma@intel.com>
Subject: [dpdk-dev] [PATCH v9 00/10] Add PMD power mgmt
List-Id: DPDK patches and discussions

This patchset proposes a simple API for Ethernet drivers to cause the
CPU to enter a power-optimized state while waiting for packets to
arrive, along with a set of generic intrinsics that facilitate that.
This is achieved through cooperation with the NIC driver, which lets us
learn the address of the wake-up event and wait for writes to it.

On IA, this is achieved using the UMONITOR/UMWAIT instructions. They
are used in their raw opcode form because there is no widespread
compiler support for them yet. Still, the API is made generic enough to
hopefully support other architectures, if they happen to implement
similar instructions.

To achieve power savings, a very simple mechanism is used: we count
empty polls, and if a certain threshold is reached, we get the address
of the next RX ring descriptor from the NIC driver, arm the monitoring
hardware, and enter a power-optimized state.
We will then wake up when either a timeout happens or a write happens
(or generally whenever the CPU feels like waking up - this is
platform-specific), and proceed as normal. The empty poll counter is
reset whenever we actually get packets, so we only go to sleep when we
know nothing is going on. The mechanism is generic and can be used for
any write-back descriptor.

Why are we putting it into ethdev as opposed to leaving this up to the
application? Our customers specifically requested a way to do it with
minimal changes to the application code. The current approach allows
the application to just flip a switch and automatically have power
savings.

- Only 1:1 core to queue mapping is supported, meaning that each lcore
  may handle RX on at most one queue
- Three policies are supported: UMWAIT, PAUSE, and Frequency_Scale
- Power management is enabled per-queue
- The API doesn't extend to other device types

Liang Ma (10):
  eal: add new x86 cpuid support for WAITPKG
  eal: add power management intrinsics
  eal: add intrinsics support check infrastructure
  ethdev: add simple power management API
  power: add PMD power management API and callback
  net/ixgbe: implement power management API
  net/i40e: implement power management API
  net/ice: implement power management API
  examples/l3fwd-power: enable PMD power mgmt
  doc: update programmer's guide for power library

 doc/guides/prog_guide/power_man.rst           |  42 +++
 doc/guides/rel_notes/release_20_11.rst        |  16 +
 .../sample_app_ug/l3_forward_power_man.rst    |  13 +
 drivers/net/i40e/i40e_ethdev.c                |   1 +
 drivers/net/i40e/i40e_rxtx.c                  |  26 ++
 drivers/net/i40e/i40e_rxtx.h                  |   2 +
 drivers/net/ice/ice_ethdev.c                  |   1 +
 drivers/net/ice/ice_rxtx.c                    |  26 ++
 drivers/net/ice/ice_rxtx.h                    |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  25 ++
 drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
 examples/l3fwd-power/main.c                   |  46 ++-
 lib/librte_eal/arm/include/meson.build        |   1 +
 .../arm/include/rte_power_intrinsics.h        |  60 ++++
 lib/librte_eal/arm/rte_cpuflags.c             |   6 +
 lib/librte_eal/include/generic/rte_cpuflags.h |  26 ++
 .../include/generic/rte_power_intrinsics.h    | 123 +++++++
 lib/librte_eal/include/meson.build            |   1 +
 lib/librte_eal/ppc/include/meson.build        |   1 +
 .../ppc/include/rte_power_intrinsics.h        |  60 ++++
 lib/librte_eal/ppc/rte_cpuflags.c             |   7 +
 lib/librte_eal/version.map                    |   1 +
 lib/librte_eal/x86/include/meson.build        |   1 +
 lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
 .../x86/include/rte_power_intrinsics.h        | 135 ++++++++
 lib/librte_eal/x86/rte_cpuflags.c             |  14 +
 lib/librte_ethdev/rte_ethdev.c                |  23 ++
 lib/librte_ethdev/rte_ethdev.h                |  28 ++
 lib/librte_ethdev/rte_ethdev_driver.h         |  28 ++
 lib/librte_ethdev/version.map                 |   1 +
 lib/librte_power/meson.build                  |   5 +-
 lib/librte_power/rte_power_pmd_mgmt.c         | 320 ++++++++++++++++++
 lib/librte_power/rte_power_pmd_mgmt.h         |  92 +++++
 lib/librte_power/version.map                  |   4 +
 35 files changed, 1138 insertions(+), 3 deletions(-)
 create mode 100644 lib/librte_eal/arm/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/ppc/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
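
For readers who want the empty-poll mechanism from the cover letter in
concrete form, here is a minimal standalone sketch. It is not the code
from this series: struct poll_state, poll_state_init() and
poll_state_update() are hypothetical names invented for illustration,
the threshold value is left to the caller, and the actual sleep
(arming the monitor on the next RX descriptor and waiting) is left out.
It only shows the counting logic: the counter grows on empty polls, is
reset as soon as packets arrive, and the caller is told to sleep once
the threshold is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state; not the structure used by the patches. */
struct poll_state {
	uint32_t empty_polls; /* consecutive RX polls that returned nothing */
	uint32_t threshold;   /* empty polls tolerated before sleeping */
	uint64_t sleeps;      /* number of times a sleep was requested */
};

static void
poll_state_init(struct poll_state *st, uint32_t threshold)
{
	assert(threshold > 0);
	st->empty_polls = 0;
	st->threshold = threshold;
	st->sleeps = 0;
}

/*
 * Call after every RX burst with the number of packets received.
 * Returns true when the caller should arm the wake-up monitor and
 * enter the power-optimized state, false when it should keep polling.
 */
static bool
poll_state_update(struct poll_state *st, uint16_t nb_rx)
{
	if (nb_rx > 0) {
		/* Traffic seen: reset so we only sleep when truly idle. */
		st->empty_polls = 0;
		return false;
	}

	/* Saturate at the threshold so the counter cannot overflow. */
	if (st->empty_polls < st->threshold)
		st->empty_polls++;

	if (st->empty_polls < st->threshold)
		return false;

	/* Still idle: ask for a sleep on this and every further empty poll. */
	st->sleeps++;
	return true;
}
```

In an RX callback this would be used roughly as "if
(poll_state_update(&st, nb_rx)) { /* arm monitor, wait */ }". Keeping
the counter at the threshold after a wake-up, rather than resetting it,
is a choice made for this sketch: a spurious wake-up with no packets
goes straight back to sleep, while any received packet restarts the
count.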