Message ID | cover.1610473000.git.anatoly.burakov@intel.com (mailing list archive) |
---|---|
Headers | show |
Series | Add PMD power management | expand |
On Tue, Jan 12, 2021 at 6:37 PM Anatoly Burakov <anatoly.burakov@intel.com> wrote: > > This patchset proposes a simple API for Ethernet drivers to cause the > CPU to enter a power-optimized state while waiting for packets to > arrive. There are multiple proposed mechanisms to achieve said power > savings: simple frequency scaling, idle loop, and monitoring the Rx > queue for incoming packages. The latter is achieved through cooperation > with the NIC driver that will allow us to know address of wake up event, > and wait for writes on that address. > > On IA, this is achieved through using UMONITOR/UMWAIT instructions. They > are used in their raw opcode form because there is no widespread > compiler support for them yet. Still, the API is made generic enough to > hopefully support other architectures, if they happen to implement > similar instructions. > > To achieve power savings, there is a very simple mechanism used: we're > counting empty polls, and if a certain threshold is reached, we employ > one of the suggested power management schemes automatically, from within > a Rx callback inside the PMD. Once there's traffic again, the empty poll > counter is reset. > > This patchset also introduces a few changes into existing power > management-related intrinsics, namely to provide a native way of waking > up a sleeping core without application being responsible for it, as well > as general robustness improvements. There's quite a bit of locking going > on, but these locks are per-thread and very little (if any) contention > is expected, so the performance impact shouldn't be that bad (and in any > case the locking happens when we're about to sleep anyway). > > Why are we putting it into ethdev as opposed to leaving this up to the > application? Our customers specifically requested a way to do it with > minimal changes to the application code. The current approach allows to > just flip a switch and automatically have power savings. > > Things of note: > > - Only 1:1 core to queue mapping is supported, meaning that each lcore > must at most handle RX on a single queue If we want to save power, it is likely we would poll more rxqs on a thread. > - Support 3 type policies. Monitor/Pause/Frequency Scaling > - Power management is enabled per-queue > - The API doesn't extend to other device types > > v16: > - Implemented Konstantin's suggestions and comments > - Added return values to the API - This revision breaks SPDK build (reported by UNH): http://mails.dpdk.org/archives/test-report/2021-January/174069.html - Build is broken for ARM and PPC at patch: 86491d5bd4 - (HEAD) eal: add monitor wakeup function (25 minutes ago) <Anatoly Burakov> Only pasting the ARM failure: ninja: Entering directory `/home/dmarchan/builds/build-arm64-host-clang' [1/297] Compiling C object 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o'. FAILED: lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o aarch64-linux-gnu-gcc -Ilib/76b5a35@@rte_eal@sta -Ilib -I../../dpdk/lib -I. -I../../dpdk/ -Iconfig -I../../dpdk/config -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include -Ilib/librte_eal/linux/include -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/arm/include -I../../dpdk/lib/librte_eal/arm/include -Ilib/librte_eal/common -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs -I../../dpdk/lib/librte_kvargs -Ilib/librte_telemetry/../librte_metrics -I../../dpdk/lib/librte_telemetry/../librte_metrics -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wsign-compare -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=armv8-a+crc -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation '-DABI_VERSION="21.1"' -DRTE_LIBEAL_USE_GETENTROPY -MD -MQ 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' -MF 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o.d' -o 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' -c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:35:1: error: conflicting types for ‘rte_power_monitor_wakeup’ rte_power_monitor_wakeup(const unsigned int lcore_id) ^~~~~~~~~~~~~~~~~~~~~~~~ In file included from ../../dpdk/lib/librte_eal/arm/include/rte_power_intrinsics.h:14, from ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:5: ../../dpdk/lib/librte_eal/include/generic/rte_power_intrinsics.h:79:5: note: previous declaration of ‘rte_power_monitor_wakeup’ was here int rte_power_monitor_wakeup(const unsigned int lcore_id); ^~~~~~~~~~~~~~~~~~~~~~~~ ninja: build stopped: subcommand failed. - The ABI check is still not happy as I reported earlier. Reproduced on v16 (GHA had a hiccup on this revision, but previous ones had the failure too): 1 Changed variable: [C] 'rte_eth_dev rte_eth_devices[]' was changed at rte_ethdev_core.h:196:1: type of variable changed: array element type 'struct rte_eth_dev' changed: type size hasn't changed 1 data member change: type of 'const eth_dev_ops* rte_eth_dev::dev_ops' changed: in pointed to type 'const eth_dev_ops': in unqualified underlying type 'struct eth_dev_ops' at rte_ethdev_driver.h:789:1: type size changed from 6208 to 6272 (in bits) 1 data member insertion: 'eth_get_monitor_addr_t eth_dev_ops::get_monitor_addr', at offset 6208 (in bits) at rte_ethdev_driver.h:940:1 no data member changes (94 filtered); type size hasn't changed Error: ABI issue reported for 'abidiff --suppr /home/dmarchan/dpdk/devtools/../devtools/libabigail.abignore --no-added-syms --headers-dir1 /home/dmarchan/abi/v20.11/build-gcc-static/usr/local/include --headers-dir2 /home/dmarchan/builds/build-gcc-static/install/usr/local/include /home/dmarchan/abi/v20.11/build-gcc-static/dump/librte_ethdev.dump /home/dmarchan/builds/build-gcc-static/install/dump/librte_ethdev.dump' ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged this as a potential issue). One solution is to add an exception on the eth_dev_ops structure. --- a/devtools/libabigail.abignore +++ b/devtools/libabigail.abignore @@ -7,3 +7,7 @@ symbol_version = INTERNAL [suppress_variable] symbol_version = INTERNAL + +; Explicit ignore for driver-only ABI +[suppress_type] + name = eth_dev_ops
On 14-Jan-21 9:36 AM, David Marchand wrote: > On Tue, Jan 12, 2021 at 6:37 PM Anatoly Burakov > <anatoly.burakov@intel.com> wrote: >> >> This patchset proposes a simple API for Ethernet drivers to cause the >> CPU to enter a power-optimized state while waiting for packets to >> arrive. There are multiple proposed mechanisms to achieve said power >> savings: simple frequency scaling, idle loop, and monitoring the Rx >> queue for incoming packages. The latter is achieved through cooperation >> with the NIC driver that will allow us to know address of wake up event, >> and wait for writes on that address. >> >> On IA, this is achieved through using UMONITOR/UMWAIT instructions. They >> are used in their raw opcode form because there is no widespread >> compiler support for them yet. Still, the API is made generic enough to >> hopefully support other architectures, if they happen to implement >> similar instructions. >> >> To achieve power savings, there is a very simple mechanism used: we're >> counting empty polls, and if a certain threshold is reached, we employ >> one of the suggested power management schemes automatically, from within >> a Rx callback inside the PMD. Once there's traffic again, the empty poll >> counter is reset. >> >> This patchset also introduces a few changes into existing power >> management-related intrinsics, namely to provide a native way of waking >> up a sleeping core without application being responsible for it, as well >> as general robustness improvements. There's quite a bit of locking going >> on, but these locks are per-thread and very little (if any) contention >> is expected, so the performance impact shouldn't be that bad (and in any >> case the locking happens when we're about to sleep anyway). >> >> Why are we putting it into ethdev as opposed to leaving this up to the >> application? Our customers specifically requested a way to do it with >> minimal changes to the application code. The current approach allows to >> just flip a switch and automatically have power savings. >> >> Things of note: >> >> - Only 1:1 core to queue mapping is supported, meaning that each lcore >> must at most handle RX on a single queue > > If we want to save power, it is likely we would poll more rxqs on a thread. We are investigating possibilities to make that happen, but for this patchset, this is the limitation. > > >> - Support 3 type policies. Monitor/Pause/Frequency Scaling >> - Power management is enabled per-queue >> - The API doesn't extend to other device types >> >> v16: >> - Implemented Konstantin's suggestions and comments >> - Added return values to the API > > - This revision breaks SPDK build (reported by UNH): > http://mails.dpdk.org/archives/test-report/2021-January/174069.html > > > - Build is broken for ARM and PPC at patch: > 86491d5bd4 - (HEAD) eal: add monitor wakeup function (25 minutes ago) > <Anatoly Burakov> > > Only pasting the ARM failure: > ninja: Entering directory `/home/dmarchan/builds/build-arm64-host-clang' > [1/297] Compiling C object > 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o'. > FAILED: lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o > aarch64-linux-gnu-gcc -Ilib/76b5a35@@rte_eal@sta -Ilib > -I../../dpdk/lib -I. -I../../dpdk/ -Iconfig -I../../dpdk/config > -Ilib/librte_eal/include -I../../dpdk/lib/librte_eal/include > -Ilib/librte_eal/linux/include > -I../../dpdk/lib/librte_eal/linux/include -Ilib/librte_eal/arm/include > -I../../dpdk/lib/librte_eal/arm/include -Ilib/librte_eal/common > -I../../dpdk/lib/librte_eal/common -Ilib/librte_eal > -I../../dpdk/lib/librte_eal -Ilib/librte_kvargs > -I../../dpdk/lib/librte_kvargs > -Ilib/librte_telemetry/../librte_metrics > -I../../dpdk/lib/librte_telemetry/../librte_metrics > -Ilib/librte_telemetry -I../../dpdk/lib/librte_telemetry > -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall > -Winvalid-pch -Werror -O2 -g -include rte_config.h -Wextra -Wcast-qual > -Wdeprecated -Wformat -Wformat-nonliteral -Wformat-security > -Wmissing-declarations -Wmissing-prototypes -Wnested-externs > -Wold-style-definition -Wpointer-arith -Wsign-compare > -Wstrict-prototypes -Wundef -Wwrite-strings -Wno-packed-not-aligned > -Wno-missing-field-initializers -D_GNU_SOURCE -fPIC -march=armv8-a+crc > -DALLOW_EXPERIMENTAL_API -DALLOW_INTERNAL_API -Wno-format-truncation > '-DABI_VERSION="21.1"' -DRTE_LIBEAL_USE_GETENTROPY -MD -MQ > 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' -MF > 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o.d' > -o 'lib/76b5a35@@rte_eal@sta/librte_eal_arm_rte_power_intrinsics.c.o' > -c ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c > ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:35:1: error: > conflicting types for ‘rte_power_monitor_wakeup’ > rte_power_monitor_wakeup(const unsigned int lcore_id) > ^~~~~~~~~~~~~~~~~~~~~~~~ > In file included from > ../../dpdk/lib/librte_eal/arm/include/rte_power_intrinsics.h:14, > from ../../dpdk/lib/librte_eal/arm/rte_power_intrinsics.c:5: > ../../dpdk/lib/librte_eal/include/generic/rte_power_intrinsics.h:79:5: > note: previous declaration of ‘rte_power_monitor_wakeup’ was here > int rte_power_monitor_wakeup(const unsigned int lcore_id); > ^~~~~~~~~~~~~~~~~~~~~~~~ > ninja: build stopped: subcommand failed. Woops, wrong return value in the .c files. Will fix! > > > > - The ABI check is still not happy as I reported earlier. > Reproduced on v16 (GHA had a hiccup on this revision, but previous > ones had the failure too): > > 1 Changed variable: > > [C] 'rte_eth_dev rte_eth_devices[]' was changed at rte_ethdev_core.h:196:1: > type of variable changed: > array element type 'struct rte_eth_dev' changed: > type size hasn't changed > 1 data member change: > type of 'const eth_dev_ops* rte_eth_dev::dev_ops' changed: > in pointed to type 'const eth_dev_ops': > in unqualified underlying type 'struct eth_dev_ops' at > rte_ethdev_driver.h:789:1: > type size changed from 6208 to 6272 (in bits) > 1 data member insertion: > 'eth_get_monitor_addr_t > eth_dev_ops::get_monitor_addr', at offset 6208 (in bits) at > rte_ethdev_driver.h:940:1 > no data member changes (94 filtered); > type size hasn't changed > > Error: ABI issue reported for 'abidiff --suppr > /home/dmarchan/dpdk/devtools/../devtools/libabigail.abignore > --no-added-syms --headers-dir1 > /home/dmarchan/abi/v20.11/build-gcc-static/usr/local/include > --headers-dir2 /home/dmarchan/builds/build-gcc-static/install/usr/local/include > /home/dmarchan/abi/v20.11/build-gcc-static/dump/librte_ethdev.dump > /home/dmarchan/builds/build-gcc-static/install/dump/librte_ethdev.dump' > > ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged > this as a potential issue). > > One solution is to add an exception on the eth_dev_ops structure. > > --- a/devtools/libabigail.abignore > +++ b/devtools/libabigail.abignore > @@ -7,3 +7,7 @@ > symbol_version = INTERNAL > [suppress_variable] > symbol_version = INTERNAL > + > +; Explicit ignore for driver-only ABI > +[suppress_type] > + name = eth_dev_ops > > Right, OK. I didn't realize an "exception" is something you actually do in code, not an ad-hoc community process type thing :) I'll add this in the next revision.