mbox series

[v17,00/11] Add PMD power management

Message ID cover.1610635488.git.anatoly.burakov@intel.com (mailing list archive)
Headers
Series Add PMD power management |

Message

Burakov, Anatoly Jan. 14, 2021, 2:46 p.m. UTC
  This patchset proposes a simple API for Ethernet drivers to cause the  
CPU to enter a power-optimized state while waiting for packets to  
arrive. There are multiple proposed mechanisms to achieve said power
savings: simple frequency scaling, idle loop, and monitoring the Rx
queue for incoming packages. The latter is achieved through cooperation
with the NIC driver that will allow us to know address of wake up event,
and wait for writes on that address.

On IA, this is achieved through using UMONITOR/UMWAIT instructions. They 
are used in their raw opcode form because there is no widespread 
compiler support for them yet. Still, the API is made generic enough to
hopefully support other architectures, if they happen to implement 
similar instructions.

To achieve power savings, there is a very simple mechanism used: we're 
counting empty polls, and if a certain threshold is reached, we employ
one of the suggested power management schemes automatically, from within
a Rx callback inside the PMD. Once there's traffic again, the empty poll
counter is reset.

This patchset also introduces a few changes into existing power 
management-related intrinsics, namely to provide a native way of waking 
up a sleeping core without application being responsible for it, as well 
as general robustness improvements. There's quite a bit of locking going 
on, but these locks are per-thread and very little (if any) contention 
is expected, so the performance impact shouldn't be that bad (and in any 
case the locking happens when we're about to sleep anyway).

Why are we putting it into ethdev as opposed to leaving this up to the 
application? Our customers specifically requested a way to do it with
minimal changes to the application code. The current approach allows to 
just flip a switch and automatically have power savings.

Things of note:

- Only 1:1 core to queue mapping is supported, meaning that each lcore 
  must at most handle RX on a single queue
- Support 3 type policies. Monitor/Pause/Frequency Scaling
- Power management is enabled per-queue
- The API doesn't extend to other device types

v17:
- Added exception for ethdev driver-only ABI
- Added memory barriers for monitor/wakeup (Konstantin)
- Fixed compiled issues on non-x86 platforms (hopefully!)

v16:
- Implemented Konstantin's suggestions and comments
- Added return values to the API

v15:
- Fixed incorrect check in UMWAIT callback
- Fixed accidental whitespace changes

v14:
- Fixed ARM/PPC builds
- Addressed various review comments

v13:
- Reworked the librte_power code to require less locking and handle invalid
  parameters better
- Fix numerous rebase errors present in v12

v12:
- Rebase on top of 21.02
- Rework of power intrinsics code

Anatoly Burakov (5):
  eal: uninline power intrinsics
  eal: avoid invalid API usage in power intrinsics
  eal: change API of power intrinsics
  eal: remove sync version of power monitor
  eal: add monitor wakeup function

Liang Ma (6):
  ethdev: add simple power management API
  power: add PMD power management API and callback
  net/ixgbe: implement power management API
  net/i40e: implement power management API
  net/ice: implement power management API
  examples/l3fwd-power: enable PMD power mgmt

 devtools/libabigail.abignore                  |   3 +
 doc/guides/prog_guide/power_man.rst           |  44 +++
 doc/guides/rel_notes/release_21_02.rst        |  15 +
 .../sample_app_ug/l3_forward_power_man.rst    |  35 ++
 drivers/event/dlb/dlb.c                       |  10 +-
 drivers/event/dlb2/dlb2.c                     |  10 +-
 drivers/net/i40e/i40e_ethdev.c                |   1 +
 drivers/net/i40e/i40e_rxtx.c                  |  25 ++
 drivers/net/i40e/i40e_rxtx.h                  |   1 +
 drivers/net/ice/ice_ethdev.c                  |   1 +
 drivers/net/ice/ice_rxtx.c                    |  26 ++
 drivers/net/ice/ice_rxtx.h                    |   1 +
 drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  25 ++
 drivers/net/ixgbe/ixgbe_rxtx.h                |   1 +
 examples/l3fwd-power/main.c                   |  89 ++++-
 .../arm/include/rte_power_intrinsics.h        |  40 --
 lib/librte_eal/arm/meson.build                |   1 +
 lib/librte_eal/arm/rte_power_intrinsics.c     |  40 ++
 .../include/generic/rte_power_intrinsics.h    |  88 ++---
 .../ppc/include/rte_power_intrinsics.h        |  40 --
 lib/librte_eal/ppc/meson.build                |   1 +
 lib/librte_eal/ppc/rte_power_intrinsics.c     |  40 ++
 lib/librte_eal/version.map                    |   3 +
 .../x86/include/rte_power_intrinsics.h        | 115 ------
 lib/librte_eal/x86/meson.build                |   1 +
 lib/librte_eal/x86/rte_power_intrinsics.c     | 215 +++++++++++
 lib/librte_ethdev/rte_ethdev.c                |  28 ++
 lib/librte_ethdev/rte_ethdev.h                |  25 ++
 lib/librte_ethdev/rte_ethdev_driver.h         |  22 ++
 lib/librte_ethdev/version.map                 |   3 +
 lib/librte_power/meson.build                  |   5 +-
 lib/librte_power/rte_power_pmd_mgmt.c         | 364 ++++++++++++++++++
 lib/librte_power/rte_power_pmd_mgmt.h         |  90 +++++
 lib/librte_power/version.map                  |   5 +
 35 files changed, 1155 insertions(+), 259 deletions(-)
 create mode 100644 lib/librte_eal/arm/rte_power_intrinsics.c
 create mode 100644 lib/librte_eal/ppc/rte_power_intrinsics.c
 create mode 100644 lib/librte_eal/x86/rte_power_intrinsics.c
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
 create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
  

Comments

David Marchand Jan. 18, 2021, 3:24 p.m. UTC | #1
On Thu, Jan 14, 2021 at 3:46 PM Anatoly Burakov
<anatoly.burakov@intel.com> wrote:
>
> This patchset proposes a simple API for Ethernet drivers to cause the
> CPU to enter a power-optimized state while waiting for packets to
> arrive. There are multiple proposed mechanisms to achieve said power
> savings: simple frequency scaling, idle loop, and monitoring the Rx
> queue for incoming packages. The latter is achieved through cooperation
> with the NIC driver that will allow us to know address of wake up event,
> and wait for writes on that address.
>
> On IA, this is achieved through using UMONITOR/UMWAIT instructions. They
> are used in their raw opcode form because there is no widespread
> compiler support for them yet. Still, the API is made generic enough to
> hopefully support other architectures, if they happen to implement
> similar instructions.
>
> To achieve power savings, there is a very simple mechanism used: we're
> counting empty polls, and if a certain threshold is reached, we employ
> one of the suggested power management schemes automatically, from within
> a Rx callback inside the PMD. Once there's traffic again, the empty poll
> counter is reset.
>
> This patchset also introduces a few changes into existing power
> management-related intrinsics, namely to provide a native way of waking
> up a sleeping core without application being responsible for it, as well
> as general robustness improvements. There's quite a bit of locking going
> on, but these locks are per-thread and very little (if any) contention
> is expected, so the performance impact shouldn't be that bad (and in any
> case the locking happens when we're about to sleep anyway).
>
> Why are we putting it into ethdev as opposed to leaving this up to the
> application? Our customers specifically requested a way to do it with
> minimal changes to the application code. The current approach allows to
> just flip a switch and automatically have power savings.
>
> Things of note:
>
> - Only 1:1 core to queue mapping is supported, meaning that each lcore
>   must at most handle RX on a single queue
> - Support 3 type policies. Monitor/Pause/Frequency Scaling
> - Power management is enabled per-queue
> - The API doesn't extend to other device types
>
> v17:
> - Added exception for ethdev driver-only ABI
> - Added memory barriers for monitor/wakeup (Konstantin)
> - Fixed compiled issues on non-x86 platforms (hopefully!)

SPDK build is still broken.
http://mails.dpdk.org/archives/test-report/2021-January/174840.html

==== 20 line log output for Ubuntu 18.04 (dpdk_compile_spdk): ====
rte_power_pmd_mgmt.c:(.text.experimental+0x1cc): undefined reference
to `rte_eth_add_rx_callback'
rte_power_pmd_mgmt.c:(.text.experimental+0x1f8): undefined reference
to `rte_eth_get_monitor_addr'
rte_power_pmd_mgmt.c:(.text.experimental+0x37f): undefined reference
to `rte_eth_dev_logtype'
/dpdk/build/lib/librte_power.a(librte_power_rte_power_pmd_mgmt.c.o):
In function `rte_power_pmd_mgmt_queue_disable':
rte_power_pmd_mgmt.c:(.text.experimental+0x42a): undefined reference
to `rte_eth_dev_is_valid_port'
rte_power_pmd_mgmt.c:(.text.experimental+0x4e7): undefined reference
to `rte_eth_remove_rx_callback'
rte_power_pmd_mgmt.c:(.text.experimental+0x536): undefined reference
to `rte_eth_remove_rx_callback'
rte_power_pmd_mgmt.c:(.text.experimental+0x54d): undefined reference
to `rte_eth_dev_logtype'
collect2: error: ld returned 1 exit status
/spdk/mk/spdk.app.mk:65: recipe for target 'iscsi_fuzz' failed
/spdk/mk/spdk.subdirs.mk:44: recipe for target 'iscsi_fuzz' failed
/spdk/mk/spdk.subdirs.mk:44: recipe for target 'fuzz' failed
make[4]: *** [iscsi_fuzz] Error 1
make[3]: *** [iscsi_fuzz] Error 2
make[2]: *** [fuzz] Error 2
/spdk/mk/spdk.subdirs.mk:44: recipe for target 'app' failed
make[1]: *** [app] Error 2
/spdk/mk/spdk.subdirs.mk:44: recipe for target 'test' failed
make: *** [test] Error 2
[2] Error running command.


I guess this is because of the added dependency of rte_ethdev to rte_power.
Afaics, SPDK does not use pkg-config:
https://github.com/spdk/spdk/blob/master/lib/env_dpdk/env.mk#L53
  
Burakov, Anatoly Jan. 18, 2021, 3:45 p.m. UTC | #2
On 18-Jan-21 3:24 PM, David Marchand wrote:
> On Thu, Jan 14, 2021 at 3:46 PM Anatoly Burakov
> <anatoly.burakov@intel.com> wrote:
>>
>> This patchset proposes a simple API for Ethernet drivers to cause the
>> CPU to enter a power-optimized state while waiting for packets to
>> arrive. There are multiple proposed mechanisms to achieve said power
>> savings: simple frequency scaling, idle loop, and monitoring the Rx
>> queue for incoming packages. The latter is achieved through cooperation
>> with the NIC driver that will allow us to know address of wake up event,
>> and wait for writes on that address.
>>
>> On IA, this is achieved through using UMONITOR/UMWAIT instructions. They
>> are used in their raw opcode form because there is no widespread
>> compiler support for them yet. Still, the API is made generic enough to
>> hopefully support other architectures, if they happen to implement
>> similar instructions.
>>
>> To achieve power savings, there is a very simple mechanism used: we're
>> counting empty polls, and if a certain threshold is reached, we employ
>> one of the suggested power management schemes automatically, from within
>> a Rx callback inside the PMD. Once there's traffic again, the empty poll
>> counter is reset.
>>
>> This patchset also introduces a few changes into existing power
>> management-related intrinsics, namely to provide a native way of waking
>> up a sleeping core without application being responsible for it, as well
>> as general robustness improvements. There's quite a bit of locking going
>> on, but these locks are per-thread and very little (if any) contention
>> is expected, so the performance impact shouldn't be that bad (and in any
>> case the locking happens when we're about to sleep anyway).
>>
>> Why are we putting it into ethdev as opposed to leaving this up to the
>> application? Our customers specifically requested a way to do it with
>> minimal changes to the application code. The current approach allows to
>> just flip a switch and automatically have power savings.
>>
>> Things of note:
>>
>> - Only 1:1 core to queue mapping is supported, meaning that each lcore
>>    must at most handle RX on a single queue
>> - Support 3 type policies. Monitor/Pause/Frequency Scaling
>> - Power management is enabled per-queue
>> - The API doesn't extend to other device types
>>
>> v17:
>> - Added exception for ethdev driver-only ABI
>> - Added memory barriers for monitor/wakeup (Konstantin)
>> - Fixed compiled issues on non-x86 platforms (hopefully!)
> 
> SPDK build is still broken.
> http://mails.dpdk.org/archives/test-report/2021-January/174840.html
> 
> ==== 20 line log output for Ubuntu 18.04 (dpdk_compile_spdk): ====
> rte_power_pmd_mgmt.c:(.text.experimental+0x1cc): undefined reference
> to `rte_eth_add_rx_callback'
> rte_power_pmd_mgmt.c:(.text.experimental+0x1f8): undefined reference
> to `rte_eth_get_monitor_addr'
> rte_power_pmd_mgmt.c:(.text.experimental+0x37f): undefined reference
> to `rte_eth_dev_logtype'
> /dpdk/build/lib/librte_power.a(librte_power_rte_power_pmd_mgmt.c.o):
> In function `rte_power_pmd_mgmt_queue_disable':
> rte_power_pmd_mgmt.c:(.text.experimental+0x42a): undefined reference
> to `rte_eth_dev_is_valid_port'
> rte_power_pmd_mgmt.c:(.text.experimental+0x4e7): undefined reference
> to `rte_eth_remove_rx_callback'
> rte_power_pmd_mgmt.c:(.text.experimental+0x536): undefined reference
> to `rte_eth_remove_rx_callback'
> rte_power_pmd_mgmt.c:(.text.experimental+0x54d): undefined reference
> to `rte_eth_dev_logtype'
> collect2: error: ld returned 1 exit status
> /spdk/mk/spdk.app.mk:65: recipe for target 'iscsi_fuzz' failed
> /spdk/mk/spdk.subdirs.mk:44: recipe for target 'iscsi_fuzz' failed
> /spdk/mk/spdk.subdirs.mk:44: recipe for target 'fuzz' failed
> make[4]: *** [iscsi_fuzz] Error 1
> make[3]: *** [iscsi_fuzz] Error 2
> make[2]: *** [fuzz] Error 2
> /spdk/mk/spdk.subdirs.mk:44: recipe for target 'app' failed
> make[1]: *** [app] Error 2
> /spdk/mk/spdk.subdirs.mk:44: recipe for target 'test' failed
> make: *** [test] Error 2
> [2] Error running command.
> 
> 
> I guess this is because of the added dependency of rte_ethdev to rte_power.
> Afaics, SPDK does not use pkg-config:
> https://github.com/spdk/spdk/blob/master/lib/env_dpdk/env.mk#L53
> 
> 

Sooo... this is an SPDK issue then? Because i can't see any way of 
fixing the issue on DPDK side.
  
Thomas Monjalon Jan. 18, 2021, 4:06 p.m. UTC | #3
18/01/2021 16:45, Burakov, Anatoly:
> On 18-Jan-21 3:24 PM, David Marchand wrote:
> > On Thu, Jan 14, 2021 at 3:46 PM Anatoly Burakov
> > <anatoly.burakov@intel.com> wrote:
> >>
> >> This patchset proposes a simple API for Ethernet drivers to cause the
> >> CPU to enter a power-optimized state while waiting for packets to
> >> arrive. There are multiple proposed mechanisms to achieve said power
> >> savings: simple frequency scaling, idle loop, and monitoring the Rx
> >> queue for incoming packages. The latter is achieved through cooperation
> >> with the NIC driver that will allow us to know address of wake up event,
> >> and wait for writes on that address.
[...]
> >> Why are we putting it into ethdev as opposed to leaving this up to the
> >> application? Our customers specifically requested a way to do it with
> >> minimal changes to the application code. The current approach allows to
> >> just flip a switch and automatically have power savings.

The customer laziness is usually a bad justification :)
I think we could achieve the same with not too much code
on application side.
And I'm not sure hiding queue management is sane.
Remember this rule: application must remain in control.

[...]
> > SPDK build is still broken.
> > http://mails.dpdk.org/archives/test-report/2021-January/174840.html
[...]
> > I guess this is because of the added dependency of rte_ethdev to rte_power.
> > Afaics, SPDK does not use pkg-config:
> > https://github.com/spdk/spdk/blob/master/lib/env_dpdk/env.mk#L53
> 
> Sooo... this is an SPDK issue then? Because i can't see any way of 
> fixing the issue on DPDK side.

Yes SPDK should not skip pkg-config.
But it raises 2 question:
	- are we breaking ABI compatibility?
	- is ethdev management expected for librte_power?

It makes me wonder whether we should host the few functions mixing
librte_ethdev and librte_power somewhere else.
The question is where?
  
Burakov, Anatoly Jan. 18, 2021, 5:02 p.m. UTC | #4
On 18-Jan-21 4:06 PM, Thomas Monjalon wrote:
> 18/01/2021 16:45, Burakov, Anatoly:
>> On 18-Jan-21 3:24 PM, David Marchand wrote:
>>> On Thu, Jan 14, 2021 at 3:46 PM Anatoly Burakov
>>> <anatoly.burakov@intel.com> wrote:
>>>>
>>>> This patchset proposes a simple API for Ethernet drivers to cause the
>>>> CPU to enter a power-optimized state while waiting for packets to
>>>> arrive. There are multiple proposed mechanisms to achieve said power
>>>> savings: simple frequency scaling, idle loop, and monitoring the Rx
>>>> queue for incoming packages. The latter is achieved through cooperation
>>>> with the NIC driver that will allow us to know address of wake up event,
>>>> and wait for writes on that address.
> [...]
>>>> Why are we putting it into ethdev as opposed to leaving this up to the
>>>> application? Our customers specifically requested a way to do it with
>>>> minimal changes to the application code. The current approach allows to
>>>> just flip a switch and automatically have power savings.
> 
> The customer laziness is usually a bad justification :)
> I think we could achieve the same with not too much code
> on application side.

Yes, we could. Customers could basically take this patch and reimplement 
it inside their application, and get the same benefits (with also added 
benefit of having knowledge about their queue/core mapping, and so being 
able to use the PAUSE or SCALE schemes for more than one queue).

However, i still think it's a valid use case - if we can do it that way 
and have a ready-made power management story, why not?

> And I'm not sure hiding queue management is sane.
> Remember this rule: application must remain in control.
>

The application can still be in control by just not using the API and 
implementing things manually instead. Nothing is being taken away from 
the ability of application to be in control.

> [...]
>>> SPDK build is still broken.
>>> http://mails.dpdk.org/archives/test-report/2021-January/174840.html
> [...]
>>> I guess this is because of the added dependency of rte_ethdev to rte_power.
>>> Afaics, SPDK does not use pkg-config:
>>> https://github.com/spdk/spdk/blob/master/lib/env_dpdk/env.mk#L53
>>
>> Sooo... this is an SPDK issue then? Because i can't see any way of
>> fixing the issue on DPDK side.
> 
> Yes SPDK should not skip pkg-config.
> But it raises 2 question:
> 	- are we breaking ABI compatibility?

Good question. Does including an extra intra-DPDK dependency count as 
ABI break? I was under impression that we didn't want DPDK to be 
distributed as individual libraries but rather would like it to be used 
as a whole, so if internal dependencies between components change, it's 
not a big deal (unless a third-party build system is used that 
explicitly specifies dependencies rather than using pkg-config).

> 	- is ethdev management expected for librte_power?
> 
> It makes me wonder whether we should host the few functions mixing
> librte_ethdev and librte_power somewhere else.
> The question is where?
> 

That could be another possibility. We could put this into a separate 
library, but IMO it would serve no purpose other than avoiding adding a 
dependency on *internal* component to librte_power. I'm not sure it's a 
worthy trade off.
  
David Marchand Jan. 18, 2021, 5:54 p.m. UTC | #5
On Mon, Jan 18, 2021 at 6:02 PM Burakov, Anatoly
<anatoly.burakov@intel.com> wrote:
> >>> SPDK build is still broken.
> >>> http://mails.dpdk.org/archives/test-report/2021-January/174840.html
> > [...]
> >>> I guess this is because of the added dependency of rte_ethdev to rte_power.
> >>> Afaics, SPDK does not use pkg-config:
> >>> https://github.com/spdk/spdk/blob/master/lib/env_dpdk/env.mk#L53
> >>
> >> Sooo... this is an SPDK issue then? Because i can't see any way of
> >> fixing the issue on DPDK side.
> >
> > Yes SPDK should not skip pkg-config.
> > But it raises 2 question:
> >       - are we breaking ABI compatibility?
>
> Good question. Does including an extra intra-DPDK dependency count as
> ABI break? I was under impression that we didn't want DPDK to be
> distributed as individual libraries but rather would like it to be used
> as a whole, so if internal dependencies between components change, it's
> not a big deal (unless a third-party build system is used that
> explicitly specifies dependencies rather than using pkg-config).

I don't get where an ABI breakage would be.

What I reported is an issue with static link.

For shared link, I would expect librte_power would expose its
dependency on rte_ethdev via a DT_NEEDED entry.
The final binary does not have to be aware of it.
  
Thomas Monjalon Jan. 18, 2021, 10:52 p.m. UTC | #6
14/01/2021 15:46, Anatoly Burakov:
> Anatoly Burakov (5):
>   eal: uninline power intrinsics
>   eal: avoid invalid API usage in power intrinsics
>   eal: change API of power intrinsics
>   eal: remove sync version of power monitor
>   eal: add monitor wakeup function
> 
> Liang Ma (6):
>   ethdev: add simple power management API
>   power: add PMD power management API and callback
>   net/ixgbe: implement power management API
>   net/i40e: implement power management API
>   net/ice: implement power management API
>   examples/l3fwd-power: enable PMD power mgmt

The librte_power part may deserve another iteration.
I suggest to give it a chance for a better version in -rc2.

For 21.02-rc1, the EAL and ethdev (including PMDs) parts are merged, thanks.
  
Burakov, Anatoly Jan. 19, 2021, 10:30 a.m. UTC | #7
On 18-Jan-21 10:52 PM, Thomas Monjalon wrote:
> 14/01/2021 15:46, Anatoly Burakov:
>> Anatoly Burakov (5):
>>    eal: uninline power intrinsics
>>    eal: avoid invalid API usage in power intrinsics
>>    eal: change API of power intrinsics
>>    eal: remove sync version of power monitor
>>    eal: add monitor wakeup function
>>
>> Liang Ma (6):
>>    ethdev: add simple power management API
>>    power: add PMD power management API and callback
>>    net/ixgbe: implement power management API
>>    net/i40e: implement power management API
>>    net/ice: implement power management API
>>    examples/l3fwd-power: enable PMD power mgmt
> 
> The librte_power part may deserve another iteration.
> I suggest to give it a chance for a better version in -rc2.
> 
> For 21.02-rc1, the EAL and ethdev (including PMDs) parts are merged, thanks.
> 

Good to hear, thanks!

I'll get on the respin of remaining parts.