[v4,0/6] use WFE for locks and ring on aarch64

Message ID 1566454356-37277-1-git-send-email-gavin.hu@arm.com

Message

Gavin Hu Aug. 22, 2019, 6:12 a.m. UTC
DPDK has multiple use cases where the core repeatedly polls a location in
memory. This polling results in many cache and memory transactions.

The Arm architecture provides the WFE (Wait For Event) instruction, which
allows the CPU core to enter a low-power state until it is woken up by an
update to the memory location being polled, thus reducing cache and memory
transactions.
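A minimal sketch of how such a WFE-based wait can look on aarch64, assuming GCC/Clang inline assembly. The function name `wait_until_equal_32_wfe` is hypothetical and is not the API added by this series. LDAXR loads the value and arms the core's exclusive monitor for that address; a store by another core clears the monitor and generates an event, which wakes the core from WFE. On other architectures this sketch simply falls back to plain polling.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: block until *addr == expected.
 * On aarch64, LDAXR (load-acquire exclusive) both reads the value and
 * arms the exclusive monitor for addr. A write to addr by another core
 * clears the monitor and generates a wake-up event, so WFE returns and
 * the value is reloaded (re-arming the monitor). Spurious wake-ups are
 * harmless because the loop re-checks the value.
 */
static inline void
wait_until_equal_32_wfe(volatile uint32_t *addr, uint32_t expected)
{
#if defined(__aarch64__)
	uint32_t value;

	/* Load and compare first, so an already-satisfied wait never sleeps. */
	__asm__ volatile("ldaxr %w[val], [%x[ptr]]"
			 : [val] "=&r"(value)
			 : [ptr] "r"(addr)
			 : "memory");
	while (value != expected) {
		__asm__ volatile("wfe" : : : "memory");
		__asm__ volatile("ldaxr %w[val], [%x[ptr]]"
				 : [val] "=&r"(value)
				 : [ptr] "r"(addr)
				 : "memory");
	}
#else
	/* Non-aarch64 fallback in this sketch: plain acquire-load polling. */
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		;
#endif
}
```

The initial load before the first WFE mirrors the V2 change log entry ("Add load and compare at the beginning of the APIs").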

x86 has the PAUSE hint instruction to reduce such overhead.

The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'.

For non-Arm platforms, these APIs are just wrappers around a do-while loop
with rte_pause, so there is no performance difference.
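The generic fallback can be sketched as a polling loop around a spin-wait hint. Here `pause_hint()` is a hypothetical stand-in for rte_pause() (the x86 PAUSE instruction via `_mm_pause()`, a no-op elsewhere), and `wait_until_equal_32_generic` is an illustrative name. This sketch hard-codes acquire ordering instead of taking a memory-order argument, so it is not the exact signature used in the patch.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h> /* _mm_pause() */
#endif

/* Hypothetical stand-in for rte_pause(): PAUSE on x86, no-op elsewhere. */
static inline void
pause_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
	_mm_pause();
#endif
}

/*
 * Illustrative generic "wait until equal": reload the location with
 * acquire ordering and issue a pause hint between polls. The value is
 * checked before the first pause, so an already-equal location returns
 * immediately.
 */
static inline void
wait_until_equal_32_generic(volatile uint32_t *addr, uint32_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		pause_hint();
}
```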

For Arm platforms, use of WFE can be configured using the
CONFIG_RTE_ARM_USE_WFE option. It is disabled by default.

Currently, use of WFE is supported only on aarch64 platforms. armv7
platforms do support the WFE instruction, but they require explicit wake-up
events (SEV) and are less performant.

Testing shows that performance varies across platforms, with some showing
degradation.

Whether to enable CONFIG_RTE_ARM_USE_WFE should depend on performance
benchmarking on the target platforms. Power saving should be a bonus,
but currently we have no way to characterize it.
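For the make-based build, the option would look roughly like the fragment below. This assumes the usual `CONFIG_...=y|n` syntax of DPDK's make config files and a default of `n` in config/common_base, consistent with "disabled by default". The meson side (config/arm/meson.build) is not shown.

```
# Assumed default in config/common_base: WFE is off.
CONFIG_RTE_ARM_USE_WFE=n
# To opt in on an aarch64 target after benchmarking, override it in that
# target's config:
# CONFIG_RTE_ARM_USE_WFE=y
```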

V4:
- rename the config as CONFIG_RTE_ARM_USE_WFE to indicate it applies to Arm only
- introduce a macro for the assembly skeleton to reduce code duplication
- add one patch for NXP fslmc to fix a compilation error
V3:
- Convert RFCs to patches
V2:
- Use inline functions instead of macros
- Add load and compare at the beginning of the APIs
- Fix some style errors in inline asm
V1:
- Add the new APIs and use them for ring and locks

Gavin Hu (6):
  bus/fslmc: fix the conflicting dmb function
  eal: add the APIs to wait until equal
  ticketlock: use new API to reduce contention on aarch64
  ring: use wfe to wait for ring tail update on aarch64
  spinlock: use wfe to reduce contention on aarch64
  config: add WFE config entry for aarch64
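To illustrate the kind of change the ticketlock patch makes, here is a simplified ticket lock whose acquire path waits for its ticket with a wait-until-equal helper instead of an open-coded pause loop. The type and function names are illustrative and do not match rte_ticketlock.h exactly. `wait_until_equal_16` is a hypothetical stand-in that simply polls; the real API would pause or use WFE inside the loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit wait-until-equal API. */
static inline void
wait_until_equal_16(volatile uint16_t *addr, uint16_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		; /* a real implementation would pause or WFE here */
}

/* Simplified ticket lock (illustrative layout). */
typedef struct {
	uint16_t current; /* ticket currently being served */
	uint16_t next;    /* next ticket to hand out */
} ticketlock_t;

static inline void
ticketlock_lock(ticketlock_t *tl)
{
	/* Take a ticket, then wait (acquire) until it is being served. */
	uint16_t me = __atomic_fetch_add(&tl->next, 1, __ATOMIC_RELAXED);
	wait_until_equal_16(&tl->current, me);
}

static inline void
ticketlock_unlock(ticketlock_t *tl)
{
	/* Only the lock holder writes current, so a relaxed load suffices. */
	uint16_t i = __atomic_load_n(&tl->current, __ATOMIC_RELAXED);
	__atomic_store_n(&tl->current, (uint16_t)(i + 1), __ATOMIC_RELEASE);
}
```

Because every waiter spins on the same `current` field, this is exactly the "poll until a memory location equals a value" pattern that the new APIs abstract.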

 config/arm/meson.build                             |  1 +
 config/common_base                                 |  6 +++++
 drivers/bus/fslmc/mc/fsl_mc_sys.h                  | 10 +++++---
 drivers/bus/fslmc/mc/mc_sys.c                      |  3 +--
 .../common/include/arch/arm/rte_pause_64.h         | 30 ++++++++++++++++++++++
 .../common/include/arch/arm/rte_spinlock.h         | 25 ++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_pause.h  | 26 ++++++++++++++++++-
 .../common/include/generic/rte_ticketlock.h        |  3 +--
 lib/librte_ring/rte_ring_c11_mem.h                 |  4 +--
 lib/librte_ring/rte_ring_generic.h                 |  3 +--
 10 files changed, 99 insertions(+), 12 deletions(-)
  

Comments

David Marchand Oct. 16, 2019, 8:08 a.m. UTC | #1
Hello guys,

This series got a lot of attention from ARM people and it seems ready
for integration.
But I did not see comments from other architectures, could you have a
look please?


Thanks.
  
David Christensen Oct. 24, 2019, 8:26 p.m. UTC | #2
> This series got a lot of attention from ARM people and it seems ready
> for integration.
> But I did not see comments from other architectures, could you have a
> look please?

I spent some time going through the Power ISA specification and the
Linux code and didn't find an equivalent. Under Linux, this looks like
a __cmpwait_case_XX operation, but that's only defined for arm64 and used
in barrier operations.

Dave