[RFC,0/5] use WFE for locks and ring on aarch64

Message ID 1561911676-37718-1-git-send-email-gavin.hu@arm.com (mailing list archive)

Message

Gavin Hu June 30, 2019, 4:21 p.m. UTC
  DPDK has multiple use cases where the core repeatedly polls a location in
memory. This polling results in many cache and memory transactions.

The Arm architecture provides the WFE (Wait For Event) instruction, which
lets a CPU core enter a low-power state until it is woken by an update to
the memory location being polled, thus reducing cache and memory
transactions.
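
As a rough illustration of the mechanism described above, the sketch below
waits on a 32-bit location using WFE on aarch64. It is not the code from
this series: the function name is hypothetical, and the non-aarch64 branch
exists only so the example builds elsewhere. It relies on the load-exclusive
arming the exclusive monitor for the address, so that a write to it by
another core generates the wake-up event.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the code from this series: return once *addr
 * equals expected.
 */
static inline void
sketch_wfe_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected)
{
#if defined(__aarch64__)
	uint32_t value;

	__asm__ volatile(
		"sevl\n"			/* set the local event so the first WFE falls through */
		"1:	wfe\n"			/* sleep until an event is signalled */
		"ldaxr	%w[val], [%[ptr]]\n"	/* load-acquire exclusive: arms the monitor */
		"cmp	%w[val], %w[exp]\n"
		"b.ne	1b\n"			/* not equal yet: wait again, then re-check */
		: [val] "=&r"(value)
		: [ptr] "r"(addr), [exp] "r"(expected)
		: "cc", "memory");
#else
	/* Other architectures: plain polling, no WFE. */
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		;
#endif
}
```

The value is re-read after every wake-up, so an event that arrives for an
unrelated reason only costs one extra iteration.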

x86 has the PAUSE hint instruction to reduce such overhead.

The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'.

For non-Arm platforms, these APIs are just wrappers around a do-while loop
with rte_pause, so there is no performance difference.
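
A minimal sketch of what such a generic wrapper could look like is shown
below. The names and the signature are assumptions made for illustration
(the cover letter only names the rte_wait_until_equal_xxx family), and
sketch_pause() is a hypothetical stand-in for rte_pause().

```c
#include <stdint.h>

/* Hypothetical stand-in for rte_pause(): emit the CPU's spin-wait hint. */
static inline void
sketch_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_ia32_pause();			/* x86 PAUSE */
#elif defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");	/* Arm YIELD hint */
#endif
}

/* Poll *addr until it equals expected, pausing between reads. */
static inline void
sketch_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		sketch_pause();
}
```

A caller that previously open-coded the loop would pass the address and the
value it is waiting for, for example
sketch_wait_until_equal_32(&tail, old_val).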

For Arm platforms, use of WFE can be configured using the
CONFIG_RTE_USE_WFE option. It is disabled by default.

Currently, use of WFE is supported only for aarch64 platforms. armv7
platforms do support the WFE instruction, but they require explicit wake-up
events (sev) and are less performant.

Testing shows that performance varies across platforms, with some showing
degradation.

CONFIG_RTE_USE_WFE should be enabled or left disabled according to the
measured performance on the target platform.

Gavin Hu (5):
  eal: add the APIs to wait until equal
  ticketlock: use new API to reduce contention on aarch64
  ring: use wfe to wait for ring tail update on aarch64
  spinlock: use wfe to reduce contention on aarch64
  config: add WFE config entry for aarch64

 config/arm/meson.build                             |   1 +
 config/common_armv8a_linux                         |   6 +
 .../common/include/arch/arm/rte_pause_64.h         | 143 +++++++++++++++++++++
 .../common/include/arch/arm/rte_spinlock.h         |  25 ++++
 lib/librte_eal/common/include/generic/rte_pause.h  |  20 +++
 .../common/include/generic/rte_spinlock.h          |   2 +-
 .../common/include/generic/rte_ticketlock.h        |   4 +-
 lib/librte_ring/rte_ring_c11_mem.h                 |   5 +-
 lib/librte_ring/rte_ring_generic.h                 |   4 +-
 9 files changed, 203 insertions(+), 7 deletions(-)
  

Comments

Stephen Hemminger June 30, 2019, 8:29 p.m. UTC | #1
On Mon,  1 Jul 2019 00:21:11 +0800
Gavin Hu <gavin.hu@arm.com> wrote:

> DPDK has multiple use cases where the core repeatedly polls a location in
> memory. This polling results in many cache and memory transactions.
> 
> Arm architecture provides WFE (Wait For Event) instruction, which allows
> the cpu core to enter a low power state until woken up by the update to the
> memory location being polled. Thus reducing the cache and memory
> transactions.
> 
> x86 has the PAUSE hint instruction to reduce such overhead.
> 
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'.
> 
> For non-Arm platforms, these APIs are just wrappers around do-while loop
> with rte_pause, so there are no performance differences.
> 
> For Arm platforms, use of WFE can be configured using CONFIG_RTE_USE_WFE
> option. It is disabled by default.
> 
> Currently, use of WFE is supported only for aarch64 platforms. armv7
> platforms do support the WFE instruction, but they require explicit wake-up
> events (sev) and are less performant.
> 
> Testing shows that, performance varies across different platforms, with
> some showing degradation.
> 
> CONFIG_RTE_USE_WFE should be enabled depending on the performance on the
> target platforms.

How does this work if process is preempted?
  
Gavin Hu July 1, 2019, 9:12 a.m. UTC | #2
Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday, July 1, 2019 4:30 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; thomas@monjalon.net; jerinj@marvell.com;
> hemant.agrawal@nxp.com; bruce.richardson@intel.com;
> chaozhu@linux.vnet.ibm.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] [RFC 0/5] use WFE for locks and ring on aarch64
> 
> On Mon,  1 Jul 2019 00:21:11 +0800
> Gavin Hu <gavin.hu@arm.com> wrote:
> 
> > DPDK has multiple use cases where the core repeatedly polls a location in
> > memory. This polling results in many cache and memory transactions.
> >
> > Arm architecture provides WFE (Wait For Event) instruction, which allows
> > the cpu core to enter a low power state until woken up by the update to the
> > memory location being polled. Thus reducing the cache and memory
> > transactions.
> >
> > x86 has the PAUSE hint instruction to reduce such overhead.
> >
> > The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> > for a memory location to become equal to a given value'.
> >
> > For non-Arm platforms, these APIs are just wrappers around do-while loop
> > with rte_pause, so there are no performance differences.
> >
> > For Arm platforms, use of WFE can be configured using CONFIG_RTE_USE_WFE
> > option. It is disabled by default.
> >
> > Currently, use of WFE is supported only for aarch64 platforms. armv7
> > platforms do support the WFE instruction, but they require explicit wake-up
> > events (sev) and are less performant.
> >
> > Testing shows that, performance varies across different platforms, with
> > some showing degradation.
> >
> > CONFIG_RTE_USE_WFE should be enabled depending on the performance on the
> > target platforms.
> 
> How does this work if process is preempted?

WFE does not prevent the kernel from preempting the thread, because preemption is driven by a timer or rescheduling interrupt.
Software that uses the WFE mechanism must tolerate spurious wake-up events, including timer and rescheduling interrupts, so the condition has to be re-checked after WFE returns (the patch already does this).
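
The re-check requirement can be illustrated in portable C with a simulated
wait primitive. Everything in this sketch is hypothetical and is not taken
from the patch: the callback stands in for WFE, and the simulated wake
source returns twice without any update before it performs the real write.

```c
#include <stdint.h>

/* Hypothetical wait primitive: may return for any reason (event, interrupt). */
typedef void (*sketch_wait_fn)(void *ctx);

/* Returns how many times the wait primitive returned before the condition held. */
static unsigned int
sketch_wait_with_recheck(uint32_t *addr, uint32_t expected,
			 sketch_wait_fn wait_for_event, void *ctx)
{
	unsigned int wakeups = 0;

	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected) {
		wait_for_event(ctx);	/* stands in for WFE */
		wakeups++;		/* woken: loop back and re-check */
	}
	return wakeups;
}

/* Simulated wake source: two spurious wake-ups, then the real update. */
struct sketch_sim {
	uint32_t *addr;
	unsigned int calls;
};

static void
sketch_sim_wait(void *ctx)
{
	struct sketch_sim *sim = ctx;

	if (++sim->calls == 3)
		__atomic_store_n(sim->addr, 1, __ATOMIC_RELEASE);
}

/* Runs the loop against the simulated wake source. */
static unsigned int
sketch_demo(void)
{
	uint32_t flag = 0;
	struct sketch_sim sim = { .addr = &flag, .calls = 0 };

	return sketch_wait_with_recheck(&flag, 1, sketch_sim_wait, &sim);
}
```

Because the loop condition is evaluated again after every return from the
wait primitive, the two wake-ups that arrive without an update are harmless
and the loop exits only once the value has actually changed.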