[v13,2/5] eal: add the APIs to wait until equal

Message ID 1573162528-16230-3-git-send-email-david.marchand@redhat.com
State New
Delegated to: David Marchand
Series
  • use WFE for aarch64

Checks

Context Check Description
ci/Intel-compilation fail Compilation issues
ci/checkpatch success coding style OK

Commit Message

David Marchand Nov. 7, 2019, 9:35 p.m. UTC
From: Gavin Hu <gavin.hu@arm.com>

The rte_wait_until_equal_xx APIs abstract the functionality of
'polling for a memory location to become equal to a given value'.

Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled
by default. When it is enabled, the above APIs will use the WFE
instruction to save CPU cycles and power.

When called from a VM on aarch64, this API may trap in and out of the
hypervisor to release vCPUs, which causes high exit latency. Since
kernel 4.18.20, an adaptive trapping mechanism has been introduced to
balance the latency against the workload.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
---
Changelog since v12:
- added release notes update,
- fixed function prototypes indent,
- reimplemented the arm implementation without exposing internal inline
  functions,
- added asserts in generic implementation,

---
 config/arm/meson.build                             |   1 +
 config/common_base                                 |   5 +
 doc/guides/rel_notes/release_19_11.rst             |   5 +
 .../common/include/arch/arm/rte_pause_64.h         | 133 +++++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_pause.h  | 105 ++++++++++++++++
 5 files changed, 249 insertions(+)

Comments

Ananyev, Konstantin Nov. 8, 2019, 4:38 p.m. UTC | #1
Hi David,

> From: Gavin Hu <gavin.hu@arm.com>
> 
> The rte_wait_until_equal_xx APIs abstract the functionality of
> 'polling for a memory location to become equal to a given value'.
> 
> Add the RTE_ARM_USE_WFE configuration entry for aarch64, disabled
> by default. When it is enabled, the above APIs will call WFE instruction
> to save CPU cycles and power.
> 
> From a VM, when calling this API on aarch64, it may trap in and out to
> release vCPUs whereas cause high exit latency. Since kernel 4.18.20 an
> adaptive trapping mechanism is introduced to balance the latency and
> workload.
> 
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Acked-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> Acked-by: Jerin Jacob <jerinj@marvell.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
> Changelog since v12:
> - added release notes update,
> - fixed function prototypes indent,
> - reimplemented the arm implementation without exposing internal inline
>   functions,
> - added asserts in generic implementation,
> 
> ---
>  config/arm/meson.build                             |   1 +
>  config/common_base                                 |   5 +
>  doc/guides/rel_notes/release_19_11.rst             |   5 +
>  .../common/include/arch/arm/rte_pause_64.h         | 133 +++++++++++++++++++++
>  lib/librte_eal/common/include/generic/rte_pause.h  | 105 ++++++++++++++++
>  5 files changed, 249 insertions(+)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 46dff3a..ea47425 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -26,6 +26,7 @@ flags_common_default = [
>  	['RTE_LIBRTE_AVP_PMD', false],
> 
>  	['RTE_SCHED_VECTOR', false],
> +	['RTE_ARM_USE_WFE', false],
>  ]
> 
>  flags_generic = [
> diff --git a/config/common_base b/config/common_base
> index 1858598..bb1b1ed 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -110,6 +110,11 @@ CONFIG_RTE_MAX_VFIO_CONTAINERS=64
>  CONFIG_RTE_MALLOC_DEBUG=n
>  CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
>  CONFIG_RTE_USE_LIBBSD=n
> +# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
> +# calling these APIs put the cores in low power state while waiting
> +# for the memory address to become equal to the expected value.
> +# This is supported only by aarch64.
> +CONFIG_RTE_ARM_USE_WFE=n
> 
>  #
>  # Recognize/ignore the AVX/AVX512 CPU flags for performance/power testing.
> diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
> index fe11b4b..af5f2c5 100644
> --- a/doc/guides/rel_notes/release_19_11.rst
> +++ b/doc/guides/rel_notes/release_19_11.rst
> @@ -65,6 +65,11 @@ New Features
> 
>    The lock-free stack implementation is enabled for aarch64 platforms.
> 
> +* **Added Wait Until Equal API.**
> +
> +  A new API has been added to wait for a memory location to be updated with a
> +  16-bit, 32-bit, 64-bit value.
> +
>  * **Changed mempool allocation behaviour.**
> 
>    Objects are no longer across pages by default.
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> index 93895d3..e87d10b 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2017 Cavium, Inc
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_PAUSE_ARM64_H_
> @@ -10,6 +11,11 @@ extern "C" {
>  #endif
> 
>  #include <rte_common.h>
> +
> +#ifdef RTE_ARM_USE_WFE
> +#define RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> +#endif
> +
>  #include "generic/rte_pause.h"
> 
>  static inline void rte_pause(void)
> @@ -17,6 +23,133 @@ static inline void rte_pause(void)
>  	asm volatile("yield" ::: "memory");
>  }
> 
> +#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> +
> +/* Send an event to quit WFE. */
> +#define __SEVL() { asm volatile("sevl" : : : "memory"); }
> +
> +/* Put processor into low power WFE(Wait For Event) state. */
> +#define __WFE() { asm volatile("wfe" : : : "memory"); }
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
> +		int memorder)
> +{
> +	uint16_t value;
> +
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	/*
> +	 * Atomic exclusive load from addr, it returns the 16-bit content of
> +	 * *addr while making it 'monitored',when it is written by someone
> +	 * else, the 'monitored' state is cleared and a event is generated
> +	 * implicitly to exit WFE.
> +	 */
> +#define __LOAD_EXC_16(src, dst, memorder) {               \
> +	if (memorder == __ATOMIC_RELAXED) {               \
> +		asm volatile("ldxrh %w[tmp], [%x[addr]]"  \
> +			: [tmp] "=&r" (dst)               \
> +			: [addr] "r"(src)                 \
> +			: "memory");                      \
> +	} else {                                          \
> +		asm volatile("ldaxrh %w[tmp], [%x[addr]]" \
> +			: [tmp] "=&r" (dst)               \
> +			: [addr] "r"(src)                 \
> +			: "memory");                      \
> +	} }
> +
> +	__LOAD_EXC_16(addr, value, memorder)
> +	if (value != expected) {
> +		__SEVL()
> +		do {
> +			__WFE()
> +			__LOAD_EXC_16(addr, value, memorder)
> +		} while (value != expected);
> +	}
> +#undef __LOAD_EXC_16
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
> +		int memorder)
> +{
> +	uint32_t value;
> +
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	/*
> +	 * Atomic exclusive load from addr, it returns the 32-bit content of
> +	 * *addr while making it 'monitored',when it is written by someone
> +	 * else, the 'monitored' state is cleared and a event is generated
> +	 * implicitly to exit WFE.
> +	 */
> +#define __LOAD_EXC_32(src, dst, memorder) {              \
> +	if (memorder == __ATOMIC_RELAXED) {              \
> +		asm volatile("ldxr %w[tmp], [%x[addr]]"  \
> +			: [tmp] "=&r" (dst)              \
> +			: [addr] "r"(src)                \
> +			: "memory");                     \
> +	} else {                                         \
> +		asm volatile("ldaxr %w[tmp], [%x[addr]]" \
> +			: [tmp] "=&r" (dst)              \
> +			: [addr] "r"(src)                \
> +			: "memory");                     \
> +	} }
> +
> +	__LOAD_EXC_32(addr, value, memorder)
> +	if (value != expected) {
> +		__SEVL()
> +		do {
> +			__WFE()
> +			__LOAD_EXC_32(addr, value, memorder)
> +		} while (value != expected);
> +	}
> +#undef __LOAD_EXC_32
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> +		int memorder)
> +{
> +	uint64_t value;
> +
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	/*
> +	 * Atomic exclusive load from addr, it returns the 64-bit content of
> +	 * *addr while making it 'monitored',when it is written by someone
> +	 * else, the 'monitored' state is cleared and a event is generated
> +	 * implicitly to exit WFE.
> +	 */
> +#define __LOAD_EXC_64(src, dst, memorder) {              \
> +	if (memorder == __ATOMIC_RELAXED) {              \
> +		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> +			: [tmp] "=&r" (dst)              \
> +			: [addr] "r"(src)                \
> +			: "memory");                     \
> +	} else {                                         \
> +		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> +			: [tmp] "=&r" (dst)              \
> +			: [addr] "r"(src)                \
> +			: "memory");                     \
> +	} }
> +
> +	__LOAD_EXC_64(addr, value, memorder)
> +	if (value != expected) {
> +		__SEVL()
> +		do {
> +			__WFE()
> +			__LOAD_EXC_64(addr, value, memorder)
> +		} while (value != expected);
> +	}
> +}
> +#undef __LOAD_EXC_64
> +
> +#undef __SEVL
> +#undef __WFE

Personally I don't see how these define/undef are better than inline functions...
Also, I suppose they might be reused in future by some other Arm-specific low-level primitives.
My preference would be to keep them as inline functions - much cleaner code.
But as I don't develop for/use Arm, I won't insist; I'll let you and the Arm guys decide.
Konstantin
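For comparison, the inline-function variant Konstantin describes might look like the sketch below. This is illustrative only: __atomic_load_n stands in for the ldxrh/ldaxrh exclusive-load asm, which exists only on aarch64, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper replacing the __LOAD_EXC_16 macro: one small
 * inline function per width, selected on memory order. On aarch64 the
 * body would be the ldxrh/ldaxrh asm; __atomic_load_n is a portable
 * stand-in here. */
static inline uint16_t
__load_exc_16(volatile uint16_t *addr, int memorder)
{
	if (memorder == __ATOMIC_RELAXED)
		return __atomic_load_n(addr, __ATOMIC_RELAXED);
	return __atomic_load_n(addr, __ATOMIC_ACQUIRE);
}
```

The trade-off debated in this thread: such functions remain visible in the header's namespace, whereas the macros can be removed again with #undef.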

> +
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/common/include/generic/rte_pause.h b/lib/librte_eal/common/include/generic/rte_pause.h
> index 52bd4db..7422785 100644
> --- a/lib/librte_eal/common/include/generic/rte_pause.h
> +++ b/lib/librte_eal/common/include/generic/rte_pause.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2017 Cavium, Inc
> + * Copyright(c) 2019 Arm Limited
>   */
> 
>  #ifndef _RTE_PAUSE_H_
> @@ -12,6 +13,12 @@
>   *
>   */
> 
> +#include <stdint.h>
> +#include <assert.h>
> +#include <rte_common.h>
> +#include <rte_atomic.h>
> +#include <rte_compat.h>
> +
>  /**
>   * Pause CPU execution for a short while
>   *
> @@ -20,4 +27,102 @@
>   */
>  static inline void rte_pause(void);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Wait for *addr to be updated with a 16-bit expected value, with a relaxed
> + * memory ordering model meaning the loads around this API can be reordered.
> + *
> + * @param addr
> + *  A pointer to the memory location.
> + * @param expected
> + *  A 16-bit expected value to be in the memory location.
> + * @param memorder
> + *  Two different memory orders that can be specified:
> + *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
> + *  C++11 memory orders with the same names, see the C++11 standard or
> + *  the GCC wiki on atomic synchronization for detailed definition.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
> +		int memorder);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Wait for *addr to be updated with a 32-bit expected value, with a relaxed
> + * memory ordering model meaning the loads around this API can be reordered.
> + *
> + * @param addr
> + *  A pointer to the memory location.
> + * @param expected
> + *  A 32-bit expected value to be in the memory location.
> + * @param memorder
> + *  Two different memory orders that can be specified:
> + *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
> + *  C++11 memory orders with the same names, see the C++11 standard or
> + *  the GCC wiki on atomic synchronization for detailed definition.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
> +		int memorder);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Wait for *addr to be updated with a 64-bit expected value, with a relaxed
> + * memory ordering model meaning the loads around this API can be reordered.
> + *
> + * @param addr
> + *  A pointer to the memory location.
> + * @param expected
> + *  A 64-bit expected value to be in the memory location.
> + * @param memorder
> + *  Two different memory orders that can be specified:
> + *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
> + *  C++11 memory orders with the same names, see the C++11 standard or
> + *  the GCC wiki on atomic synchronization for detailed definition.
> + */
> +__rte_experimental
> +static __rte_always_inline void
> +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> +		int memorder);
> +
> +#ifndef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> +static __rte_always_inline void
> +rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
> +		int memorder)
> +{
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	while (__atomic_load_n(addr, memorder) != expected)
> +		rte_pause();
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
> +		int memorder)
> +{
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	while (__atomic_load_n(addr, memorder) != expected)
> +		rte_pause();
> +}
> +
> +static __rte_always_inline void
> +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> +		int memorder)
> +{
> +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> +
> +	while (__atomic_load_n(addr, memorder) != expected)
> +		rte_pause();
> +}
> +#endif
> +
>  #endif /* _RTE_PAUSE_H_ */
> --
> 1.8.3.1
Thomas Monjalon Nov. 8, 2019, 5 p.m. UTC | #2
08/11/2019 17:38, Ananyev, Konstantin:
> > From: Gavin Hu <gavin.hu@arm.com>
> > +static __rte_always_inline void
> > +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> > +		int memorder)
> > +{
> > +	uint64_t value;
> > +
> > +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> > +
> > +	/*
> > +	 * Atomic exclusive load from addr, it returns the 64-bit content of
> > +	 * *addr while making it 'monitored',when it is written by someone
> > +	 * else, the 'monitored' state is cleared and a event is generated
> > +	 * implicitly to exit WFE.
> > +	 */
> > +#define __LOAD_EXC_64(src, dst, memorder) {              \
> > +	if (memorder == __ATOMIC_RELAXED) {              \
> > +		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> > +			: [tmp] "=&r" (dst)              \
> > +			: [addr] "r"(src)                \
> > +			: "memory");                     \
> > +	} else {                                         \
> > +		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> > +			: [tmp] "=&r" (dst)              \
> > +			: [addr] "r"(src)                \
> > +			: "memory");                     \
> > +	} }
> > +
> > +	__LOAD_EXC_64(addr, value, memorder)
> > +	if (value != expected) {
> > +		__SEVL()
> > +		do {
> > +			__WFE()
> > +			__LOAD_EXC_64(addr, value, memorder)
> > +		} while (value != expected);
> > +	}
> > +}
> > +#undef __LOAD_EXC_64
> > +
> > +#undef __SEVL
> > +#undef __WFE
> 
> Personally I don't see how these define/undef are better then inline functions...

The benefit is to not pollute the API namespace
with functions which are used only locally.
With a macro, it disappears after #undef.

> Again I suppose they might be re-used in future some other ARM specific low-level primitvies.

Do these low-level _LOAD_EXC_ routines need to be exposed in EAL for re-use?

> My preference would be to keep them as inline function - much cleaner code.
> But as I don't develop for/use, I wouldn't insist and let you and arm guys to decide.
Ananyev, Konstantin Nov. 8, 2019, 6:36 p.m. UTC | #3
> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Friday, November 8, 2019 5:00 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: David Marchand <david.marchand@redhat.com>; dev@dpdk.org; nd@arm.com; Gavin Hu <gavin.hu@arm.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>; Jerin Jacob <jerinj@marvell.com>; Jan Viktorin
> <viktorin@rehivetech.com>
> Subject: Re: [PATCH v13 2/5] eal: add the APIs to wait until equal
> 
> 08/11/2019 17:38, Ananyev, Konstantin:
> > > From: Gavin Hu <gavin.hu@arm.com>
> > > +static __rte_always_inline void
> > > +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> > > +		int memorder)
> > > +{
> > > +	uint64_t value;
> > > +
> > > +	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> > > +
> > > +	/*
> > > +	 * Atomic exclusive load from addr, it returns the 64-bit content of
> > > +	 * *addr while making it 'monitored',when it is written by someone
> > > +	 * else, the 'monitored' state is cleared and a event is generated
> > > +	 * implicitly to exit WFE.
> > > +	 */
> > > +#define __LOAD_EXC_64(src, dst, memorder) {              \
> > > +	if (memorder == __ATOMIC_RELAXED) {              \
> > > +		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> > > +			: [tmp] "=&r" (dst)              \
> > > +			: [addr] "r"(src)                \
> > > +			: "memory");                     \
> > > +	} else {                                         \
> > > +		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> > > +			: [tmp] "=&r" (dst)              \
> > > +			: [addr] "r"(src)                \
> > > +			: "memory");                     \
> > > +	} }
> > > +
> > > +	__LOAD_EXC_64(addr, value, memorder)
> > > +	if (value != expected) {
> > > +		__SEVL()
> > > +		do {
> > > +			__WFE()
> > > +			__LOAD_EXC_64(addr, value, memorder)
> > > +		} while (value != expected);
> > > +	}
> > > +}
> > > +#undef __LOAD_EXC_64
> > > +
> > > +#undef __SEVL
> > > +#undef __WFE
> >
> > Personally I don't see how these define/undef are better then inline functions...
> 
> The benefit it to not pollute the API namespace
> with some functions which are used only locally.
> After using a macro, it disappears with #undef.
> 
> > Again I suppose they might be re-used in future some other ARM specific low-level primitvies.
> 
> Do this low-level routines _LOAD_EXC_ need to be exposed in EAL for re-use?

About load_exc, I don't know for sure...
Though as far as I can see, wfe/sevl are used here:
http://patchwork.dpdk.org/patch/61779/
so it might be possible to reuse these functions/macros...
But again, I am not an Arm expert; I would be interested to know what the Arm guys think.

> 
> > My preference would be to keep them as inline function - much cleaner code.
> > But as I don't develop for/use, I wouldn't insist and let you and arm guys to decide.
> 
>
Jerin Jacob Nov. 11, 2019, 5:11 a.m. UTC | #4
On Sat, Nov 9, 2019 at 12:07 AM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Friday, November 8, 2019 5:00 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: David Marchand <david.marchand@redhat.com>; dev@dpdk.org; nd@arm.com; Gavin Hu <gavin.hu@arm.com>; Mcnamara, John
> > <john.mcnamara@intel.com>; Kovacevic, Marko <marko.kovacevic@intel.com>; Jerin Jacob <jerinj@marvell.com>; Jan Viktorin
> > <viktorin@rehivetech.com>
> > Subject: Re: [PATCH v13 2/5] eal: add the APIs to wait until equal
> >
> > 08/11/2019 17:38, Ananyev, Konstantin:
> > > > From: Gavin Hu <gavin.hu@arm.com>
> > > > +static __rte_always_inline void
> > > > +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> > > > +         int memorder)
> > > > +{
> > > > + uint64_t value;
> > > > +
> > > > + assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
> > > > +
> > > > + /*
> > > > +  * Atomic exclusive load from addr, it returns the 64-bit content of
> > > > +  * *addr while making it 'monitored',when it is written by someone
> > > > +  * else, the 'monitored' state is cleared and a event is generated
> > > > +  * implicitly to exit WFE.
> > > > +  */
> > > > +#define __LOAD_EXC_64(src, dst, memorder) {              \
> > > > + if (memorder == __ATOMIC_RELAXED) {              \
> > > > +         asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> > > > +                 : [tmp] "=&r" (dst)              \
> > > > +                 : [addr] "r"(src)                \
> > > > +                 : "memory");                     \
> > > > + } else {                                         \
> > > > +         asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> > > > +                 : [tmp] "=&r" (dst)              \
> > > > +                 : [addr] "r"(src)                \
> > > > +                 : "memory");                     \
> > > > + } }
> > > > +
> > > > + __LOAD_EXC_64(addr, value, memorder)
> > > > + if (value != expected) {
> > > > +         __SEVL()
> > > > +         do {
> > > > +                 __WFE()
> > > > +                 __LOAD_EXC_64(addr, value, memorder)
> > > > +         } while (value != expected);
> > > > + }
> > > > +}
> > > > +#undef __LOAD_EXC_64
> > > > +
> > > > +#undef __SEVL
> > > > +#undef __WFE
> > >
> > > Personally I don't see how these define/undef are better then inline functions...
> >
> > The benefit it to not pollute the API namespace
> > with some functions which are used only locally.
> > After using a macro, it disappears with #undef.
> >
> > > Again I suppose they might be re-used in future some other ARM specific low-level primitvies.
> >
> > Do this low-level routines _LOAD_EXC_ need to be exposed in EAL for re-use?
>
> About load_exc don't know for sure...
> Though as I can see wfe/sevl are used here:
> http://patchwork.dpdk.org/patch/61779/
> Might be it is possible to reuse these functions/macros...
> But again, I am not arm expert, would be interested to know what arm guys think.


Considering that WFE requires using load_exc on Arm, IMO it makes sense
to expose load_exc in conjunction with wfe/sevl so that it can be used
properly.
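One way the three exposed pieces could be reused is a more general wait primitive than the equality check, e.g. waiting while some bits are set. The sketch below is hypothetical: the stand-in macros are portable placeholders for the sevl/wfe/ldxrh asm, and the function name is illustrative.

```c
#include <stdint.h>

/* Portable stand-ins for the three primitives under discussion; on
 * aarch64 these would be the sevl/wfe asm blocks and the exclusive
 * load. */
#define __SEVL() do { } while (0)
#define __WFE()  do { } while (0)
#define __LOAD_EXC_16(src, dst, memorder) \
	(dst) = __atomic_load_n((src), (memorder));

/* Hypothetical reusable primitive: wait while an arbitrary condition
 * on the loaded value holds, not just inequality with one constant. */
static inline uint16_t
wait_while_masked_16(volatile uint16_t *addr, uint16_t mask)
{
	uint16_t value;

	__LOAD_EXC_16(addr, value, __ATOMIC_ACQUIRE)
	while (value & mask) {
		__SEVL();
		__WFE();
		__LOAD_EXC_16(addr, value, __ATOMIC_ACQUIRE)
	}
	return value;
}
```

This is the shape later DPDK work on generic wait primitives took; here it only demonstrates why exposing load_exc alongside wfe/sevl is useful.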



>
> >
> > > My preference would be to keep them as inline function - much cleaner code.
> > > But as I don't develop for/use, I wouldn't insist and let you and arm guys to decide.
> >
> >
>
Gavin Hu (Arm Technology China) Nov. 11, 2019, 5:51 a.m. UTC | #5
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Monday, November 11, 2019 1:12 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: thomas@monjalon.net; David Marchand
> <david.marchand@redhat.com>; dev@dpdk.org; nd <nd@arm.com>; Gavin
> Hu (Arm Technology China) <Gavin.Hu@arm.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; jerinj@marvell.com; Jan Viktorin
> <viktorin@rehivetech.com>
> Subject: Re: [dpdk-dev] [PATCH v13 2/5] eal: add the APIs to wait until
> equal
> 
> On Sat, Nov 9, 2019 at 12:07 AM Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > Sent: Friday, November 8, 2019 5:00 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: David Marchand <david.marchand@redhat.com>; dev@dpdk.org;
> nd@arm.com; Gavin Hu <gavin.hu@arm.com>; Mcnamara, John
> > > <john.mcnamara@intel.com>; Kovacevic, Marko
> <marko.kovacevic@intel.com>; Jerin Jacob <jerinj@marvell.com>; Jan
> Viktorin
> > > <viktorin@rehivetech.com>
> > > Subject: Re: [PATCH v13 2/5] eal: add the APIs to wait until equal
> > >
> > > 08/11/2019 17:38, Ananyev, Konstantin:
> > > > > From: Gavin Hu <gavin.hu@arm.com>
> > > > > +static __rte_always_inline void
> > > > > +rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> > > > > +         int memorder)
> > > > > +{
> > > > > + uint64_t value;
> > > > > +
> > > > > + assert(memorder == __ATOMIC_ACQUIRE || memorder ==
> __ATOMIC_RELAXED);
> > > > > +
> > > > > + /*
> > > > > +  * Atomic exclusive load from addr, it returns the 64-bit content of
> > > > > +  * *addr while making it 'monitored',when it is written by someone
> > > > > +  * else, the 'monitored' state is cleared and a event is generated
> > > > > +  * implicitly to exit WFE.
> > > > > +  */
> > > > > +#define __LOAD_EXC_64(src, dst, memorder) {              \
> > > > > + if (memorder == __ATOMIC_RELAXED) {              \
> > > > > +         asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> > > > > +                 : [tmp] "=&r" (dst)              \
> > > > > +                 : [addr] "r"(src)                \
> > > > > +                 : "memory");                     \
> > > > > + } else {                                         \
> > > > > +         asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> > > > > +                 : [tmp] "=&r" (dst)              \
> > > > > +                 : [addr] "r"(src)                \
> > > > > +                 : "memory");                     \
> > > > > + } }
> > > > > +
> > > > > + __LOAD_EXC_64(addr, value, memorder)
> > > > > + if (value != expected) {
> > > > > +         __SEVL()
> > > > > +         do {
> > > > > +                 __WFE()
> > > > > +                 __LOAD_EXC_64(addr, value, memorder)
> > > > > +         } while (value != expected);
> > > > > + }
> > > > > +}
> > > > > +#undef __LOAD_EXC_64
> > > > > +
> > > > > +#undef __SEVL
> > > > > +#undef __WFE
> > > >
> > > > Personally I don't see how these define/undef are better then inline
> functions...
> > >
> > > The benefit it to not pollute the API namespace
> > > with some functions which are used only locally.
> > > After using a macro, it disappears with #undef.
> > >
> > > > Again I suppose they might be re-used in future some other ARM
> specific low-level primitvies.
> > >
> > > Do this low-level routines _LOAD_EXC_ need to be exposed in EAL for re-
> use?
> >
> > About load_exc don't know for sure...
> > Though as I can see wfe/sevl are used here:
> > http://patchwork.dpdk.org/patch/61779/
> > Might be it is possible to reuse these functions/macros...
> > But again, I am not arm expert, would be interested to know what arm
> guys think.
> 
> 
> Considering WFE has a requirement on using load_exec on ARM so IMO, it
> makes sense to expose
> load_exec in conjunction with wfe/sevl to use it properly.
Agreed, more WFE use cases can be developed beyond the WAIT_UNTIL_EQUAL_XX APIs; Marvell's patch is an example.
Sorry, I don't have more use cases so far, but Arm is evolving WFE, so I think more use cases are emerging.
/Gavin
> 
> 
> >
> > >
> > > > My preference would be to keep them as inline function - much
> cleaner code.
> > > > But as I don't develop for/use, I wouldn't insist and let you and arm
> guys to decide.
> > >
> > >
> >

Patch

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 46dff3a..ea47425 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -26,6 +26,7 @@  flags_common_default = [
 	['RTE_LIBRTE_AVP_PMD', false],
 
 	['RTE_SCHED_VECTOR', false],
+	['RTE_ARM_USE_WFE', false],
 ]
 
 flags_generic = [
diff --git a/config/common_base b/config/common_base
index 1858598..bb1b1ed 100644
--- a/config/common_base
+++ b/config/common_base
@@ -110,6 +110,11 @@  CONFIG_RTE_MAX_VFIO_CONTAINERS=64
 CONFIG_RTE_MALLOC_DEBUG=n
 CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=n
 CONFIG_RTE_USE_LIBBSD=n
+# Use WFE instructions to implement the rte_wait_for_equal_xxx APIs,
+# calling these APIs put the cores in low power state while waiting
+# for the memory address to become equal to the expected value.
+# This is supported only by aarch64.
+CONFIG_RTE_ARM_USE_WFE=n
 
 #
 # Recognize/ignore the AVX/AVX512 CPU flags for performance/power testing.
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index fe11b4b..af5f2c5 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -65,6 +65,11 @@  New Features
 
   The lock-free stack implementation is enabled for aarch64 platforms.
 
+* **Added Wait Until Equal API.**
+
+  A new API has been added to wait for a memory location to be updated with a
+  16-bit, 32-bit or 64-bit value.
+
 * **Changed mempool allocation behaviour.**
 
   Objects are no longer across pages by default.
diff --git a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
index 93895d3..e87d10b 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_pause_64.h
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_ARM64_H_
@@ -10,6 +11,11 @@  extern "C" {
 #endif
 
 #include <rte_common.h>
+
+#ifdef RTE_ARM_USE_WFE
+#define RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+#endif
+
 #include "generic/rte_pause.h"
 
 static inline void rte_pause(void)
@@ -17,6 +23,133 @@  static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
+#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+
+/* Send an event to quit WFE. */
+#define __SEVL() { asm volatile("sevl" : : : "memory"); }
+
+/* Put processor into low power WFE(Wait For Event) state. */
+#define __WFE() { asm volatile("wfe" : : : "memory"); }
+
+static __rte_always_inline void
+rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
+		int memorder)
+{
+	uint16_t value;
+
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	/*
+	 * Atomic exclusive load from addr: it returns the 16-bit content of
+	 * *addr while making it 'monitored'. When it is written by someone
+	 * else, the 'monitored' state is cleared and an event is generated
+	 * implicitly to exit WFE.
+	 */
+#define __LOAD_EXC_16(src, dst, memorder) {               \
+	if (memorder == __ATOMIC_RELAXED) {               \
+		asm volatile("ldxrh %w[tmp], [%x[addr]]"  \
+			: [tmp] "=&r" (dst)               \
+			: [addr] "r"(src)                 \
+			: "memory");                      \
+	} else {                                          \
+		asm volatile("ldaxrh %w[tmp], [%x[addr]]" \
+			: [tmp] "=&r" (dst)               \
+			: [addr] "r"(src)                 \
+			: "memory");                      \
+	} }
+
+	__LOAD_EXC_16(addr, value, memorder)
+	if (value != expected) {
+		__SEVL()
+		do {
+			__WFE()
+			__LOAD_EXC_16(addr, value, memorder)
+		} while (value != expected);
+	}
+#undef __LOAD_EXC_16
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
+		int memorder)
+{
+	uint32_t value;
+
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	/*
+	 * Atomic exclusive load from addr: it returns the 32-bit content of
+	 * *addr while making it 'monitored'. When it is written by someone
+	 * else, the 'monitored' state is cleared and an event is generated
+	 * implicitly to exit WFE.
+	 */
+#define __LOAD_EXC_32(src, dst, memorder) {              \
+	if (memorder == __ATOMIC_RELAXED) {              \
+		asm volatile("ldxr %w[tmp], [%x[addr]]"  \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r"(src)                \
+			: "memory");                     \
+	} else {                                         \
+		asm volatile("ldaxr %w[tmp], [%x[addr]]" \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r"(src)                \
+			: "memory");                     \
+	} }
+
+	__LOAD_EXC_32(addr, value, memorder)
+	if (value != expected) {
+		__SEVL()
+		do {
+			__WFE()
+			__LOAD_EXC_32(addr, value, memorder)
+		} while (value != expected);
+	}
+#undef __LOAD_EXC_32
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
+		int memorder)
+{
+	uint64_t value;
+
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	/*
+	 * Atomic exclusive load from addr: it returns the 64-bit content of
+	 * *addr while making it 'monitored'. When it is written by someone
+	 * else, the 'monitored' state is cleared and an event is generated
+	 * implicitly to exit WFE.
+	 */
+#define __LOAD_EXC_64(src, dst, memorder) {              \
+	if (memorder == __ATOMIC_RELAXED) {              \
+		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r"(src)                \
+			: "memory");                     \
+	} else {                                         \
+		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r"(src)                \
+			: "memory");                     \
+	} }
+
+	__LOAD_EXC_64(addr, value, memorder)
+	if (value != expected) {
+		__SEVL()
+		do {
+			__WFE()
+			__LOAD_EXC_64(addr, value, memorder)
+		} while (value != expected);
+	}
+#undef __LOAD_EXC_64
+}
+
+#undef __SEVL
+#undef __WFE
+
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/common/include/generic/rte_pause.h b/lib/librte_eal/common/include/generic/rte_pause.h
index 52bd4db..7422785 100644
--- a/lib/librte_eal/common/include/generic/rte_pause.h
+++ b/lib/librte_eal/common/include/generic/rte_pause.h
@@ -1,5 +1,6 @@ 
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2019 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_H_
@@ -12,6 +13,12 @@ 
  *
  */
 
+#include <stdint.h>
+#include <assert.h>
+#include <rte_common.h>
+#include <rte_atomic.h>
+#include <rte_compat.h>
+
 /**
  * Pause CPU execution for a short while
  *
@@ -20,4 +27,102 @@ 
  */
 static inline void rte_pause(void);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Wait for *addr to be updated with a 16-bit expected value, with the
+ * memory ordering model specified by memorder.
+ *
+ * @param addr
+ *  A pointer to the memory location.
+ * @param expected
+ *  A 16-bit expected value to be in the memory location.
+ * @param memorder
+ *  Two different memory orders that can be specified:
+ *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
+ *  C++11 memory orders with the same names; see the C++11 standard or
+ *  the GCC wiki on atomic synchronization for a detailed definition.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
+		int memorder);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Wait for *addr to be updated with a 32-bit expected value, with the
+ * memory ordering model specified by memorder.
+ *
+ * @param addr
+ *  A pointer to the memory location.
+ * @param expected
+ *  A 32-bit expected value to be in the memory location.
+ * @param memorder
+ *  Two different memory orders that can be specified:
+ *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
+ *  C++11 memory orders with the same names; see the C++11 standard or
+ *  the GCC wiki on atomic synchronization for a detailed definition.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
+		int memorder);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Wait for *addr to be updated with a 64-bit expected value, with the
+ * memory ordering model specified by memorder.
+ *
+ * @param addr
+ *  A pointer to the memory location.
+ * @param expected
+ *  A 64-bit expected value to be in the memory location.
+ * @param memorder
+ *  Two different memory orders that can be specified:
+ *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
+ *  C++11 memory orders with the same names; see the C++11 standard or
+ *  the GCC wiki on atomic synchronization for a detailed definition.
+ */
+__rte_experimental
+static __rte_always_inline void
+rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
+		int memorder);
+
+#ifndef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+static __rte_always_inline void
+rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
+		int memorder)
+{
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	while (__atomic_load_n(addr, memorder) != expected)
+		rte_pause();
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
+		int memorder)
+{
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	while (__atomic_load_n(addr, memorder) != expected)
+		rte_pause();
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
+		int memorder)
+{
+	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
+
+	while (__atomic_load_n(addr, memorder) != expected)
+		rte_pause();
+}
+#endif
+
 #endif /* _RTE_PAUSE_H_ */
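For reference, the generic fallback introduced above amounts to a polling loop over an atomic load. The standalone sketch below mimics that behavior outside of DPDK: plain pthreads and the GCC `__atomic` builtins stand in for the EAL environment, and `wait_until_equal_32`/`run_demo` are illustrative names, not DPDK APIs. In DPDK proper, `rte_pause()` (or the sevl/wfe pair with an exclusive load, on aarch64 with RTE_ARM_USE_WFE) would sit inside the loop.

```c
#include <stdint.h>
#include <assert.h>
#include <pthread.h>

/* Stand-in for the generic rte_wait_until_equal_32() fallback:
 * spin until *addr reads back the expected value. */
static void
wait_until_equal_32(volatile uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);

	while (__atomic_load_n(addr, memorder) != expected)
		; /* rte_pause() / WFE would go here to save cycles */
}

static volatile uint32_t flag;

static void *
producer(void *arg)
{
	(void)arg;
	/* Publish the value; pairs with the consumer's acquire load. */
	__atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
	return NULL;
}

/* Spawn a producer, block until it stores 1, and return the final value. */
static uint32_t
run_demo(void)
{
	pthread_t t;

	pthread_create(&t, NULL, producer, NULL);
	wait_until_equal_32(&flag, 1, __ATOMIC_ACQUIRE);
	pthread_join(t, NULL);
	return flag;
}
```

On aarch64 with RTE_ARM_USE_WFE enabled, the busy loop is replaced by the sevl/wfe sequence from the patch: the exclusive load marks the address as monitored, and a remote store to it generates the event that wakes the core from WFE, so the consumer sleeps instead of burning cycles.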