[v8,1/5] eal: add new definitions for wait scheme

Message ID 20211029082021.945586-2-feifei.wang2@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Series: add new definitions for wait scheme

Checks

Context Check Description
ci/checkpatch success coding style OK

Commit Message

Feifei Wang Oct. 29, 2021, 8:20 a.m. UTC
Introduce macros as a generic interface for address monitoring.

Add '__LOAD_EXC_128' for the 128-bit size. For the different sizes,
encapsulate '__LOAD_EXC_16', '__LOAD_EXC_32', '__LOAD_EXC_64' and
'__LOAD_EXC_128' into a new macro '__LOAD_EXC'.

Furthermore, to prevent a compilation warning on Arm:
----------------------------------------------
'warning: implicit declaration of function ...'
----------------------------------------------
delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and
'__WFE', and prefix these macros with '__RTE_ARM' to fix the namespace.
This is because the original macros were undefined at the end of the
file; if the new macro 'rte_wait_event' called them from other files,
they would be seen as 'not defined'.
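
To illustrate the failure mode (a minimal sketch; the real macros use
inline assembly, this stand-in body is simplified):

    /* rte_pause_64.h, before this patch (simplified) */
    #define __LOAD_EXC_64(src, dst, memorder) \
            ((dst) = __atomic_load_n((src), (memorder)))
    /* ...used internally by rte_wait_until_equal_64()... */
    #undef __LOAD_EXC_64
    /* from here on, any expansion that references __LOAD_EXC_64, such
     * as the new 'rte_wait_event' used from another file, is compiled
     * as a call to an undeclared function:
     *   warning: implicit declaration of function '__LOAD_EXC_64' */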

Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/eal/arm/include/rte_pause_64.h  | 202 +++++++++++++++++-----------
 lib/eal/include/generic/rte_pause.h |  28 ++++
 2 files changed, 154 insertions(+), 76 deletions(-)
  

Comments

Jerin Jacob Oct. 29, 2021, 1:54 p.m. UTC | #1
On Fri, Oct 29, 2021 at 1:50 PM Feifei Wang <feifei.wang2@arm.com> wrote:
>
> Introduce macros as a generic interface for address monitoring.
>
> Add '__LOAD_EXC_128' for the 128-bit size. For the different sizes,
> encapsulate '__LOAD_EXC_16', '__LOAD_EXC_32', '__LOAD_EXC_64' and
> '__LOAD_EXC_128' into a new macro '__LOAD_EXC'.
>
> Furthermore, to prevent a compilation warning on Arm:
> ----------------------------------------------
> 'warning: implicit declaration of function ...'
> ----------------------------------------------
> delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and
> '__WFE', and prefix these macros with '__RTE_ARM' to fix the namespace.
> This is because the original macros were undefined at the end of the
> file; if the new macro 'rte_wait_event' called them from other files,
> they would be seen as 'not defined'.
>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Acked-by: Jerin Jacob <jerinj@marvell.com>
  
David Marchand Oct. 31, 2021, 8:38 a.m. UTC | #2
On Fri, Oct 29, 2021 at 10:20 AM Feifei Wang <feifei.wang2@arm.com> wrote:
>
> Introduce macros as a generic interface for address monitoring.

The main point of this patch is to add a new generic helper.


>
> Add '__LOAD_EXC_128' for the 128-bit size. For the different sizes,
> encapsulate '__LOAD_EXC_16', '__LOAD_EXC_32', '__LOAD_EXC_64' and
> '__LOAD_EXC_128' into a new macro '__LOAD_EXC'.

ARM macros are just a result of introducing this new helper as a macro.
I would not mention them.


>
> Furthermore, to prevent a compilation warning on Arm:
> ----------------------------------------------
> 'warning: implicit declaration of function ...'
> ----------------------------------------------
> delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and
> '__WFE', and prefix these macros with '__RTE_ARM' to fix the namespace.
> This is because the original macros were undefined at the end of the
> file; if the new macro 'rte_wait_event' called them from other files,
> they would be seen as 'not defined'.


About this new helper, it's rather confusing:
- it is a macro, so it should be in capital letters,
- "rte_wait_event(addr, mask, cond, expected)" waits until "*addr &
mask cond expected" becomes false. I find this confusing and would
invert the condition (see the sketch after this list),
- so far, we had rte_wait_until_* helpers; rte_wait_event seems like a
step backward, as it seems to talk about the ARM stuff (wfe),
- the masking part is artificial in some cases; at least let's avoid
using a too generic name, and we can decide to add a non-masked helper
later.
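
To make the inversion concrete, a hypothetical caller (not code from
the patch) waiting for a flag to become non-zero has to pass the
condition that holds *while* waiting, not the condition being waited
for:

    volatile uint32_t flag;  /* written by another thread */
    /* spins while (flag & UINT32_MAX) == 0, returns once flag != 0 */
    rte_wait_event(&flag, UINT32_MAX, ==, 0, __ATOMIC_RELAXED);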

For those reasons, I'd prefer we have something like:

/*
 * Wait until *addr & mask makes the condition true. With a relaxed memory
 * ordering model, the loads around this helper can be reordered.
 *
 * @param addr
 *  A pointer to the memory location.
 * @param mask
 *  A mask of *addr bits in interest.
 * @param cond
 *  A symbol representing the condition.
 * @param expected
 *  An expected value to be in the memory location.
 * @param memorder
 *  Two different memory orders that can be specified:
 *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
 *  C++11 memory orders with the same names, see the C++11 standard or
 *  the GCC wiki on atomic synchronization for detailed definition.
 */
#define RTE_WAIT_UNTIL_MASKED(addr, mask, cond, expected, memorder)   \
do {                                                                  \
        RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&              \
                memorder != __ATOMIC_RELAXED);                        \
        typeof(*(addr)) expected_value = expected;                    \
        while (!((__atomic_load_n(addr, memorder) & (mask)) cond      \
                        expected_value))                              \
                rte_pause();                                          \
} while (0)
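
With the inverted form, the same hypothetical wait reads the right way
around:

    volatile uint32_t flag;  /* written by another thread */
    /* spins until (flag & UINT32_MAX) != 0 becomes true */
    RTE_WAIT_UNTIL_MASKED(&flag, UINT32_MAX, !=, 0, __ATOMIC_RELAXED);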


Comments below.

>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/eal/arm/include/rte_pause_64.h  | 202 +++++++++++++++++-----------
>  lib/eal/include/generic/rte_pause.h |  28 ++++
>  2 files changed, 154 insertions(+), 76 deletions(-)
>
> diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
> index e87d10b8cc..783c6aae87 100644
> --- a/lib/eal/arm/include/rte_pause_64.h
> +++ b/lib/eal/arm/include/rte_pause_64.h

[snip]

> +/*
> + * Atomic exclusive load from addr, it returns the 64-bit content of
> + * *addr while making it 'monitored', when it is written by someone
> + * else, the 'monitored' state is cleared and an event is generated
> + * implicitly to exit WFE.
> + */
> +#define __RTE_ARM_LOAD_EXC_64(src, dst, memorder) {      \
> +       if (memorder == __ATOMIC_RELAXED) {              \
> +               asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> +                       : [tmp] "=&r" (dst)              \
> +                       : [addr] "r" (src)               \
> +                       : "memory");                     \
> +       } else {                                         \
> +               asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> +                       : [tmp] "=&r" (dst)              \
> +                       : [addr] "r" (src)               \
> +                       : "memory");                     \
> +       } }
> +
> +/*
> + * Atomic exclusive load from addr, it returns the 128-bit content of
> + * *addr while making it 'monitored', when it is written by someone
> + * else, the 'monitored' state is cleared and an event is generated
> + * implicitly to exit WFE.
> + */
> +#define __RTE_ARM_LOAD_EXC_128(src, dst, memorder) {                    \
> +       volatile rte_int128_t *dst_128 = (volatile rte_int128_t *)&dst; \

dst needs some () protection => &(dst)
Is volatile necessary?


> +       if (memorder == __ATOMIC_RELAXED) {                             \
> +               asm volatile("ldxp %x[tmp0], %x[tmp1], [%x[addr]]"      \
> +                       : [tmp0] "=&r" (dst_128->val[0]),               \
> +                         [tmp1] "=&r" (dst_128->val[1])                \
> +                       : [addr] "r" (src)                              \
> +                       : "memory");                                    \
> +       } else {                                                        \
> +               asm volatile("ldaxp %x[tmp0], %x[tmp1], [%x[addr]]"     \
> +                       : [tmp0] "=&r" (dst_128->val[0]),               \
> +                         [tmp1] "=&r" (dst_128->val[1])                \
> +                       : [addr] "r" (src)                              \
> +                       : "memory");                                    \
> +       } }                                                             \
> +
> +#define __RTE_ARM_LOAD_EXC(src, dst, memorder, size) {          \
> +       RTE_BUILD_BUG_ON(size != 16 && size != 32 && size != 64 \
> +                               && size != 128);                \

Indent should be one tab (idem in other places of this patch).
Double tab is when we have line continuation in tests.


> +       if (size == 16)                                         \
> +               __RTE_ARM_LOAD_EXC_16(src, dst, memorder)       \
> +       else if (size == 32)                                    \
> +               __RTE_ARM_LOAD_EXC_32(src, dst, memorder)       \
> +       else if (size == 64)                                    \
> +               __RTE_ARM_LOAD_EXC_64(src, dst, memorder)       \
> +       else if (size == 128)                                   \
> +               __RTE_ARM_LOAD_EXC_128(src, dst, memorder)      \
> +}
> +

[snip]

> -#undef __LOAD_EXC_64
>
> -#undef __SEVL
> -#undef __WFE
> +#define rte_wait_event(addr, mask, cond, expected, memorder)              \
> +do {                                                                      \
> +       RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));                \

Is this check on memorder being constant necessary?
We have a build bug on right after; would it not catch non-constant cases?


> +       RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&                  \
> +                               memorder != __ATOMIC_RELAXED);            \
> +       const uint32_t size = sizeof(*(addr)) << 3;                       \
> +       typeof(*(addr)) expected_value = (expected);                      \

No need for () around expected.


> +       typeof(*(addr)) value;                                            \
> +       __RTE_ARM_LOAD_EXC((addr), value, memorder, size)                 \

No need for () around addr.


> +       if ((value & (mask)) cond expected_value) {                       \
> +               __RTE_ARM_SEVL()                                          \
> +               do {                                                      \
> +                       __RTE_ARM_WFE()                                   \
> +                       __RTE_ARM_LOAD_EXC((addr), value, memorder, size) \

Idem.


> +               } while ((value & (mask)) cond expected_value);           \
> +       }                                                                 \
> +} while (0)
>
>  #endif
>
> diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
> index 668ee4a184..d0c5b5a415 100644
> --- a/lib/eal/include/generic/rte_pause.h
> +++ b/lib/eal/include/generic/rte_pause.h
> @@ -111,6 +111,34 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
>         while (__atomic_load_n(addr, memorder) != expected)
>                 rte_pause();
>  }

With this patch, the ARM header goes through a conversion of assert()
to compilation checks (build bug on).
I don't see a reason not to do the same in the generic header.

As a result of this conversion, #include <assert.h> can then be removed.
Though it triggers build failures in the following files (afaics), which
were implicitly relying on this inclusion:
drivers/net/ark/ark_ddm.c
drivers/net/ark/ark_udm.c
drivers/net/ice/ice_fdir_filter.c
drivers/net/ionic/ionic_rxtx.c
drivers/net/mlx4/mlx4_txq.c
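
The suggested conversion, sketched against the generic helper (a sketch
of the suggestion, not part of this patch):

     static __rte_always_inline void
     rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
     		int memorder)
     {
    -	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
    +	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&
    +				memorder != __ATOMIC_RELAXED);
     	while (__atomic_load_n(addr, memorder) != expected)
     		rte_pause();
     }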
  
Feifei Wang Nov. 1, 2021, 2:29 a.m. UTC | #3
> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Sunday, October 31, 2021 4:39 PM
> To: Feifei Wang <Feifei.Wang2@arm.com>
> Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev <dev@dpdk.org>; nd
> <nd@arm.com>; Jerin Jacob <jerinjacobk@gmail.com>; Stephen Hemminger
> <stephen@networkplumber.org>; thomas@monjalon.net; Mattias Rönnblom
> <mattias.ronnblom@ericsson.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: Re: [PATCH v8 1/5] eal: add new definitions for wait scheme
> 
> On Fri, Oct 29, 2021 at 10:20 AM Feifei Wang <feifei.wang2@arm.com>
> wrote:
> >
> > Introduce macros as a generic interface for address monitoring.
> 
> The main point of this patch is to add a new generic helper.
[Feifei] Thanks for the comments, I will change this commit message.
> 
> 
> >
> > Add '__LOAD_EXC_128' for the 128-bit size. For the different sizes,
> > encapsulate '__LOAD_EXC_16', '__LOAD_EXC_32', '__LOAD_EXC_64' and
> > '__LOAD_EXC_128' into a new macro '__LOAD_EXC'.
> 
> ARM macros are just a result of introducing this new helper as a macro.
> I would not mention them.
[Feifei] Ok, I will delete it.
> 
> 
> >
> > Furthermore, to prevent a compilation warning on Arm:
> > ----------------------------------------------
> > 'warning: implicit declaration of function ...'
> > ----------------------------------------------
> > delete the 'undef' constructions for '__LOAD_EXC_xx', '__SEVL' and
> > '__WFE', and prefix these macros with '__RTE_ARM' to fix the namespace.
> > This is because the original macros were undefined at the end of the
> > file; if the new macro 'rte_wait_event' called them from other files,
> > they would be seen as 'not defined'.
> 
> 
> About this new helper, it's rather confusing:
> - it is a macro, so it should be in capital letters,
> - "rte_wait_event(addr, mask, cond, expected)" waits until "*addr & mask
> cond expected" becomes false. I find this confusing and would invert the
> condition,
> - so far, we had rte_wait_until_* helpers; rte_wait_event seems like a step
> backward, as it seems to talk about the ARM stuff (wfe),
[Feifei] So if I understand correctly, we need to avoid using 'wait_event' as the name.

> - the masking part is artificial in some cases; at least let's avoid using a
> too generic name, and we can decide to add a non-masked helper later.
[Feifei] Ok, I will change this name to reflect the masking.
> 
> For those reasons, I'd prefer we have something like:
> 
> /*
>  * Wait until *addr & mask makes the condition true. With a relaxed memory
>  * ordering model, the loads around this helper can be reordered.
>  *
>  * @param addr
>  *  A pointer to the memory location.
>  * @param mask
>  *  A mask of *addr bits in interest.
>  * @param cond
>  *  A symbol representing the condition.
>  * @param expected
>  *  An expected value to be in the memory location.
>  * @param memorder
>  *  Two different memory orders that can be specified:
>  *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
>  *  C++11 memory orders with the same names, see the C++11 standard or
>  *  the GCC wiki on atomic synchronization for detailed definition.
>  */
> #define RTE_WAIT_UNTIL_MASKED(addr, mask, cond, expected, memorder)   \
> do {                                                                  \
>         RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&              \
>                 memorder != __ATOMIC_RELAXED);                        \
>         typeof(*(addr)) expected_value = expected;                    \
>         while (!((__atomic_load_n(addr, memorder) & (mask)) cond      \
>                         expected_value))                              \
>                 rte_pause();                                          \
> } while (0)
> 
> 
> Comments below.
> 
> >
> > Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/eal/arm/include/rte_pause_64.h  | 202 +++++++++++++++++-----------
> >  lib/eal/include/generic/rte_pause.h |  28 ++++
> >  2 files changed, 154 insertions(+), 76 deletions(-)
> >
> > diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
> > index e87d10b8cc..783c6aae87 100644
> > --- a/lib/eal/arm/include/rte_pause_64.h
> > +++ b/lib/eal/arm/include/rte_pause_64.h
> 
> [snip]
> 
> > +/*
> > + * Atomic exclusive load from addr, it returns the 64-bit content of
> > + * *addr while making it 'monitored', when it is written by someone
> > + * else, the 'monitored' state is cleared and an event is generated
> > + * implicitly to exit WFE.
> > + */
> > +#define __RTE_ARM_LOAD_EXC_64(src, dst, memorder) {      \
> > +       if (memorder == __ATOMIC_RELAXED) {              \
> > +               asm volatile("ldxr %x[tmp], [%x[addr]]"  \
> > +                       : [tmp] "=&r" (dst)              \
> > +                       : [addr] "r" (src)               \
> > +                       : "memory");                     \
> > +       } else {                                         \
> > +               asm volatile("ldaxr %x[tmp], [%x[addr]]" \
> > +                       : [tmp] "=&r" (dst)              \
> > +                       : [addr] "r" (src)               \
> > +                       : "memory");                     \
> > +       } }
> > +
> > +/*
> > + * Atomic exclusive load from addr, it returns the 128-bit content of
> > + * *addr while making it 'monitored', when it is written by someone
> > + * else, the 'monitored' state is cleared and an event is generated
> > + * implicitly to exit WFE.
> > + */
> > +#define __RTE_ARM_LOAD_EXC_128(src, dst, memorder) {                    \
> > +       volatile rte_int128_t *dst_128 = (volatile rte_int128_t *)&dst; \
> 
> dst needs some () protection => &(dst)
[Feifei] dst comes from the internally defined variable 'value' in
'rte_wait_event', so we can ensure it is just a simple variable and
do not need to add '()'.
> Is volatile necessary?
[Feifei] This volatile is for the case where the 'addr' parameter is volatile.
We use 'typeof' for 'value', so 'value' will also be defined as 'volatile':
+ typeof(*(addr)) value
So without the 'volatile' here, the compiler would report the warning:
'volatile is discarded'.
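A standalone illustration of this qualifier propagation (assuming GCC's
typeof and a build with -Wcast-qual, as DPDK uses):

    volatile uint64_t reg = 0;
    typeof(reg) value = reg;           /* 'value' is volatile uint64_t */
    uint64_t *p = (uint64_t *)&value;  /* -Wcast-qual: cast discards
                                        * 'volatile' qualifier */
    (void)p;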
> 
> 
> > +       if (memorder == __ATOMIC_RELAXED) {                             \
> > +               asm volatile("ldxp %x[tmp0], %x[tmp1], [%x[addr]]"      \
> > +                       : [tmp0] "=&r" (dst_128->val[0]),               \
> > +                         [tmp1] "=&r" (dst_128->val[1])                \
> > +                       : [addr] "r" (src)                              \
> > +                       : "memory");                                    \
> > +       } else {                                                        \
> > +               asm volatile("ldaxp %x[tmp0], %x[tmp1], [%x[addr]]"     \
> > +                       : [tmp0] "=&r" (dst_128->val[0]),               \
> > +                         [tmp1] "=&r" (dst_128->val[1])                \
> > +                       : [addr] "r" (src)                              \
> > +                       : "memory");                                    \
> > +       } }                                                             \
> > +
> > +#define __RTE_ARM_LOAD_EXC(src, dst, memorder, size) {          \
> > +       RTE_BUILD_BUG_ON(size != 16 && size != 32 && size != 64 \
> > +                               && size != 128);                \
> 
> Indent should be one tab (idem in other places of this patch).
> Double tab is when we have line continuation in tests.
[Feifei] Ok.
> 
> 
> > +       if (size == 16)                                         \
> > +               __RTE_ARM_LOAD_EXC_16(src, dst, memorder)       \
> > +       else if (size == 32)                                    \
> > +               __RTE_ARM_LOAD_EXC_32(src, dst, memorder)       \
> > +       else if (size == 64)                                    \
> > +               __RTE_ARM_LOAD_EXC_64(src, dst, memorder)       \
> > +       else if (size == 128)                                   \
> > +               __RTE_ARM_LOAD_EXC_128(src, dst, memorder)      \
> > +}
> > +
> 
> [snip]
> 
> > -#undef __LOAD_EXC_64
> >
> > -#undef __SEVL
> > -#undef __WFE
> > +#define rte_wait_event(addr, mask, cond, expected, memorder)              \
> > +do {                                                                      \
> > +       RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));                \
> 
> Is this check on memorder being constant necessary?
> We have a build bug on right after; would it not catch non-constant cases?
[Feifei] I think this can first check whether memorder is a compile-time
constant at all, before the value check.
> 
> > +       RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&                  \
> > +                               memorder != __ATOMIC_RELAXED);            \
> > +       const uint32_t size = sizeof(*(addr)) << 3;                       \
> > +       typeof(*(addr)) expected_value = (expected);                      \
> 
> No need for () around expected.
[Feifei] 'expected' and 'addr' are macro arguments, and we cannot know
in what form users will pass them, so to avoid unexpected side effects
from operator precedence and associativity, it is necessary to add the
parentheses.
Please see the discussion with Konstantin:
http://patches.dpdk.org/project/dpdk/patch/20211020084523.1309177-2-feifei.wang2@arm.com/
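
As a contrived illustration of the general hazard (hypothetical values,
not from the patch): if the macro expanded 'mask' without parentheses,
the test

    value & mask cond expected_value

with 'mask' passed as '0xf0 | 0x0f' would expand to

    value & 0xf0 | 0x0f == expected_value

which groups as (value & 0xf0) | (0x0f == expected_value), since '=='
binds tighter than '&', and '&' tighter than '|' -- not
(value & (0xf0 | 0x0f)) == expected_value as intended.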
> 
> 
> > +       typeof(*(addr)) value;                                            \
> > +       __RTE_ARM_LOAD_EXC((addr), value, memorder, size)                 \
> 
> No need for () around addr.
> 
> 
> > +       if ((value & (mask)) cond expected_value) {                       \
> > +               __RTE_ARM_SEVL()                                          \
> > +               do {                                                      \
> > +                       __RTE_ARM_WFE()                                   \
> > +                       __RTE_ARM_LOAD_EXC((addr), value, memorder, size) \
> 
> Idem.
> 
> 
> > +               } while ((value & (mask)) cond expected_value);           \
> > +       }                                                                 \
> > +} while (0)
> >
> >  #endif
> >
> > diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
> > index 668ee4a184..d0c5b5a415 100644
> > --- a/lib/eal/include/generic/rte_pause.h
> > +++ b/lib/eal/include/generic/rte_pause.h
> > @@ -111,6 +111,34 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
> >         while (__atomic_load_n(addr, memorder) != expected)
> >                 rte_pause();
> >  }
> 
> With this patch, the ARM header goes through a conversion of assert() to
> compilation checks (build bug on).
> I don't see a reason not to do the same in the generic header.
>
> As a result of this conversion, #include <assert.h> can then be removed.
> Though it triggers build failures in the following files (afaics), which
> were implicitly relying on this inclusion:
> drivers/net/ark/ark_ddm.c
> drivers/net/ark/ark_udm.c
> drivers/net/ice/ice_fdir_filter.c
> drivers/net/ionic/ionic_rxtx.c
> drivers/net/mlx4/mlx4_txq.c
[Feifei] You are right, and we can put this change in another patch series.
> 
> 
> --
> David Marchand
  

Patch

diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index e87d10b8cc..783c6aae87 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -26,47 +26,120 @@  static inline void rte_pause(void)
 #ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
 
 /* Send an event to quit WFE. */
-#define __SEVL() { asm volatile("sevl" : : : "memory"); }
+#define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
 /* Put processor into low power WFE(Wait For Event) state. */
-#define __WFE() { asm volatile("wfe" : : : "memory"); }
+#define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
 
-static __rte_always_inline void
-rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
-		int memorder)
-{
-	uint16_t value;
-
-	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
-
-	/*
-	 * Atomic exclusive load from addr, it returns the 16-bit content of
-	 * *addr while making it 'monitored',when it is written by someone
-	 * else, the 'monitored' state is cleared and a event is generated
-	 * implicitly to exit WFE.
-	 */
-#define __LOAD_EXC_16(src, dst, memorder) {               \
+/*
+ * Atomic exclusive load from addr, it returns the 16-bit content of
+ * *addr while making it 'monitored', when it is written by someone
+ * else, the 'monitored' state is cleared and an event is generated
+ * implicitly to exit WFE.
+ */
+#define __RTE_ARM_LOAD_EXC_16(src, dst, memorder) {       \
 	if (memorder == __ATOMIC_RELAXED) {               \
 		asm volatile("ldxrh %w[tmp], [%x[addr]]"  \
 			: [tmp] "=&r" (dst)               \
-			: [addr] "r"(src)                 \
+			: [addr] "r" (src)                \
 			: "memory");                      \
 	} else {                                          \
 		asm volatile("ldaxrh %w[tmp], [%x[addr]]" \
 			: [tmp] "=&r" (dst)               \
-			: [addr] "r"(src)                 \
+			: [addr] "r" (src)                \
 			: "memory");                      \
 	} }
 
-	__LOAD_EXC_16(addr, value, memorder)
+/*
+ * Atomic exclusive load from addr, it returns the 32-bit content of
+ * *addr while making it 'monitored', when it is written by someone
+ * else, the 'monitored' state is cleared and an event is generated
+ * implicitly to exit WFE.
+ */
+#define __RTE_ARM_LOAD_EXC_32(src, dst, memorder) {      \
+	if (memorder == __ATOMIC_RELAXED) {              \
+		asm volatile("ldxr %w[tmp], [%x[addr]]"  \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r" (src)               \
+			: "memory");                     \
+	} else {                                         \
+		asm volatile("ldaxr %w[tmp], [%x[addr]]" \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r" (src)               \
+			: "memory");                     \
+	} }
+
+/*
+ * Atomic exclusive load from addr, it returns the 64-bit content of
+ * *addr while making it 'monitored', when it is written by someone
+ * else, the 'monitored' state is cleared and an event is generated
+ * implicitly to exit WFE.
+ */
+#define __RTE_ARM_LOAD_EXC_64(src, dst, memorder) {      \
+	if (memorder == __ATOMIC_RELAXED) {              \
+		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r" (src)               \
+			: "memory");                     \
+	} else {                                         \
+		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
+			: [tmp] "=&r" (dst)              \
+			: [addr] "r" (src)               \
+			: "memory");                     \
+	} }
+
+/*
+ * Atomic exclusive load from addr, it returns the 128-bit content of
+ * *addr while making it 'monitored', when it is written by someone
+ * else, the 'monitored' state is cleared and an event is generated
+ * implicitly to exit WFE.
+ */
+#define __RTE_ARM_LOAD_EXC_128(src, dst, memorder) {                    \
+	volatile rte_int128_t *dst_128 = (volatile rte_int128_t *)&dst; \
+	if (memorder == __ATOMIC_RELAXED) {                             \
+		asm volatile("ldxp %x[tmp0], %x[tmp1], [%x[addr]]"      \
+			: [tmp0] "=&r" (dst_128->val[0]),               \
+			  [tmp1] "=&r" (dst_128->val[1])                \
+			: [addr] "r" (src)                              \
+			: "memory");                                    \
+	} else {                                                        \
+		asm volatile("ldaxp %x[tmp0], %x[tmp1], [%x[addr]]"     \
+			: [tmp0] "=&r" (dst_128->val[0]),               \
+			  [tmp1] "=&r" (dst_128->val[1])                \
+			: [addr] "r" (src)                              \
+			: "memory");                                    \
+	} }                                                             \
+
+#define __RTE_ARM_LOAD_EXC(src, dst, memorder, size) {          \
+	RTE_BUILD_BUG_ON(size != 16 && size != 32 && size != 64 \
+				&& size != 128);                \
+	if (size == 16)                                         \
+		__RTE_ARM_LOAD_EXC_16(src, dst, memorder)       \
+	else if (size == 32)                                    \
+		__RTE_ARM_LOAD_EXC_32(src, dst, memorder)       \
+	else if (size == 64)                                    \
+		__RTE_ARM_LOAD_EXC_64(src, dst, memorder)       \
+	else if (size == 128)                                   \
+		__RTE_ARM_LOAD_EXC_128(src, dst, memorder)      \
+}
+
+static __rte_always_inline void
+rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
+		int memorder)
+{
+	uint16_t value;
+
+	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&
+				memorder != __ATOMIC_RELAXED);
+
+	__RTE_ARM_LOAD_EXC_16(addr, value, memorder)
 	if (value != expected) {
-		__SEVL()
+		__RTE_ARM_SEVL()
 		do {
-			__WFE()
-			__LOAD_EXC_16(addr, value, memorder)
+			__RTE_ARM_WFE()
+			__RTE_ARM_LOAD_EXC_16(addr, value, memorder)
 		} while (value != expected);
 	}
-#undef __LOAD_EXC_16
 }
 
 static __rte_always_inline void
@@ -75,36 +148,17 @@  rte_wait_until_equal_32(volatile uint32_t *addr, uint32_t expected,
 {
 	uint32_t value;
 
-	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
-
-	/*
-	 * Atomic exclusive load from addr, it returns the 32-bit content of
-	 * *addr while making it 'monitored',when it is written by someone
-	 * else, the 'monitored' state is cleared and a event is generated
-	 * implicitly to exit WFE.
-	 */
-#define __LOAD_EXC_32(src, dst, memorder) {              \
-	if (memorder == __ATOMIC_RELAXED) {              \
-		asm volatile("ldxr %w[tmp], [%x[addr]]"  \
-			: [tmp] "=&r" (dst)              \
-			: [addr] "r"(src)                \
-			: "memory");                     \
-	} else {                                         \
-		asm volatile("ldaxr %w[tmp], [%x[addr]]" \
-			: [tmp] "=&r" (dst)              \
-			: [addr] "r"(src)                \
-			: "memory");                     \
-	} }
+	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&
+				memorder != __ATOMIC_RELAXED);
 
-	__LOAD_EXC_32(addr, value, memorder)
+	__RTE_ARM_LOAD_EXC_32(addr, value, memorder)
 	if (value != expected) {
-		__SEVL()
+		__RTE_ARM_SEVL()
 		do {
-			__WFE()
-			__LOAD_EXC_32(addr, value, memorder)
+			__RTE_ARM_WFE()
+			__RTE_ARM_LOAD_EXC_32(addr, value, memorder)
 		} while (value != expected);
 	}
-#undef __LOAD_EXC_32
 }
 
 static __rte_always_inline void
@@ -113,40 +167,36 @@  rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
 {
 	uint64_t value;
 
-	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
-
-	/*
-	 * Atomic exclusive load from addr, it returns the 64-bit content of
-	 * *addr while making it 'monitored',when it is written by someone
-	 * else, the 'monitored' state is cleared and a event is generated
-	 * implicitly to exit WFE.
-	 */
-#define __LOAD_EXC_64(src, dst, memorder) {              \
-	if (memorder == __ATOMIC_RELAXED) {              \
-		asm volatile("ldxr %x[tmp], [%x[addr]]"  \
-			: [tmp] "=&r" (dst)              \
-			: [addr] "r"(src)                \
-			: "memory");                     \
-	} else {                                         \
-		asm volatile("ldaxr %x[tmp], [%x[addr]]" \
-			: [tmp] "=&r" (dst)              \
-			: [addr] "r"(src)                \
-			: "memory");                     \
-	} }
+	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&
+				memorder != __ATOMIC_RELAXED);
 
-	__LOAD_EXC_64(addr, value, memorder)
+	__RTE_ARM_LOAD_EXC_64(addr, value, memorder)
 	if (value != expected) {
-		__SEVL()
+		__RTE_ARM_SEVL()
 		do {
-			__WFE()
-			__LOAD_EXC_64(addr, value, memorder)
+			__RTE_ARM_WFE()
+			__RTE_ARM_LOAD_EXC_64(addr, value, memorder)
 		} while (value != expected);
 	}
 }
-#undef __LOAD_EXC_64
 
-#undef __SEVL
-#undef __WFE
+#define rte_wait_event(addr, mask, cond, expected, memorder)              \
+do {                                                                      \
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));                \
+	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&                  \
+				memorder != __ATOMIC_RELAXED);            \
+	const uint32_t size = sizeof(*(addr)) << 3;                       \
+	typeof(*(addr)) expected_value = (expected);                      \
+	typeof(*(addr)) value;                                            \
+	__RTE_ARM_LOAD_EXC((addr), value, memorder, size)                 \
+	if ((value & (mask)) cond expected_value) {                       \
+		__RTE_ARM_SEVL()                                          \
+		do {                                                      \
+			__RTE_ARM_WFE()                                   \
+			__RTE_ARM_LOAD_EXC((addr), value, memorder, size) \
+		} while ((value & (mask)) cond expected_value);           \
+	}                                                                 \
+} while (0)
 
 #endif
 
diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
index 668ee4a184..d0c5b5a415 100644
--- a/lib/eal/include/generic/rte_pause.h
+++ b/lib/eal/include/generic/rte_pause.h
@@ -111,6 +111,34 @@  rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
 	while (__atomic_load_n(addr, memorder) != expected)
 		rte_pause();
 }
+
+/*
+ * Wait until *addr breaks the condition, with a relaxed memory
+ * ordering model meaning the loads around this API can be reordered.
+ *
+ * @param addr
+ *  A pointer to the memory location.
+ * @param mask
+ *  A mask of value bits in interest.
+ * @param cond
+ *  A symbol representing the condition.
+ * @param expected
+ *  An expected value to be in the memory location.
+ * @param memorder
+ *  Two different memory orders that can be specified:
+ *  __ATOMIC_ACQUIRE and __ATOMIC_RELAXED. These map to
+ *  C++11 memory orders with the same names, see the C++11 standard or
+ *  the GCC wiki on atomic synchronization for detailed definition.
+ */
+#define rte_wait_event(addr, mask, cond, expected, memorder)                       \
+do {                                                                               \
+	RTE_BUILD_BUG_ON(!__builtin_constant_p(memorder));                         \
+	RTE_BUILD_BUG_ON(memorder != __ATOMIC_ACQUIRE &&                           \
+				memorder != __ATOMIC_RELAXED);                     \
+	typeof(*(addr)) expected_value = (expected);                               \
+	while ((__atomic_load_n((addr), (memorder)) & (mask)) cond expected_value) \
+		rte_pause();                                                       \
+} while (0)
 #endif
 
 #endif /* _RTE_PAUSE_H_ */