[v7,3/7] spinlock: use wfe to reduce contention on aarch64

Message ID 1569562904-43950-4-git-send-email-gavin.hu@arm.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series use WFE for aarch64 |

Checks

Context Check Description
ci/Intel-compilation success Compilation OK
ci/checkpatch success coding style OK

Commit Message

Gavin Hu Sept. 27, 2019, 5:41 a.m. UTC
  In acquiring a spinlock, cores repeatedly poll the lock variable.
This is replaced by rte_wait_until_equal API.

Running the micro benchmarking and the testpmd and l3fwd traffic tests
on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no
notable performance gain nor degradation was measured.

Signed-off-by: Gavin Hu <gavin.hu@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 .../common/include/arch/arm/rte_spinlock.h         | 26 ++++++++++++++++++++++
 1 file changed, 26 insertions(+)
  

Comments

David Marchand Oct. 17, 2019, 6:27 p.m. UTC | #1
On Fri, Sep 27, 2019 at 7:43 AM Gavin Hu <gavin.hu@arm.com> wrote:
>
> In acquiring a spinlock, cores repeatedly poll the lock variable.
> This is replaced by rte_wait_until_equal API.
>
> Running the micro benchmarking and the testpmd and l3fwd traffic tests
> on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well and no
> notable performance gain nor degradation was measured.
>
> Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Reviewed-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Steve Capper <steve.capper@arm.com>
> Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  .../common/include/arch/arm/rte_spinlock.h         | 26 ++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> index 1a6916b..b61c055 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> @@ -16,6 +16,32 @@ extern "C" {
>  #include <rte_common.h>
>  #include "generic/rte_spinlock.h"
>
> +/* armv7a does support WFE, but an explicit wake-up signal using SEV is
> + * required (must be preceded by DSB to drain the store buffer) and
> + * this is less performant, so keep armv7a implementation unchanged.
> + */
> +#ifndef RTE_FORCE_INTRINSICS

Earlier, in the same file, I can see:
https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/arm/rte_spinlock.h?h=v19.08#n8

#ifndef RTE_FORCE_INTRINSICS
#  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
#endif

IIUC, this is dead code.

> +static inline void
> +rte_spinlock_lock(rte_spinlock_t *sl)
> +{
> +       unsigned int tmp;
> +       /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
> +        * faqs/ka16809.html
> +        */
> +       asm volatile(
> +               "1:     ldaxr %w[tmp], %w[locked]\n"
> +               "cbnz   %w[tmp], 2f\n"
> +               "stxr   %w[tmp], %w[one], %w[locked]\n"
> +               "cbnz   %w[tmp], 1b\n"
> +               "ret\n"
> +               "2:     sevl\n"
> +               "wfe\n"
> +               "jmp    1b\n"
> +               : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
> +               : [one] "r" (1)
> +}
> +#endif
> +
>  static inline int rte_tm_supported(void)
>  {
>         return 0;
> --
> 2.7.4
>
  
Gavin Hu Oct. 18, 2019, 5:45 a.m. UTC | #2
Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, October 18, 2019 2:28 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev <dev@dpdk.org>; nd <nd@arm.com>; thomas@monjalon.net;
> Stephen Hemminger <stephen@networkplumber.org>;
> hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang (Arm Technology China)
> <Ruifeng.Wang@arm.com>; Phil Yang (Arm Technology China)
> <Phil.Yang@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v7 3/7] spinlock: use wfe to reduce
> contention on aarch64
> 
> On Fri, Sep 27, 2019 at 7:43 AM Gavin Hu <gavin.hu@arm.com> wrote:
> >
> > In acquiring a spinlock, cores repeatedly poll the lock variable.
> > This is replaced by rte_wait_until_equal API.
> >
> > Running the micro benchmarking and the testpmd and l3fwd traffic tests
> > on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well
> and no
> > notable performance gain nor degradation was measured.
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> >  .../common/include/arch/arm/rte_spinlock.h         | 26
> ++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > index 1a6916b..b61c055 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > @@ -16,6 +16,32 @@ extern "C" {
> >  #include <rte_common.h>
> >  #include "generic/rte_spinlock.h"
> >
> > +/* armv7a does support WFE, but an explicit wake-up signal using SEV is
> > + * required (must be preceded by DSB to drain the store buffer) and
> > + * this is less performant, so keep armv7a implementation unchanged.
> > + */
> > +#ifndef RTE_FORCE_INTRINSICS
> 
> Earlier, in the same file, I can see:
> https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/arm/rte
> _spinlock.h?h=v19.08#n8
> 
> #ifndef RTE_FORCE_INTRINSICS
> #  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
> #endif
> 
> IIUC, this is dead code.
Yes, will remove in next version.

> 
> > +static inline void
> > +rte_spinlock_lock(rte_spinlock_t *sl)
> > +{
> > +       unsigned int tmp;
> > +       /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
> > +        * faqs/ka16809.html
> > +        */
> > +       asm volatile(
> > +               "1:     ldaxr %w[tmp], %w[locked]\n"
> > +               "cbnz   %w[tmp], 2f\n"
> > +               "stxr   %w[tmp], %w[one], %w[locked]\n"
> > +               "cbnz   %w[tmp], 1b\n"
> > +               "ret\n"
> > +               "2:     sevl\n"
> > +               "wfe\n"
> > +               "jmp    1b\n"
> > +               : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
> > +               : [one] "r" (1)
> > +}
> > +#endif
> > +
> >  static inline int rte_tm_supported(void)
> >  {
> >         return 0;
> > --
> > 2.7.4
> >
> 
> 
> --
> David Marchand
  
Gavin Hu Oct. 21, 2019, 7:27 a.m. UTC | #3
Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Friday, October 18, 2019 2:28 AM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev <dev@dpdk.org>; nd <nd@arm.com>; thomas@monjalon.net;
> Stephen Hemminger <stephen@networkplumber.org>;
> hemant.agrawal@nxp.com; jerinj@marvell.com; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang (Arm Technology China)
> <Ruifeng.Wang@arm.com>; Phil Yang (Arm Technology China)
> <Phil.Yang@arm.com>; Steve Capper <Steve.Capper@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v7 3/7] spinlock: use wfe to reduce
> contention on aarch64
> 
> On Fri, Sep 27, 2019 at 7:43 AM Gavin Hu <gavin.hu@arm.com> wrote:
> >
> > In acquiring a spinlock, cores repeatedly poll the lock variable.
> > This is replaced by rte_wait_until_equal API.
> >
> > Running the micro benchmarking and the testpmd and l3fwd traffic tests
> > on ThunderX2, Ampere eMAG80 and Arm N1SDP, everything went well
> and no
> > notable performance gain nor degradation was measured.
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Phil Yang <phil.yang@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Tested-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> >  .../common/include/arch/arm/rte_spinlock.h         | 26
> ++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > index 1a6916b..b61c055 100644
> > --- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > +++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
> > @@ -16,6 +16,32 @@ extern "C" {
> >  #include <rte_common.h>
> >  #include "generic/rte_spinlock.h"
> >
> > +/* armv7a does support WFE, but an explicit wake-up signal using SEV
> is
> > + * required (must be preceded by DSB to drain the store buffer) and
> > + * this is less performant, so keep armv7a implementation unchanged.
> > + */
> > +#ifndef RTE_FORCE_INTRINSICS
> 
> Earlier, in the same file, I can see:
> https://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/arm/r
> te_spinlock.h?h=v19.08#n8
> 
> #ifndef RTE_FORCE_INTRINSICS
> #  error Platform must be built with CONFIG_RTE_FORCE_INTRINSICS
> #endif
> 
> IIUC, this is dead code.

This is not dead code, RTE_FORCE_INTRINSICS is still mandatory for aarch64 ad RTE_ARM_USE_WFE is optional currently.
Yes, as Jerin pointed out, we may make it optional also, like x86, but now it is still too earlier before WFE is mandatory, anyway it is in our plan. 
I will tweak a little bit for the two macros to reflect this logic in v8.
/Gavin
> 
> > +static inline void
> > +rte_spinlock_lock(rte_spinlock_t *sl)
> > +{
> > +       unsigned int tmp;
> > +       /* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
> > +        * faqs/ka16809.html
> > +        */
> > +       asm volatile(
> > +               "1:     ldaxr %w[tmp], %w[locked]\n"
> > +               "cbnz   %w[tmp], 2f\n"
> > +               "stxr   %w[tmp], %w[one], %w[locked]\n"
> > +               "cbnz   %w[tmp], 1b\n"
> > +               "ret\n"
> > +               "2:     sevl\n"
> > +               "wfe\n"
> > +               "jmp    1b\n"
> > +               : [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
> > +               : [one] "r" (1)
> > +}
> > +#endif
> > +
> >  static inline int rte_tm_supported(void)
> >  {
> >         return 0;
> > --
> > 2.7.4
> >
> 
> 
> --
> David Marchand
  

Patch

diff --git a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
index 1a6916b..b61c055 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_spinlock.h
@@ -16,6 +16,32 @@  extern "C" {
 #include <rte_common.h>
 #include "generic/rte_spinlock.h"
 
+/* armv7a does support WFE, but an explicit wake-up signal using SEV is
+ * required (must be preceded by DSB to drain the store buffer) and
+ * this is less performant, so keep armv7a implementation unchanged.
+ */
+#ifndef RTE_FORCE_INTRINSICS
+static inline void
+rte_spinlock_lock(rte_spinlock_t *sl)
+{
+	unsigned int tmp;
+	/* http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.
+	 * faqs/ka16809.html
+	 */
+	asm volatile(
+		"1:	ldaxr %w[tmp], %w[locked]\n"
+		"cbnz   %w[tmp], 2f\n"
+		"stxr   %w[tmp], %w[one], %w[locked]\n"
+		"cbnz   %w[tmp], 1b\n"
+		"ret\n"
+		"2:	sevl\n"
+		"wfe\n"
+		"jmp	1b\n"
+		: [tmp] "=&r" (tmp), [locked] "+Q"(sl->locked)
+		: [one] "r" (1)
+}
+#endif
+
 static inline int rte_tm_supported(void)
 {
 	return 0;